Cleaning up PHP $_GET URLs

Started by yny-u, December 09, 2007, 11:30:17 PM

yny-u

I am building a site with PHP/MySQL, and I was wondering.. Is there a good way to get URLs like

/gallery/index.php?section=illustration&id=27

to appear as, and be accessible from, something like

/gallery/illustration/tree-27

where "tree-27" is the name of the picture and its id. (Having the id number in the URL along with the name prevents two images with identical names from causing problems..)

Desh

The way I know of is pretty easy, and I actually learned it here from Databits in this topic :D

Basically, you take your php script that reads your values and chop off the .php.  Then, using .htaccess, you force the webserver to run it as a php script.  After that, you make your script read its URL, breaking it up by the slashes, and you have your variables.

In actual code, for the .htaccess


<Files ScriptName>
  ForceType application/x-httpd-php
</Files>

(Where ScriptName is the name of the php script you cut off)

Then, to get your variables, in the ScriptName script


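// Strip the script's own path from the requested URL, leaving only the variables.
$request=str_replace($_SERVER['SCRIPT_NAME'],'',$_SERVER['REQUEST_URI']);
// Trim the outer slashes so the first variable is at index 0, then split on slashes.
$variables=explode('/',trim($request,'/'));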


That first line will get rid of the ScriptName, and leave only your variables.  Then, your variables are broken up by slashes into an array.

Keep in mind, the way your script gets its variables is now different.  In your original method, variables are named and part of the $_GET group.  In this method, they're numbered starting at 0 and part of the $variables array.

If you name your variables right away, such as $id=$_GET['id'];, just replace the $_GET with the right $variables[#].  If you use $_GET throughout your script, you'll have to go through and change them.

Be sure to avoid naming a folder the same as ScriptName.  If you do, when you access ScriptName/Variable/Variable, the server will look for Variable/Variable in the folder and ignore your script.
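
For anyone who wants to try the splitting step outside a live server, here is a minimal sketch of the same idea wrapped in a function. The function name path_variables and the example script path /gallery/view are made up for illustration; on a real server the two arguments would come from $_SERVER['SCRIPT_NAME'] and $_SERVER['REQUEST_URI']. It also drops the query string first, which the two-line version does not do.

```php
<?php
// Hypothetical helper: split the part of the URL after the script name
// into path segments. Both arguments would normally come from $_SERVER.
function path_variables(string $scriptName, string $requestUri): array
{
    // Drop any query string so "?sort=asc" never ends up in the last segment.
    $path = (string) parse_url($requestUri, PHP_URL_PATH);

    // Remove the script name only where it appears as a prefix.
    if (strpos($path, $scriptName) === 0) {
        $path = (string) substr($path, strlen($scriptName));
    }

    // Trim the outer slashes so the first variable lands at index 0.
    $path = trim($path, '/');

    return $path === '' ? [] : explode('/', $path);
}

// Example request (made up): /gallery/view/illustration/27?sort=asc
$variables = path_variables('/gallery/view', '/gallery/view/illustration/27?sort=asc');

// Name the variables right away, with fallbacks for missing segments.
$section = $variables[0] ?? 'all';
$id      = (int) ($variables[1] ?? 0);
```

Casting the id to an integer straight away is a small design choice: whatever arrives in the URL is user input, so it is worth pinning it to the expected type before it goes anywhere near a query.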

Xepher

Another option is to use ModRewrite to silently redirect the requests within apache itself.

This goes inside a .htaccess file placed at the base of your website (/home/username/public_html/.htaccess)

RewriteEngine on
RewriteRule ^gallery/(.*)/(.*)-(.*)$ /gallery/index.php?section=$1&id=$3


You may have to tweak the exact paths (the path it sees depends on where you put the .htaccess file, and I'm typing this from memory) but that's generally how it works. Note that inside a .htaccess file the leading slash is stripped before the pattern is tested, so the pattern starts with "^gallery/" rather than "^/gallery/". The $1 and $3 bits in the second part are replaced by the first and third parenthetical matches in the first part. The ".*" is a regular expression term meaning match-any-character (.) any-number-of-times (*). You can fine tune that using more specific regular expressions... you may also need to tweak the first part to be "^gallery/(.*)/(.*)-([0-9]+)/?$" in order to catch cases where people (or their browsers) end the URL with a final slash (/). The last group is narrowed to digits there because a bare ".*" would swallow the trailing slash along with the id.

The advantage to doing it this way is you don't have to tweak your script in any way, so it's easy to use with pre-made scripts. The only downside is you have to figure out regular expressions to do it. :-) Hopefully I've done the hard bit for you though.
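
Since a broken rule tends to show up only as a server error, one low-risk way to experiment is to try the regular expression in PHP first. preg_replace uses the same $1/$3 style of backreference. The sketch below is not the rule from the post verbatim: it assumes a pattern with no leading slash, an id made only of digits, and an optional trailing slash, and the helper name rewrite is made up.

```php
<?php
// Assumed pattern: no leading slash, digits-only id, optional trailing slash.
// The '#' characters are just PHP's regex delimiters.
$pattern     = '#^gallery/(.*)/(.*)-([0-9]+)/?$#';
$replacement = 'gallery/index.php?section=$1&id=$3';

// Hypothetical helper: returns the rewritten path, or null if the rule would not fire.
function rewrite(string $pattern, string $replacement, string $path): ?string
{
    if (!preg_match($pattern, $path)) {
        return null;
    }
    return preg_replace($pattern, $replacement, $path);
}

$plain    = rewrite($pattern, $replacement, 'gallery/illustration/tree-27');
// $plain is 'gallery/index.php?section=illustration&id=27'

$trailing = rewrite($pattern, $replacement, 'gallery/illustration/tree-27/');
// $trailing is the same string; the final slash is ignored

$other    = rewrite($pattern, $replacement, 'gallery/index.php');
// $other is null, so the real script is left alone
```

This only checks the regular expression. How Apache applies the rule (which directory the .htaccess file sits in, which flags are set) still has to be tested on the server.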

Databits

Or you can do a third option where you direct ALL of the requests for your site through a single script which internally outputs what it needs to after parsing the information out.
(\_/)    ~Relakuyae D'Selemae
(o.O)    
(")_(")  [Libre Office] [Chrome]

yny-u

Ahh, thank you all very much for your help. I shall mess around with this..

Thanks again.

Databits

Basically, it's called a content dispatcher, which at times is incorrectly called a model-view-controller (most claimed "MVCs" aren't really MVCs).

It's actually a rather useful method because it allows you to custom process everything, including images, videos, music, and the like. In addition, when combining this method with customizing your webserver config (either in the direct server config or the .htaccess override), you can do things like only running through your processing script if the real file doesn't actually exist.
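
A very small sketch of what such a dispatcher could look like is below. The route names, the handler bodies, and the function name dispatch are all made up for illustration; a real one would load templates, check permissions, and so on. The comment at the top shows one common way to send only requests for non-existent files to the script.

```php
<?php
// In .htaccess, something along these lines hands a request to index.php
// only when it does not match a real file:
//   RewriteEngine on
//   RewriteCond %{REQUEST_FILENAME} !-f
//   RewriteRule ^ index.php [L]

// Hypothetical route table: first path segment => handler.
$routes = [
    'gallery' => function (array $args): string {
        return 'gallery: ' . implode(',', $args);
    },
    'news' => function (array $args): string {
        return 'news: ' . implode(',', $args);
    },
];

// Pick a handler from the first path segment and pass it the rest.
function dispatch(array $routes, string $requestUri): string
{
    $path     = trim((string) parse_url($requestUri, PHP_URL_PATH), '/');
    $segments = $path === '' ? [] : explode('/', $path);
    $name     = array_shift($segments) ?? '';

    if (!isset($routes[$name])) {
        return '404 not found';
    }
    return $routes[$name]($segments);
}

$output = dispatch($routes, '/gallery/illustration/27');
// $output is 'gallery: illustration,27'
```

On a live site the call would be dispatch($routes, $_SERVER['REQUEST_URI']), with the result echoed.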

yny-u

OK... I have poked at this a bit, and I am having trouble getting it to work.. To simplify matters, I have started by trying to rewrite gallery/index.php?section=all to gallery/all

Below is a copy of the code I have in the .htaccess file in my web root directory:

Options +FollowSymlinks
RewriteEngine on
RewriteRule ^gallery/([a-zA-Z]+)(/)?$ gallery/index.php?section=$1
RewriteRule ^gallery/([a-zA-Z]+)/([0-9]+)(/)?$ gallery/index2.php?section=$1&id=$2


First rewrite rule:
The URLs http://yny-u.xepher.net/gallery/all and http://yny-u.xepher.net/gallery/all/ both forward... But using the second (with the trailing slash) breaks the CSS and images..
The second rule forwards, but breaks the CSS and images either way (with or without the trailing slash).

I am guessing that the problems are being caused by the rewrite adding faux directory levels that are messing up the relative file paths, because the nav links (which are relative) on the broken pages point to the wrong directory level, while the rewrite that works (gallery/all) points to the level that the pages are actually on..
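
That guess matches how browsers resolve relative links: they keep everything up to the last slash of the address in the address bar and append the link. The sketch below imitates that for plain links only (no "../" handling). The function name resolve_relative and the file name style.css are made up, and it assumes the stylesheet really lives at /gallery/style.css.

```php
<?php
// Roughly how a browser resolves a plain relative link such as "style.css":
// keep the page URL up to and including its last slash, then append the link.
// Simplified: "../" and absolute links are not handled.
function resolve_relative(string $pagePath, string $relative): string
{
    $directory = substr($pagePath, 0, strrpos($pagePath, '/') + 1);
    return $directory . $relative;
}

$noSlash   = resolve_relative('/gallery/all', 'style.css');
// $noSlash is '/gallery/style.css' (the real location, so the page works)

$withSlash = resolve_relative('/gallery/all/', 'style.css');
// $withSlash is '/gallery/all/style.css' (does not exist, so the CSS breaks)

$withId    = resolve_relative('/gallery/all/27', 'style.css');
// $withId is '/gallery/all/style.css' (broken with or without a trailing slash)
```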

Does anyone know how to fix or get around this? (other than going through and changing all the relative links to absolute ones, that is.. -_- )

Thanks for all the help.

Xepher

The only thing that comes to mind is making another rewrite rule for images/css files that points back to the right location. Something like

RewriteRule ^gallery/(.*)/(.*)(\.css|\.jpg|\.png)$     gallery/$2$3

That's just a quick "sketch" of code there... I'm sure it needs adjustment, but I hope it at least shows what I'm talking about. The key would just be figuring out how gallery organizes its files, and creating enough rules to match it, and basically just stripping the fake directories from the middle of the URL.

yny-u

Thank you all for your help (sorry it took me so long to reply). I think I have gotten everything working now. Posted below is most of my .htaccess file - hopefully it will be helpful to someone interested in doing something similar.


#turn on the rewrite engine so all this stuff actually works..
Options +FollowSymlinks
RewriteEngine on
RewriteBase /

#redirect http://www.yny-u.com/ to http://yny-u.com/
RewriteCond %{HTTP_HOST} !^yny-u\.com [NC]
RewriteRule ^(.*) http://yny-u.com/$1 [L,R=301]

#clean up gallery URLs 
#make gallery sections accessible via /gallery/section/
RewriteRule ^gallery/([a-zA-Z]+)(/)?$ gallery/index.php?section=$1
#make gallery pages accessible via /gallery/section/id/ or /gallery/id/
RewriteRule ^gallery/([0-9]+)(/)?$ gallery/index2.php?section=all&id=$1
RewriteRule ^gallery/([a-zA-Z]+)/([0-9]+)(/)?$ gallery/index2.php?section=$1&id=$2
#make gallery full-view pages accessible via /gallery/section/id/full-view/ or /gallery/id/full-view/
RewriteRule ^gallery/([a-zA-Z]+)/([0-9]+)/full-view(/)?$ gallery/full-view.php?section=$1&id=$2
RewriteRule ^gallery/([0-9]+)/full-view(/)?$ gallery/full-view.php?section=all&id=$1
#exception for links to image files so that relative image source paths will still work
RewriteRule ^gallery/(.*)/images/(.*)/(.*)(|\.jpg|\.png)$ gallery/images/$2/$3

#clean up top URLs - make top.php accessible via /top/
RewriteRule ^top(/)?$ top.php
#rewrite links like /top/something/something/else/ to /something/something/else/
RewriteRule ^top/(.*)$ $1

#custom 404 page
RewriteRule ^404(/)?$ 404.php
ErrorDocument 404 http://yny-u.com/404/

#prevent people from viewing contents of directories without an index.php/html/etc file
IndexIgnore */*


Miluette

Fah realz necrobumping this thread.

I'm wondering if this stuff can also apply to redoing the urls for my news scripts. Right now they look like this:

http://millennium.senshuu.com/mil/index.php?subaction=showcomments&id=1235775579&archive=&start_from=&ucat=1&

Yeah D: No idea what I'd redo it to but something much shorter and simpler. And, as usual, my brain is fried.

It's bad enough that the subdirectories also appear after I've used mod rewrite to create those subdomains, but it's because of how the news script is set up (I'm using one installation on three sites).
And wasn't it you who told me,
"The sun would always chase the day"?

yny-u

This can get complicated, but you probably could get something like:

http://millennium.senshuu.com/mil/1/1235775579/showcomments/
With:
RewriteRule ^mil/([0-9]+)/([0-9]+)/([0-9a-zA-Z]+)(/)?$ mil/index.php?ucat=$1&id=$2&subaction=$3

(That is just off the top of my head, so it probably has some mistakes...)

The $# things in the second half of the RewriteRule (after the first "$") refer to the thing matched by the regular expression subsection in the #th set of parentheses, counting from left to right. You can add more variables (like your start_from and archive variables) quite easily. The trick is to remain consistent about what order you put the variables in, otherwise you will end up with ucat getting id's value or the like.

Hope that helps.


Miluette

Slow response time! That is very helpful. <3

Before I try that, I kind of wonder if I can do that in a way so that anywhere past this part in my messy URLs can be rewritten without being relative to a certain subdirectory:

?subaction=showcomments&id=1235775579&archive=&start_from=&ucat=1&

But actually I think I know what to do relative to paths too. It's slightly confusing, because I'm using one copy of CuteNews in three different subdirectories, hehe.

EDIT: Oh gawd, I tried adding both your thing and my idea to my root .htaccess and I got a 500 error. I wonder if having so many rewrite rules already makes it worse?

yny-u

Heheh, well it certainly is easy to get 500 errors when messing about with your .htaccess file... But having a lot of rules does not inherently generate an error. As long as all your rules are correct and don't conflict with each other there should be no problem (obviously).

Anyway, if you structure your URLs in such a way that each piece of data can be distinguished by some sort of regexp, you can probably set it up so that the order of the URL doesn't matter, ex. http://millennium.senshuu.com/mil/1/1235775579/showcomments/ and http://millennium.senshuu.com/1235775579/showcomments/mil/1 could both be fine... It makes writing your rewrite rules a little more complicated, but it should be doable. Is that what you mean by not being relative to a certain subdirectory? Or you can choose to only rewrite certain pieces of data and just pass everything else, ex. http://millennium.senshuu.com/mil/1/1235775579/?subaction=showcomments&start_from=1234

Does that help? Sorry, I am not quite sure what your question was..

Miluette

Well, hmm, let's see if I can explain it... I'm bad at explaining what I mean sometimes lol.

All of my actual news files are in /news/
But they're being inserted on /lf/, /mil/, and /ai/
(Those are all from the root directory)

I want to do a rewrite rule that doesn't involve including or relying on /mil/ /lf/ or /ai/ in the news-generated URLs at all, if possible (but I don't think it is)

And I wanna be able to try that suggestion of yours without having to create multiple .htaccess files in different directories (before I just tried your suggestion in the root one, and it didn't work aaaa)

Actually I'm not sure which .htaccess file I should be editing where

Databits

Blend them.

Never, ever, split your variables up to look like a directory structure. Things like Google rank your pages lower the more it looks like they are nested in deep directories.

Generally, when you're doing something like this, it's nice to do a blend of the two methods. There is nothing wrong with having some GET variables. Also, you could add in a URL mapping that auto-assigns GET variables on the back-end. That way you can have things like, say, "/page-1-1.html" translate to ?chapter=1&page=1 for your scripts. However, if you want to do something like sort by X as well, keep that as a GET param; there's no need to mask things like that, as it just adds needless headaches.

Generally in something like a CMS (which is what I'm familiar with), you'd have some sort of URL mapping that translates particular special urls or patterns into the explicit variables you need. Say, for instance, you are using a CMS and therefore need to define a controller and view for viewing a comic page vs a news article. The comic page could be something like, say, the "viewIndex" method on the "ControllerComic" controller, where the news article page could be the "viewIndex" method on the "ControllerNews" controller:

class ControllerComic {
  public function viewIndex() {
  }
}

class ControllerNews {
  public function viewIndex() {
  }
}


To tell your system which one to use, you pass something like, say, a "view" variable.

Now, what you could do is make something like "/comic-1-1.html" translate to "?view=comic.index&chapter=1&page=1" for your primary script, which then loads the appropriate controller and view:

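// ';' is used as the regex delimiter; [.] matches a literal dot.
if (preg_match(';/comic-([0-9]+)-([0-9]+)[.]html;', $_SERVER['REQUEST_URI'], $match)) {
  // Fixed value for this URL pattern: controller and view in one variable.
  $_GET['view'] = 'comic.index';
  // Captured values: first number is the chapter, second is the page.
  $_GET['chapter'] = $match[1];
  $_GET['page'] = $match[2];
}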


Of course this is a VERY rough example, and I'd generally advise against explicitly setting GET/POST vars in this fashion. Instead it would be better to explicitly define a global custom environment variable for your application and read that in conjunction with the passed GET/POST vars.
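
As a rough sketch of that last suggestion, the mapping can return its own array instead of writing into $_GET. Everything named here (URL_MAP, map_url, the "fixed" and "captures" keys) is made up for illustration and is only one of many ways to lay this out. Real query-string values such as a sort order are passed in and kept alongside the mapped ones.

```php
<?php
// Hypothetical URL map: regex => fixed variables plus names for the captures.
const URL_MAP = [
    '#^/comic-([0-9]+)-([0-9]+)[.]html$#' => [
        'fixed'    => ['view' => 'comic.index'],
        'captures' => ['chapter', 'page'],
    ],
    '#^/page-([0-9]+)-([0-9]+)[.]html$#' => [
        'fixed'    => [],
        'captures' => ['chapter', 'page'],
    ],
];

// Build the application's own variable array without touching $_GET.
function map_url(string $requestUri, array $get): array
{
    $path = (string) parse_url($requestUri, PHP_URL_PATH);

    foreach (URL_MAP as $pattern => $rule) {
        if (preg_match($pattern, $path, $match)) {
            $mapped = $rule['fixed'];
            foreach ($rule['captures'] as $i => $name) {
                $mapped[$name] = $match[$i + 1];
            }
            // Array union: mapped values win if a key appears in both.
            return $mapped + $get;
        }
    }

    // No special URL matched: fall back to the plain GET variables.
    return $get;
}

// On a live site: $env = map_url($_SERVER['REQUEST_URI'], $_GET);
$env = map_url('/comic-1-12.html?sort=date', ['sort' => 'date']);
// $env holds view=comic.index, chapter=1, page=12 and sort=date
```

Letting the mapped values win over the query string means nobody can override the chapter or page of a clean URL by tacking ?page=99 onto it.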