Losing part of query string

I’m going to post this here, because I think it might have something to do with my rewrite rules, or some server setting, but I might be completely wrong…

I have the following .htaccess file:


RewriteEngine on
# activate this line if necessary and specify the subdirectory
# RewriteBase /test8/
# don't redirect existing files and directories
# redirect everything else to index.php?language=$1&page=$2[&$3]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)(/(.*))?$ index.php?language=$1&page=$2$4 [L]

So the first part would become the language, the second the page, and the rest would be appended as is to the query string.
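To make the mapping concrete, here is a rough Python emulation of what the RewriteRule does with a request path. It is only a sketch. It assumes the per-directory (.htaccess) behaviour of matching against the path without its leading slash, and it uses Python's re module in place of Apache's regex engine (the two agree on this pattern). It also ignores the two RewriteCond lines. The `rewrite` helper is hypothetical.

```python
import re

# Same pattern as the RewriteRule. In .htaccess (per-directory) context,
# mod_rewrite matches against the path without its leading slash.
RULE = re.compile(r"^([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)(/(.*))?$")


def rewrite(path):
    """Return the rewritten target, or None if the rule does not match."""
    m = RULE.match(path)
    if m is None:
        return None
    # An unmatched optional group becomes an empty string, as in mod_rewrite.
    rest = m.group(4) or ""
    return "index.php?language={}&page={}{}".format(m.group(1), m.group(2), rest)


print(rewrite("en/documentation/&ccfilelistsubdir=0"))
# index.php?language=en&page=documentation&ccfilelistsubdir=0
print(rewrite("en/documentation"))
# index.php?language=en&page=documentation
```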
This works just fine on the server that hosts my test site, and almost fine on the server that hosts the live site.

The first problem has to do with spaces in the query string.

http://www.mysite.com/en/documentation/&ccfilelistsubdir=0 test with space

This is handled fine on the test server: it calls index.php with the query string

?language=en&page=documentation&ccfilelistsubdir=0 test with space

On the live server index.php receives this query string:

?language=en&page=documentation&ccfilelistsubdir=0

which of course results in a 404 page.

I managed to resolve that problem by using the PHP urlencode function, which creates this link:

http://www.mysite.com/en/documentation/&ccfilelistsubdir=0+test+with+space

and it works fine on the live server.
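For comparison, Python's urllib.parse.quote_plus encodes a space as +, much like PHP's urlencode does. The working link can therefore be rebuilt like this. This is a sketch, and `subdir_link` is a hypothetical helper:

```python
from urllib.parse import quote_plus


def subdir_link(base, subdir):
    # quote_plus turns spaces into "+", much like PHP's urlencode
    return "{}&ccfilelistsubdir={}".format(base, quote_plus(subdir))


print(subdir_link("http://www.mysite.com/en/documentation/", "0 test with space"))
# http://www.mysite.com/en/documentation/&ccfilelistsubdir=0+test+with+space
```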

But when I want to enter another subfolder (it’s a directory tree), the link becomes

http://www.mysite.com/en/documentation/&ccfilelistsubdir=0+test+with+space%2Fanother_folder

and I get this error on the live server (the test server still works fine without any encoding):

Not Found

The requested URL /en/documentation/&ccfilelistsubdir=0+test+with+space/another_folder was not found on this server.
Apache/2.2.9 (Debian) Server at xxx Port 80

I did some more tests and found that running urlencode on the entire subdir list (including the /) causes the error. So I’ll try to find a way to encode only the folder names. But of course, the best solution (IMO) would be to have it work ‘as is’ on the live server too.
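Encoding the whole subdir list turns the / into %2F, which is exactly what appears in the failing link. One plausible explanation for the live server's 404, not confirmed in this thread, is Apache's AllowEncodedSlashes directive. With its default value of Off, Apache refuses any URL whose path contains %2F with a 404. Because this URL has no ?, the whole parameter part is still path. The check below is a simplified, hypothetical model of that rule, not Apache's actual code:

```python
from urllib.parse import quote_plus

# Encoding the whole path also encodes the slash as %2F
print(quote_plus("0 test with space/another_folder"))
# 0+test+with+space%2Fanother_folder


def rejected_by_default_apache(url):
    """Simplified model of AllowEncodedSlashes Off: only the path part
    (everything before the first "?") is checked for an encoded slash."""
    path = url.split("?", 1)[0]
    return "%2f" in path.lower()


print(rejected_by_default_apache(
    "/en/documentation/&ccfilelistsubdir=0+test+with+space%2Fanother_folder"))
# True
```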

My questions are:

  1. Why doesn’t it work on both servers? What could be the cause of losing everything after the first space?
  2. How can I get it to work on the live server as well?

guido,

Welcome to the Apache board!

Problem 1: The space is an ILLEGAL character in a URI (http://www.ietf.org/rfc/rfc2396.txt), so it can never be used unencoded. Your urlencode took care of that problem by replacing the spaces with +'s. What’s wrong with that?

RewriteEngine on
# activate this line if necessary and specify the subdirectory
# RewriteBase /test8/
# don't redirect existing files and directories
# redirect everything else to index.php?language=$1&page=$2[&$3]

($3 in the comment includes the / whereas the $4 in the rule does NOT.)

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)(/(.*))?$ index.php?language=$1&page=$2$4 [L]

(It looks like you've already sorted this one out.)
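The difference between the two backreferences is easy to see by matching the same pattern in Python. This is a sketch using re in place of Apache's regex engine, with the sample path taken from the thread:

```python
import re

RULE = re.compile(r"^([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)(/(.*))?$")
m = RULE.match("en/documentation/&ccfilelistsubdir=0")

print(m.group(3))  # /&ccfilelistsubdir=0  ($3 keeps the leading slash)
print(m.group(4))  # &ccfilelistsubdir=0   ($4 does not)

# With $3 the slash ends up inside the page parameter; with $4 it does not.
with_3 = "page=" + m.group(2) + m.group(3)
with_4 = "page=" + m.group(2) + m.group(4)
print(with_3)  # page=documentation/&ccfilelistsubdir=0
print(with_4)  # page=documentation&ccfilelistsubdir=0
```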

Problem 2: http://www.mysite.com/en/documentation/&ccfilelistsubdir=0+test+with+space%2Fanother_folder shows the $3 version of the code above, NOT the $4 version. Use $4 to get rid of the /, as that is apparently confusing your server (MultiViews) and causing it not to work. IMHO, you should use &$4 … but that’s me.

I trust this answers both questions. My question, though, is WHY it actually worked on your test server: it should NOT have done so!

Regards

DK

David, thanks for the welcome and for the answer :slight_smile:
Yes, I had already resolved problem 2; I just forgot to edit the comment line.

Encoding the URLs is no problem; I just thought it might not be necessary because it works ‘as is’ on my test server. When I copy and paste the test server URL from my browser, it seems the spaces have been replaced by %20, but I didn’t do that. Could it be something in the server or PHP settings?
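The %20 is most likely added by the browser itself, which percent-encodes spaces before sending the request, whereas PHP's urlencode produces + instead. In a query string both forms decode to the same value, as this Python sketch shows. Here parse_qs stands in for PHP's own query parsing, which treats + and %20 alike:

```python
from urllib.parse import quote, parse_qs

# Percent-encoding as a browser does it: space -> %20
encoded = quote("0 test with space")
print(encoded)  # 0%20test%20with%20space

# Both the %20 form and the + form (PHP urlencode) decode to the same value
for qs in ("ccfilelistsubdir=" + encoded,
           "ccfilelistsubdir=0+test+with+space"):
    print(parse_qs(qs)["ccfilelistsubdir"][0])  # 0 test with space
```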

I also managed to avoid the 404 pages when &ccfilelistsubdir contains more than one folder (for example &ccfilelistsubdir=folder1/folder2). I have to split the path into individual folder names, encode each one, and then join them again with /. So the trick is not to encode the / as well. No idea why (I guess if I want to know I’ll have to ask in the PHP forum), but it works :slight_smile:
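The per-folder approach can be sketched in Python as follows. In PHP the same idea would presumably be something like implode('/', array_map('urlencode', explode('/', $path))). The name `encode_subdir` is hypothetical:

```python
from urllib.parse import quote_plus


def encode_subdir(path):
    """Encode each folder name on its own and rejoin with literal slashes,
    so the resulting link never contains %2F."""
    return "/".join(quote_plus(folder) for folder in path.split("/"))


print(encode_subdir("0 test with space/another_folder"))
# 0+test+with+space/another_folder
```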

By the way, I managed to write that .htaccess code after reading your article, and some (but not much) further googling.

Hi Guido!

I’m really baffled as to WHY it worked on your test server! Is that Apache on a WinDoze box? I guess I should try mine (despite the techno-treatise by Tim Berners-Lee) to see if that’s a problem (unanticipated “feature”) with WinDoze.

Yes, I’m pretty sure that the / is also a reserved character, although I have to admit to being a bit surprised that a / in the query string would be problematic. See? Even an OLD DOG can learn something new!

Say, if I wasn’t clear enough in my tutorial, PLEASE let me know where so I can continue to improve the thing! Thanks!

Regards,

DK

Like David, I’m surprised this ever worked.
IMO the & in the URL might also throw things off, and anyway [QSA] seems a better solution to me than the one you currently have.

i.e. change


RewriteRule ^([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)(/(.*))?$ index.php?language=$1&page=$2$4 [L]

to


RewriteRule ^([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+) index.php?language=$1&page=$2 [L,QSA]

and then the URL from


http://www.mysite.com/en/documentation/&ccfilelistsubdir=0 test with space

to (note the ? in place of the &):


http://www.mysite.com/en/documentation/?ccfilelistsubdir=0 test with space

The [QSA] flag in the RewriteRule makes sure that the original query string is appended to the rewritten URL.
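Roughly, QSA takes the query string from the original request and appends it to the substitution, joining it with & if the substitution already has a query string. A minimal Python sketch of that behaviour follows. The `apply_qsa` helper is hypothetical and ignores edge cases:

```python
def apply_qsa(target, original_query):
    """Append the original request's query string to the rewrite target,
    the way the QSA flag does (edge cases ignored)."""
    if not original_query:
        return target
    separator = "&" if "?" in target else "?"
    return target + separator + original_query


print(apply_qsa("index.php?language=en&page=documentation",
                "ccfilelistsubdir=0%20test%20with%20space"))
# index.php?language=en&page=documentation&ccfilelistsubdir=0%20test%20with%20space
```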
Don’t get me wrong, your solution may work too, but with the & in the path it looks weird (and feels a bit hacky) to me.

Also, what languages do you support? If it’s only a few, I would list them explicitly instead of using a general regex: something like (en|nl) instead of ([a-zA-Z0-9_]+). That makes the rule easier to read and easier for Apache to parse :slight_smile:
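A quick way to see the difference is to run both patterns against a supported and an unsupported language code. This Python sketch uses the (en|nl) list from the example. As with the suggested rule, neither pattern is anchored at the end:

```python
import re

GENERIC = re.compile(r"^([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)")
EXPLICIT = re.compile(r"^(en|nl)/([a-zA-Z0-9_]+)")

for path in ("en/documentation", "fr/documentation"):
    print(path, bool(GENERIC.match(path)), bool(EXPLICIT.match(path)))
# en/documentation True True
# fr/documentation True False
```

With the explicit list, "fr/documentation" is simply not rewritten, so unless such a file exists Apache answers with its own 404.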


Well, I coded it as &amp; in the link. It’s just that it shows up as & in the browser.

That looks good. I’ll give it a try, thanks :slight_smile:

Right now, five. But that might increase in the future. I don’t think that it would make that much of a difference to the parser, would it? I’ve seen much more complicated rules while studying this stuff.

No, probably not. Maybe you could get it from 1 ns to 0.95 ns. :wink:

Another advantage is that if the URL contains a language that is not in the RewriteRule’s list, Apache will throw a 404 itself, without the app having to do that. Of course, sometimes you want the app to manage that (templating, etc.), but it’s food for thought nonetheless :slight_smile:

Yep, I thought about that. If the language doesn’t exist, it uses the default one.

Make sure you use an external (301) redirect to send the browser to the correct URL. Otherwise you could end up with a whole bunch of URLs in the search engines (SEs) that don’t really exist.
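If the application handles unknown languages itself, it could answer with a 301 pointing at the default-language URL instead of serving the page under the unknown code. Here is a minimal sketch. SUPPORTED, DEFAULT and resolve_language are all hypothetical, and the query string is ignored:

```python
SUPPORTED = {"en", "nl", "de", "fr", "it"}  # hypothetical language list
DEFAULT = "en"  # hypothetical default language


def resolve_language(path):
    """Return (status, location). Known language: (200, None).
    Unknown language: a 301 to the same page under the default language."""
    language, _, rest = path.lstrip("/").partition("/")
    if language in SUPPORTED:
        return 200, None
    return 301, "/" + DEFAULT + "/" + rest


print(resolve_language("/xx/documentation"))  # (301, '/en/documentation')
```

Because the browser and search engines receive the 301, only the canonical default-language URL ends up indexed.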