Percent in URL stops redirect, please HELP!

Hello,

I have the following line in my .htaccess file:

RewriteRule ^categories/(.*)\.html(.*) /$1/$2 [R=301,L]

If there is a % symbol in the URL it does not redirect. For example:

this url:

http://test.tmetrix.com/categories/AC-DC-Electronic-Loads.html

redirects here, as expected:

http://test.tmetrix.com/AC-2F-DC-Electronic-Loads.html

But this url:

http://test.tmetrix.com/categories/AC-%2F-DC-Electronic-Loads.html

does NOT redirect!

Does anyone know how to redirect a URL that has a % symbol in it?

Thanks!

af,

As you are aware, encoded characters display as %{two character hexadecimal} in a URL. These are to be avoided, obviously, as they make for an UGLY URL! Then, one has to ask WHY you’d encode a / (%2f) in a URI!

Look at http://www.ietf.org/rfc/rfc2396.txt to see which characters are PERMITTED in a URI and which are RESERVED. It DOES make a difference!

Okay, after all that, it really SHOULD redirect as Apache decodes the character before processing with mod_rewrite. For instance, %20 (a space) is matched with [\ ] in regex.
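
A minimal sketch of the "decode first, then match" idea described above, written in Python only to show the regex behaviour. It is not Apache itself: `decoded_path_matches` is a hypothetical helper, the path is taken from later in this thread, and the sketch does not model how a particular server treats specific encoded characters such as %2F.

```python
import re
from urllib.parse import unquote


def decoded_path_matches(pattern: str, raw_path: str) -> bool:
    """Percent-decode raw_path, then test the pattern against the decoded text."""
    return re.search(pattern, unquote(raw_path)) is not None


pattern = r"^something/test[\ ]this$"

# After decoding, %20 is a literal space, so the [\ ] class matches it.
print(decoded_path_matches(pattern, "something/test%20this"))

# Against the still-encoded text, the same pattern does not match.
print(re.search(pattern, "something/test%20this") is not None)
```

If the pattern is run against the decoded path, a literal space in the regex is what matches %20; a pattern that looks for the characters `%20` would not.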

Finally, PLEASE, for your own sake, learn regex! The EVERYTHING atom, (.*), will get you in more trouble than any other bit of regex there is! Having it in your regex TWICE can only get you in TWICE the trouble (actually, the second one will only match the null string as it CANNOT match anything in the %{QUERY_STRING} variable - you MUST use a RewriteCond for that)!

Regards.

DK

dklynn,

As you are aware, encoded characters display as %{two character hexadecimal} in a URL. These are to be avoided, obviously, as they make for an UGLY URL! Then, one has to ask WHY you’d encode a / (%2f) in a URI!

I put product names in the URL for SEO reasons, and some product names have a “/” in them and when URL encoded they become a %2f.
Also, the %2f might be necessary for an ajax api that requires a URL to be sent within another URL.

so the link would be:
<a href="/categories/product-name%2fwith-a-slash">Product Name / With A Slash</a>

My php script checks the database for a product with that name (properly escaping the value of course) to show the appropriate page with product details.

Okay, after all that, it really SHOULD redirect as Apache decodes the character before processing with mod_rewrite. For instance, %20 (a space) is matched with [\ ] in regex.

If this is true, then there is a bug.
Like I said in my original post, simply putting a % in the URL stops the redirect from happening. There is no reason that it should not redirect.

RewriteRule ^categories/(.*)\.html(.*) /$1/$2 [R=301,L]
(.*) means anything, so ‘string’ and ‘str%ing’ should both redirect, but the second one does not!

Finally, PLEASE, for your own sake, learn regex! The EVERYTHING atom, (.*), will get you in more trouble than any other bit of regex there is! Having it in your regex TWICE can only get you in TWICE the trouble (actually, the second one will only match the null string as it CANNOT match anything in the %{QUERY_STRING} variable - you MUST use a RewriteCond for that)!

The reason is that the second (.*) actually does allow URLs to redirect with a query string.

This url: website.com/category/test.html?name=frank
redirects to
website.com/test/?name=frank

Is there any reason not to do this?
I prefer this
RewriteRule ^categories/(.*)\.html(.*) /$1/$2 [R=301,L]
rather than using the ugly:
RewriteCond %{QUERY_STRING} ^(.*)$ [NC]
RewriteRule ^categories/(.*)\.html /$1/%1 [R=301,L]

(By the way, I know regex, at least within PHP. It is .htaccess that throws me for a loop.)

Thanks for the response, unfortunately it was not helpful in solving this problem.

So… anyone have an answer, or work-around that allows a URL with a percent symbol to redirect?

af,

It’s very likely that mod_rewrite’s regex engine does NOT consider encoded characters to be such UNLESS they’re within a range definition, therefore, I stand by my recommendation NOT to use (.*) in the first case. Use ([-a-zA-Z0-9_/]+) instead.

As for the second (.*), it will NEVER match a query string … BY DEFINITION! The RewriteRule ONLY examines the %{REQUEST_URI} string. RewriteCond statements are used to access any other Apache variable you need, including %{QUERY_STRING}. The simple reason that you see a query string on a redirected URI is that it’s simply not affected by a RewriteRule UNLESS a query string is used in the redirection. If it is, then any pre-existing query string is replaced UNLESS the QSA flag is used.
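
The query-string behaviour described above can be modelled roughly like this. It is a simplified Python sketch, not mod_rewrite's actual code: `redirect_target` is a hypothetical function, and it ignores edge cases such as a substitution that ends in a bare `?`.

```python
def redirect_target(substitution: str, original_query: str, qsa: bool = False) -> str:
    """Model how the original query string ends up on the redirect target."""
    if "?" not in substitution:
        # No query string in the substitution: the original passes through untouched.
        return substitution + ("?" + original_query if original_query else "")
    if qsa and original_query:
        # QSA: the original query string is appended to the new one.
        return substitution + "&" + original_query
    # Otherwise the new query string replaces the original.
    return substitution


print(redirect_target("/test/", "name=frank"))
print(redirect_target("/test/?a=1", "name=frank"))
print(redirect_target("/test/?a=1", "name=frank", qsa=True))
```

Under this model the first call already carries `name=frank` across, which is the pass-through effect: the query string arrives on the target without any capture group in the pattern having matched it.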

Let me know about the results of the first suggestion.

Regards,

DK

Maybe the hosting company set something up wrong, or the QSA flag is set somewhere. What I am saying is actually happening, it is fact, not just what I think.

When I access the REQUEST_URI with PHP, it is the entire uri that you see in the address bar (including the query string)
This does not match a post. It does match a get.

I tried getting rid of the (.*) and specifically matching the % symbol. It still doesn’t work.

So here is the question where an answer will solve my problem:
How do I make this:
website.com/something/test%20this
redirect to this
website.com/test%20this

If all else fails, I will probably add a PHP redirect instead…

af (and others),

Since you don’t believe me about the explanation I gave, please make a test or two:

(1) REMOVE the last (.*) and see that you get the same URI in the redirection (i.e., with the query string) and

(2) change (.*) to (.+) and see that there is NO redirection (because this specifies to mod_rewrite that it must have at least one character after your .html IN THE {REQUEST_URI} STRING to match and redirect).

WHEN both tests give the answers I’ve indicated, you should realize that what I’ve told you is correct and that your host hasn’t been playing games with the configuration.
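
The two tests can also be tried on the regex alone. This is a minimal Python sketch that assumes the pattern is applied to the path only, without the query string; the path comes from the example earlier in the thread with the leading slash dropped, and the variable names are made up for illustration.

```python
import re

# The path as the rule would see it; "?name=frank" is not part of this string.
path = "categories/test.html"

original = re.match(r"^categories/(.*)\.html(.*)", path)    # the rule as posted
without_tail = re.match(r"^categories/(.*)\.html", path)    # test (1): last (.*) removed
with_plus = re.match(r"^categories/(.*)\.html(.+)", path)   # test (2): (.*) changed to (.+)

print(original.groups())      # ('test', '')
print(without_tail.groups())  # ('test',)
print(with_plus)              # None
```

The trailing group captures only the empty string, removing it changes nothing about whether the pattern matches, and requiring at least one character after `.html` makes the match fail.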

If you need any further confirmation, view apache.org’s page on mod_rewrite as it is the gospel about mod_rewrite - it’ll tell you the same thing.

Regards,

DK