URL rewrite to remove file extension: now an issue with shortened URLs

Hi All,

We use some URLs for social promotions. They have already been published in shortened form using bitly.
The actual URLs have a .html extension and may carry a query string.
I have now removed the file extension using some rewrites.
The rules are:

RewriteRule ^([^?]+)\.html$ $1 [NC,R=301, L]
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.*) $1.html [L]

The issue now is this: URLs work fine on the website. For example,
www.mysite.com/mypage?param1=a&paramb=2 reaches the proper page on the website. But the URLs that were published in shortened form using bitly before I added the rewrites to remove the file extension (for example www.mysite.com/mypage.html?param1=a&paramb=2) no longer work.

Could you please help me with this?

b12,

Your mod_rewrite code:

RewriteRule ^([^?]+)\.html$ $1 [NC,R=301,{space}L]

# Capture anything except ? ... why? The pattern can never see a ?
#    because it is a reserved character marking the separation of the URI from the query string.

# The No Case flag should not be used in this RewriteRule:
#    the pattern is matched against the URL path (the %{REQUEST_URI} variable), which is case sensitive,
#    so NC only lets wrongly-cased variants such as .HTML match as well.

# Assuming that you have a space before your Last flag, 
#    the syntax is invalid and should throw a 500 error.
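To see why a stray space inside the flag set matters, here is a rough Python sketch. It assumes the directive's arguments are split on whitespace, which is a simplification of what a real config parser does (quoting is ignored). The variable names are only for illustration.

```python
# Simplified model: split the directive into whitespace-separated arguments.
broken = r"RewriteRule ^([^?]+)\.html$ $1 [NC,R=301, L]"
fixed = r"RewriteRule ^([^?]+)\.html$ $1 [NC,R=301,L]"

broken_tokens = broken.split()
fixed_tokens = fixed.split()

# With the space, the flag set is cut in two and neither piece is a
# complete [ ... ] group.
print(broken_tokens[3:])
# Without the space, the flags arrive as one bracketed argument.
print(fixed_tokens[3:])
```

In the broken line the flags arrive as two fragments, one opening a bracket that never closes and one closing a bracket that never opened.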

RewriteCond %{REQUEST_FILENAME}.html -f

# Fine. The dot needs no escaping here because the RewriteCond test string is not a regex.

RewriteRule ^(.*) $1.html [L]

# I am loath to use the EVERYTHING (or NOTHING) atom,
#    even here where you simply capture the {REQUEST_URI}, but it should work.

The EVERYTHING (or NOTHING) atom drew my first automated rant:

[rant #1] The use of “lazy regex,” specifically the EVERYTHING atom, (.*), and its close relatives, is the NUMBER ONE coding error of newbies BECAUSE it is “greedy.” Unless you provide an “exit” from your redirection, you will ALWAYS end up in a loop! [/rant #1]

If bit.ly is forwarding to your URL, your syntactically correct mod_rewrite should work (assuming nothing else in your .htaccess to “undo” the above code). Please note that the {QUERY_STRING} is not being affected by your mod_rewrite so it will be passed through to {whatever}.html.

Regards,

DK

Thanks dklynn.

The URLs are not working in the sense that they return a 404 error.

b12,

Test your website first: Is the URL you’ve given bit.ly working when entered directly (assuming that you’ve removed the space in your flag set)?

Tip: Change the second RewriteRule’s flag to include R=301 so you can verify that it is redirecting correctly. THEN remove it again.

If they’re working, then check the URLs you have registered with bit.ly to be sure that you have the correct URLs.

Regards,

DK

Hi,

There is no space before the Last flag.
Here is more detail. These are the rewrite rules:

RewriteRule ^/([^\?]+)\? /$1.html  [QSA,L]
RewriteRule ^([^?]+)\.html$ $1 [NC,R=301,L]
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.*) $1.html [L]

The issue now is:
3 URLs (with a file extension and a query string) were published in shortened form using bitly. With the above rules they give a 404 error.

AND

2 URLs (without a file extension, with a query string) were published in shortened form using bitly. With the above rules they work.

I tried adding R=301 to the rule you suggested. I still get the 404 error.

b12,

What are you trying to do with your first RewriteRule’s regex? The second? The third is okay despite creating a duplicate Apache variable ($1 === %{REQUEST_URI}).

Please show the URI (no need to add the domain name) of the bit.ly redirections - YOUR URIs that bit.ly is redirecting to. The 404s are showing that you’re not getting those URIs redirected to an existing file. That is why your URIs are needed.

Regards,

DK

Hi dklynn,

Here is one of the URLs which is published using bit.ly
www.mysite.com/store.html?utm_source=a&utm_medium=a&utm_campaign=a

b12,

store.html should cause no problem with the one good RewriteRule you have (the third one, the one with the RewriteCond)… Is this one failing (404) for you?

You’ve still not explained what you believe the regex in each of the first two rules is doing for you, though. IMHO, the first is a real mess and the second is almost as bad. Please help me out so I can help you.

Regards,

DK

Hi,

The first line was added to pass the query string through unchanged and to make URLs without a file extension work.

The remaining three lines are there to remove the file extension.

b12,

The query string is not affected UNLESS you delete it with a trailing ? OR add a new query string. You’re doing neither so it’s superfluous (as well as nonsense … code-wise).
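A minimal sketch of that point, written in Python so it can be run anywhere. It models Apache mod_rewrite in a per-directory context as described in this post; other rewrite engines may treat the query string differently. The helper apply_rule is made up for the illustration, and it uses Python's \1 where a rewrite rule would use $1.

```python
import re
from urllib.parse import urlsplit


def apply_rule(uri, pattern, substitution):
    """Toy model of a single rewrite rule (hypothetical helper, not a real API)."""
    parts = urlsplit(uri)
    path = parts.path.lstrip("/")      # per-directory style: no leading slash
    original_query = parts.query

    match = re.search(pattern, path)   # the pattern only ever sees the path
    if match is None:
        return None                    # rule does not apply

    target = match.expand(substitution)
    new_path, has_question_mark, new_query = target.partition("?")

    # A "?" in the substitution replaces the original query string
    # (a bare trailing "?" deletes it); otherwise it passes through.
    query = new_query if has_question_mark else original_query
    return new_path + ("?" + query if query else "")


uri = "/store?utm_source=a&utm_medium=a"
print(apply_rule(uri, r"^(.*)$", r"\1.html"))       # query passes through
print(apply_rule(uri, r"^(.*)$", r"\1.html?"))      # trailing ? deletes it
print(apply_rule(uri, r"^(.*)$", r"\1.html?x=1"))   # new query replaces it
print(apply_rule(uri, r"^([^?]+)\?", r"\1.html"))   # needs a literal ?, never matches
```

Under these assumptions, a pattern that requires a literal ? can never match, and a rule that exists only to pass the query string along adds nothing.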

The second RewriteRule, RewriteRule ^([^?]+)\.html$ $1 [NC,R=301,L], has nonsense in the regex and should not have a No Case flag (the {REQUEST_URI} IS case sensitive).

Aha! What you’re trying to create is LOOPY code: Remove the .html file extension but serve the .html file anyway. Several years ago, a member asked whether this was possible and my first response was NO! After contemplating my navel for a while, I came up with a way to do it without being “loopy” before I discovered the Apache variable which made the coding of the “non-loopy” code easier. Rather than repeat it all here, have a look at my tutorial (in the examples, look for Redirect TO New Format).

The problem that noobies do not consider is that mod_rewrite will loop back through the code until it fails to find a match/redirection. Normally, this iteration of the code affects people indiscriminately using (.*) but most people do not try to redirect to a new format (while they can only serve the old format). Have a think about this and you’ll understand what I’m babbling on about.
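The loop can be illustrated with a small Python model. This is only a sketch under stated assumptions: rules are re-run after every internal rewrite, the document root holds a single hypothetical file store.html, and query strings are ignored. The "guarded" variant redirects only when the client itself asked for the .html form (in Apache this kind of guard is commonly written against %{THE_REQUEST}); it is not necessarily the method used in the tutorial mentioned above.

```python
import re

FILES = {"store.html"}  # hypothetical contents of the document root


def serve(requested, guarded, max_hops=10):
    """Follow redirects and rewrites for one request; return (outcome, final_path)."""
    client_request = requested  # what the browser asked for in the current request
    path = requested
    for _ in range(max_hops):
        has_ext = re.fullmatch(r"(.+)\.html", path)
        # Rule 1: external redirect from x.html to x.
        if has_ext and (not guarded or client_request.endswith(".html")):
            client_request = path = has_ext.group(1)  # browser sends a new request
            continue
        # Rule 2: internal rewrite from x to x.html when that file exists.
        if path + ".html" in FILES:
            path = path + ".html"
            continue  # the rules run again on the rewritten path
        return ("served" if path in FILES else "404", path)
    return ("loop", path)


print(serve("store.html", guarded=False))  # redirect and rewrite chase each other
print(serve("store.html", guarded=True))   # one redirect, one rewrite, then served
print(serve("store", guarded=True))        # rewritten internally and served
```

In the unguarded run the redirect undoes the rewrite and the rewrite undoes the redirect until the hop limit is reached. The guard gives the rules the “exit” they need.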

Regards,

DK

Hi Dk,

These rules are written in the iirf.ini file. The website is hosted on Windows and uses IIS 6, so I added the rules to iirf.ini.

I have removed the NC flag from the rule.

b12,

IIS v6? Oh, my! I know that M$ has tried to duplicate Apache’s mod_rewrite for years (several IIS versions) but I’ve not been interested in their trials and tribulations. Nevertheless, the regular expressions should comply with the current set of regular expression engines as well as the set of allowed characters in a URI (see https://www.ietf.org/rfc/rfc2396.txt - Uniform Resource Identifiers (URI): Generic Syntax by Tim Berners-Lee).

If you insist on creating loopy code, please review my last post for the location of my original “un-loopy” code as that may work on M$ if they didn’t copy the Apache {IS_SUBREQ} variable.

If you need anything further for IIS, someone else may be able to stop in here to help OR you should go to M$.com as they should be able to help.

Regards,

DK
