Htaccess mod rewrite, match all except

hello again, after hours of searching and posting and googling for a possible answer i have still found nothing, so once again i rely on sitepoint’s help… to help me out.

here is an explanation of the problem:

user inserts in url:
http://host/user_name
he is then redirected (linked) to:
http://host/users/user.php?u=user_name


^http://host/user_name$  http://host/users/user.php?u=user_name

while these go to their original locations:


^http://host/index\.(html|php)$ http://host/index.php
^http://host/pages/page\.(html|php)$ http://host/pages/page.php

basically, using mod rewrite i need EVERYTHING directly after the host, without an extension, to be redirected to the users page
as it’s most likely a username on my website.

any hints?

Yuri,

Not much info there but, extensionless = username should be done with code like:

RewriteEngine on
RewriteRule ^/?([a-z]+)$ users/user.php?u=$1 [L]

This may cause a problem with requests like “users”, though!

Regards,

DK

many thanks for the super fast reply!!

extensionless means that the URI must not end in .html or .php or .htm

for the sake of keeping things tidy i keep all the webpages (real or not) with the .html extension.

after some more reading i have discovered that [^\./] should exclude the dot (doesn’t work yet), but i need the “.html” to be excluded and not just the “.”

the rule in plain english:
convert into “users/user.php?u=user_name [L,QSA]” if does not contain “.html”

(i will reply tomorrow, due to the fact that i live in spain)

Yuri,

“after some more reading i have discovered that [^\./] should exclude the dot” - it will but the correct way to say that is [^./] because the dot character is not a metacharacter inside a range definition.

the rule in plain english:
convert into “users/user.php?u=user_name [L,QSA]” if does not contain “.html”
- I know you don’t mean that (.jpg, .gif, .css, .js, etc) but it can be done with

# Best to include a file and directory exclusion, too
# RewriteCond %{REQUEST_URI} !-d
# RewriteCond %{REQUEST_URI} !-f
RewriteCond %{REQUEST_URI} !(\.html)$
RewriteRule ^/?(.*)$ users/user.php?u=$1 [L]

Note, please, that my use of username as the value of u in the earlier post was incorrect - it’s been changed to $1. Regards,

DK

Many thanks for the big help!

after some fiddling with the code, and fixing “error 500”, i came up with the following code:


RewriteCond %{REQUEST_URI} !-d
RewriteCond %{REQUEST_URI} !-f
RewriteCond %{REQUEST_URI} !(\/)$
RewriteCond %{REQUEST_URI} !(\.html|\.htm|\.php|\.css|\.js|\.jpg|\.jpeg|\.png|\.gif|\.flv|\.swf)$

RewriteRule ^/?(.*)$ users/user.php?u=$1 [L,QSA]

which in english would be:
if is not dir
if is not file
if does not end in /
if does not end in (list of used file extensions)
then it must be an extensionless name, hence a username; php then checks whether the user exists and, if not, throws an error 404 (or equivalent).

(i did realize that u=$1, but for the sake of explaining the problem i wrote it as user_name.
the part that i did not know was:
RewriteCond %{REQUEST_URI} !(\.html)$

does the RewriteCond only apply to the rewriterule’s below it?

“after some more reading i have discovered that [^\./] should exclude the dot” - it will but the correct way to say that is [^./] because the dot character is not a metacharacter inside a range definition.

i am 100% sure that i copied exactly [^\./] from another forum’s .htaccess guru’s post, i guess everyone makes mistakes that get carried over…

the rule in plain english:
convert into “users/user.php?u=user_name [L,QSA]” if does not contain “.html” - I know you don’t mean that (.jpg, .gif, .css, .js, etc) but it can be done

exactly, unless apache has a way of detecting all the extensions, with one symbol (or name).
)

currently this works, inform me about any potential sinkholes and correct me if i have anything wrong.

Hi Yuri,

Sorry for the delay - I left town for two days.

Yes, your code is “perfect” - okay, you do NOT need to escape the / character and the -f should cover ALL extensions (AND test that the file exists) so the list really isn’t necessary. Moreover, my use of ^/? is so that it can be used for either version of Apache whereas ^/ is Apache 1.x and ^ is Apache 2.x (in DocumentRoot).

Yes, the !\.html$ RewriteCond is part of a block mod_rewrite statement and so it will only apply to the block it’s contained within.
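To make the scope concrete, here is a minimal sketch (the second rule and the names old-page and new-page.html are hypothetical, added only for illustration). A run of RewriteCond lines guards only the single RewriteRule that immediately follows it; the next RewriteRule starts with no conditions at all:

```apache
RewriteEngine on

# These two conditions are ANDed together and guard ONLY the next RewriteRule
RewriteCond %{REQUEST_URI} !\.html$
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^/?([a-z]+)$ users/user.php?u=$1 [L]

# Hypothetical second rule: the conditions above do NOT apply here
RewriteRule ^/?old-page$ new-page.html [L]
```

If the second rule needed the same guards, the RewriteCond lines would have to be repeated in front of it.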

Yeah, different flavors of “guru” all over the place - each with idiosyncrasies (idiotsyncrasies?). As for me, I loathe the improper use of (.*) and <IfModule> and will almost always only use one .htaccess file (in the DocumentRoot).

Regards,

DK

-f should cover ALL extensions (AND test that the file exists) so the list really isn’t necessary.

when testing, the -f did not exclude the existing files and basically broke all the stylesheets and scripts that were attached to the webpage, while the list fixes this problem.

Moreover, my use of ^/? is so that it can be used for either version of Apache whereas ^/ is Apache 1.x and ^ is Apache 2.x (in DocumentRoot).

amazing, this is why i love forums.

edit:

Yes, the !\.html$ RewriteCond is part of a block mod_rewrite statement and so it will only apply to the block it’s contained within.

this sounds a lot less confusing:
RewriteCond and RewriteRules are AND’ed until terminated by the Last ([L]) flag
/edit

As for me, I loathe the improper use of (.*)

improper? how can it be improperly used?

p.s. i don’t even know what <IfModule> does… (might as well click the link to your site in your signature)
p.p.s oh my, signs of jQueritis
edit: after reading your site, i came up with the following questions.

  1. what specifically will cause a mod_rewrite loop, and how do i detect a loop?
  2. why doesn’t your website tutorial section have comments?!? :slight_smile:
    and yes, i do notice that this is leaving the point of the topic, but starting a new topic for something this small, would be pointless??

oh, i almost forgot:
thanks again for all the help! :smiley:

Yuri,

Really?!? That’s unusual (and inexplicable unless I looked at EVERYTHING - but “don’t fix it if it ain’t broke”).

(.*) is generally used by “noobies” because they don’t understand regex. They capture EVERYTHING (or nothing) then wonder why their specific case doesn’t work as expected. Using specific regex, i.e., specifying the characters the regex is supposed to match, is far superior and can act as a first line of defense against hack-attacks. How can it loop? If you use (.*) to capture an extensionless filename and redirect to index.php?filename=$1, wouldn’t (.*) also match index.php? Wouldn’t (.*) also match {nothing}? If I want an extensionless filename, I’ll specify ([a-z]+) and ONLY match ONE OR MORE lowercase letters. It’s all in the “specificity.”
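A minimal sketch of that point, assuming a hypothetical index.php front controller that reads a filename parameter. The broad version is left commented out for comparison; the specific pattern cannot match its own target, because index.php contains a dot and a dot is outside [a-z]:

```apache
RewriteEngine on

# Too broad: (.*) also matches "index.php" and the empty string, so the
# already-rewritten request matches this rule again on the next pass
# RewriteRule ^/?(.*)$ index.php?filename=$1 [L]

# Specific: one or more lowercase letters only, so "index.php" and the
# empty string are never matched
RewriteRule ^/?([a-z]+)$ index.php?filename=$1 [L]
```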

<IfModule {module_filename}> wraps a block of (mod_rewrite) code and, before a file is served, Apache must check whether the module is enabled before parsing the contained code. My view is that developers must include this to prevent causing 500 Errors but the wrapper must be removed by a webmaster to prevent wasted time (and CPU resources) to check the module before serving EVERY request. Small, but it adds up to a tremendous burden - especially on shared servers!

jQuery? Oh, yeah! That page violates one of my primary rules - it’s too freakin’ long! Rather than lead people through a half dozen pages, I thought it would be easier to put it all on the one page and facilitate getting where a visitor wants to go by presenting only the lead-in and “chapter titles.” Expandable (to allow a print all) or easily navigable was my strategy.

On Apache 1.x, the server gets “locked-up” and has to be rebooted by the host. On Apache 2.x, the desired result is either not obtained or the server takes an inordinate amount of time to serve a file. Okay, inordinate, by computer standards, is a blink of the eye so you may have to look into server logs to see a URI being redirected to itself several times (Apache 2.x’s default limit is 10).

The text is supposed to have explained (before or after) what is going on so I thought it unwise to repeat as the effect would have been to make the page much longer yet!

What new topic? NOTHING is small if you don’t understand something so ask away!

Finally, you’re very welcome!

Regards,

DK

and inexplicable unless I looked at EVERYTHING

there is not much to look at.
an updated apache server on xp.

regarding the -f
here’s the full code of the testing environment.


RewriteEngine on

RewriteCond %{REQUEST_URI} !-d
RewriteCond %{REQUEST_URI} !-f
#RewriteCond %{REQUEST_URI} !(\/|\.html|\.htm|\.php|\.css|\.js|\.jpg|\.jpeg|\.png|\.gif|\.flv|\.swf)$

RewriteRule ^/?(.*)$ web_docs/users/user.php?u=$1 [L,QSA]

i have a folder web_docs/users/
and 1 file in the folder, user.php

doesn’t work as i expected.
everything gets redirected to the user.php file, even the root index.php

for some bizarre reason it doesn’t detect the files that exist and exclude them as it should, could this have something to do with file indexing in windows xp? (i have it disabled to allow searching of .php files)
note: i’m testing on an apache server on windows xp. (i know i should get a linux, and eventually i will)

when i remove the comment (#) bit, then i specifically exclude any extensions, but as .htaccess is a very important file in websites i prefer to have it optimized, and having the -f work would be great.

the mission of the mod_rewrite is simple, redirect everything that is not a file, nor a directory on the server to the user.php file, and user.php will then decide whether the visitor gets a user page or a 404 page.

obviously i am going to stick with the working version, i’m simply confused on why it doesn’t work. (and i thought you might find this interesting)

(.*) is generally used to capture EVERYTHING (or nothing)

i should change the RewriteRule ^/?(.*)$ web_docs/users/user.php?u=$1 [L,QSA]

to RewriteRule ^/?(.+)$ web_docs/users/user.php?u=$1 [L,QSA]???

edit: hehehe

jQuery? Oh, yeah! That page violates one of my primary rules - it’s too freakin’ long! Rather than lead people through a half dozen pages

that text hide thing was fine, although it did break my ctrl+f. i meant the left side navigation, it’s unusable.

The text is supposed to have explained (before or after) what is going on so I thought it unwise to repeat as the effect would have been to make the page much longer yet!

i might be wrong, but i have always considered comments to be invaluable source of information, no one person can know everything and write it down, but comments make it a group activity, also makes the web feel more alive.
it does have its drawbacks, it needs monitoring, it needs anti-spam things, it needs filtering, closing, opening, writing etc… it’s more work for an already busy person.
i’m not saying you have to do it, i’m simply saying that I would have liked to see some.

so as a conclusion:

  1. why doesn’t my -f work??!?
  2. should i change my (.*) to (.+)?
  3. jQuery is awesome but should not be overused, only a suggestion.
  5. dklynn’s organizational style reminds me of http://meiert.com/en/ but without the comments.
  6. i have a born disability at stopping an interesting conversation.

there is no 4. because it merged with 5.

Hi Yuri!

May I assume that your displayed .htaccess is in the DocumentRoot?

Question: Do you WANT to “hijack” all URIs to the web_docs/users/user.php script (which are not directories or files)? That is exactly what your use of the dreaded :kaioken: EVERYTHING :kaioken: atom, the (.*), does!

That’s EXACTLY what you told it to do!

for some bizarre reason it doesn’t detect the files that exist and exclude them as it should, could this have something to do with file indexing in windows xp? (i have it disabled to allow searching of .php files)
note: i’m testing on an apache server on windows xp. (i know i should get a linux, and eventually i will)
There are two possibilities:

  1. Disable search? If it’s in the file system, Apache should be able to find it - and it’s NOT a matter of OS (I’m on an XP box, too).

  2. A lot of people will use {REQUEST_FILENAME} (rather than {REQUEST_URI}) which is supposed to use the physical path/to/file and may actually be required by XP (I haven’t found this to be true but …). Try the {REQUEST_FILENAME} - and please report back whether that resolves this weird problem!

Optimization is one thing - not working is quite another. In my book, “if it ain’t broke, don’t fix it!”

Back to the beginning - redirect EVERYTHING.

(.*) - To me, that’s a non-programmer’s (rubber) crutch which will fail to do as intended because it’s so all-encompassing! There are times when it IS appropriate but most people don’t know when those times are nor why it causes a loop. The -f should prevent looping in the above mod_rewrite code but most people who use (.*) don’t even bother to prevent loops with a RewriteCond (or a redirection to another domain - away from the mod_rewrite code).

so as a conclusion:

  1. why doesn’t my -f work??!?
     Should probably be {REQUEST_FILENAME} rather than {REQUEST_URI}.

  2. should i change my (.*) to (.+)?
     Not really - almost as bad, as it’s the same thing but requires at least one character.

  3. jQuery is awesome but should not be overused, only a suggestion.
     Always true - of anything.

  5. dklynn’s organizational style reminds me of http://meiert.com/en/ but without the comments.
     Is that a good thing?
     Yes, your comments about others leaving comments are good - but my tutorial is really a compilation of Q&A from my first year as a Mentor and not intended to start arguments, flame wars, answer questions (outside SitePoint), etc.

  6. i have a born disability at stopping an interesting conversation.
     I have an innate tendency to tell how the watch is built rather than what time it is! :lol:

Regards,

DK

Do you WANT to “hijack” all URIs to the web_docs/users/user.php script

hijack all URIs that do not exist as files on the server.

dreaded EVERYTHING atom, the (.*), does!

are you suggesting that i replace it with ([a-zA-Z_]+)?
but i need it to support everything, including the amazing gibberish symbols,
basically want it to include “a-zA-Z0-9_ -.” maybe even more symbols, including the ones that get url encoded…
and how would “a-zA-Z0-9_ -.” look in the mod_rewrite?
like this ([0-9a-zA-Z_\ -\.]+)?

everything gets redirected to the user.php file, even the root index.php
That’s EXACTLY what you told it to do!

but i specifically said, no existing files! (in the rewrite condition)

Disable search?

this sounds stupid of me, but i have no idea on how to disable search in apache?
or do you mean some other search?
edit: problem seems to be fixed.

There are times when it IS appropriate to use (.*)

is it remotely appropriate in my case?

but most people who use (.*) don’t even bother to prevent loops

making a website is easy people say, making a good website is hard i say.

there are thousands of small details that need to be looked at i tell my friends and family, and they reply, like what? i am usually left speechless after that, how am i going to explain that websites have htaccess, mod_rewrite, php, http requests, and so on…

“if it ain’t broke, don’t fix it!”

i’m not fixing, i’m optimizing :stuck_out_tongue:

maybe i do take this all too seriously, need to learn to relax.

Is that a good thing?

he is a highly successful rich man, what do you think?

Yes, your comments about others leaving comments are good - but my tutorial is really a compilation of Q&A from my first year as a Mentor and not intended to start arguments, flame wars, answer questions (outside SitePoint), etc.

hahah, i was expecting that.

I have an innate tendency to tell how the watch is built rather than what time it is!

simply lol

Try the {REQUEST_FILENAME} - and please report back whether that resolves this weird problem!

hahah, that did fix the problem!!! :smiley:

it seems that when you use REQUEST_URI it doesn’t index the files in the directories, and when you use REQUEST_FILENAME, the server does.
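A likely explanation, offered as a sketch rather than a diagnosis of this exact server: -f and -d test the string on their left as a path on disk. %{REQUEST_URI} holds the URL path (for example /index.php), which is normally not a valid filesystem path, so the test never finds the file. In .htaccess, %{REQUEST_FILENAME} holds the full filesystem path the request maps to. With the paths from the test setup above:

```apache
RewriteEngine on

# Skip anything that exists on disk as a directory or a regular file
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f

# Everything else is treated as a username and handed to user.php
RewriteRule ^/?(.*)$ web_docs/users/user.php?u=$1 [L,QSA]
```

Because user.php itself exists as a file, the rewritten request fails the !-f test on the next pass and is served as-is.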

i knew that .htaccess was powerful, but it just keeps amazing me.

now i’m in a weird position,
should i keep the

RewriteCond %{REQUEST_FILENAME} !(\/|\.html|\.htm|\.php|\.css|\.js|\.jpg|\.jpeg|\.png|\.gif|\.flv|\.swf)$

because everything SEEMS to work fine without it.

i hate listing things, as when i suddenly decide to use a new format i don’t want to have to remember that i have to edit .htaccess to allow that.
hahahah, i’m still so happy that it works!!

Hi Yuri!

Okay, it seems as if the problem is resolved so I’ll go through your questions quickly:

  1. If you want to redirect EVERYTHING (or NOTHING) with the caveat that it’s not an existing directory or file, (.*) will help you do just that. DON’T use it without the escape hatch provided by the RewriteCond’s, though!

([0-9a-zA-Z_\ -\.]+) - the escaped space (the “\ ” in your atom’s range definition) is quite correct - assuming you want to see %20’s in your URL. I don’t know of ANYONE who wants that! The dot character is NOT a metacharacter within a range definition, so the backslash in front of it is not needed.
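One more wrinkle in that atom, shown as a sketch: inside a range definition a hyphen between two characters makes a range, so “\ -\.” reads as every character from space to dot, which also lets in ! " # $ % & ' ( ) * + and the comma. Putting the hyphen last keeps it literal. This version assumes spaces are not wanted and reuses the paths from the test setup:

```apache
RewriteEngine on

# Existing directories and files are left alone
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f

# Letters, digits, underscore, dot and hyphen only; the hyphen comes last
# so it is a literal hyphen, and the dot needs no backslash
RewriteRule ^/?([0-9a-zA-Z_.-]+)$ web_docs/users/user.php?u=$1 [L,QSA]
```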

I have no idea how to disable a “search” in Apache - I was asking what the heck you were talking about!

Thanks for the feedback on the {REQUEST_FILENAME}.

RewriteCond %{REQUEST_FILENAME} !(\/|\.html|\.htm|\.php|\.css|\.js|\.jpg|\.jpeg|\.png|\.gif|\.flv|\.swf)$

Again, the trailing / character does not (ever) need to be escaped.

“If it ain’t broke, don’t fix it.” Translated: If you don’t need it, don’t add it.

Okay, chalk-up another Happy Camper! :Partier:

Regards,

DK

assuming you want to see %20’s in your URL. I don’t know of ANYONE who wants that!

it’s merely for the point of technical error avoidance, aka the website’s ability to respond to errors that users will make.

none of my links contain %20
note: but it seems that pretty soon it will be considered normal.

I have no idea how to disable a “search” in Apache - I was asking what the heck you were talking about!

oh sorry, i’m an xp tech guy, i fix pc’s for a bunch of people, i fix microsoft windows installs, and i sometimes expect people to know too much. basically, so you know, to enable searching inside files that are not known by windows (eg. .php) you need to remove the indexing for that hard drive.

#1 - Go to your start menu and click Search.
#2 - Click on “Change preferences” and then click “With Indexing Service (for faster local searches)”.
#3 - Click “Change Indexing Service Settings (Advanced)”. You do not have to turn on Index service.
#4 - On your toolbar click the “Show/Hide Console Tree” button.
#5 - Next, in the left window, right click over “Indexing Service on Local Machine” and select properties.
#6 - From the Generation tab select the “Index files with unknown extensions” check box and press OK.
#7 - Close the Indexing Service controls and try searching for a line of code you know is inside a .php file.

THAT IS what i was talking about

Thanks for the feedback on the {REQUEST_FILENAME}.

you’re welcome, always learning i see.

Again, the trailing / character does not (ever) need to be escaped.

it seems it will take me time to remember that

“If it ain’t broke, don’t fix it.” Translated: If you don’t need it, don’t add it.

so it shall be

meet you again on sitepoint or somewhere else :stuck_out_tongue:
and a huge thanks again! now go, save someone else. :smiley:

:tup:

Regards,

DK