This article will lead you through rewrite rules, regular expressions, and rewrite conditions, and provide a great list of examples.
First off, I’m going to assume that you understand the common reasons for wanting a URI rewriting feature for your web site. If you’d like information about this field, there’s a good primer in the SitePoint article, mod_rewrite: A Beginner’s Guide to URL Rewriting. There, you’ll also find instructions on how to enable it on your own server.
Testing Your Server Setup
Some hosts do not have mod_rewrite enabled (it is not enabled by default). You can find out whether your server has mod_rewrite enabled by creating a PHP script containing a single statement:
<?php phpinfo();
If you load the script with a browser, look in the Apache Modules section (it is shown only when PHP runs as an Apache module). If mod_rewrite isn’t listed there, you’ll have to ask your host to enable it, or find a “good host”. Most hosts will have it enabled, so you’ll be good to go.
The Magic of mod_rewrite
Here’s a simple example for you: create three text files named test.html, test.php, and .htaccess.
In the test.html file, enter the following:
<h1>This is the HTML file.</h1>
In the test.php file, add this:
<h1>This is the PHP file.</h1>
Create the third file, .htaccess, with the following:
RewriteEngine on
RewriteRule ^/?test\.html$ test.php [L]
Upload all three files (in ASCII mode) to a directory on your server, and type:
http://www.example.com/path/to/test.html
into the location box — using your own domain and directory path of course! If the page shows “This is the PHP file”, it’s working properly! If it shows “This is the HTML file,” something’s gone wrong.
If your test worked, you’ll notice that the test.html URI has remained in the browser’s location box, yet we’ve seen the contents of the test.php file. You’ve just witnessed the magic of mod_rewrite!
mod_rewrite Regular Expressions
Now we can begin rewriting your URIs! Let’s imagine we have a web site that displays city information. The city is selected via the URI like this:
http://www.example.com/display.php?country=USA&state=California&city=San_Diego
Our problem is that this is way too long and unfriendly to users. We’d much prefer it if visitors could use:
http://www.example.com/USA/California/San_Diego
We need to be able to tell Apache to rewrite the latter URI into the former. In order for the display.php script to read and parse the query string, we’ll need to use regular expressions to tell mod_rewrite how to match the two URIs. If you’re not familiar with regular expressions (regex), many sites provide excellent tutorials. At the end of this article, I’ve listed the best pages I’ve found on the topic. If you’re not able to follow my explanations, I recommend reviewing the first two of those links.
A very common approach is to use the expression (.*). This expression combines two metacharacters: the dot character, which means ANY character, and the asterisk character, which specifies zero or more of the preceding character. Thus, (.*) matches everything in the {REQUEST_URI} string. {REQUEST_URI} is that part of the URI which follows the domain up to but not including the ? character of a query string, and is the only Apache variable that a rewrite rule attempts to match.
Wrapping the expression in parentheses stores it in an “atom,” which is a variable that allows the matched characters to be reused within the rule (regex references usually call this a capture group). Thus, the expression above would store USA/California/San_Diego in the atom. To solve our problem, we’d need three of these atoms, separated by the subdirectory slashes (/), so the regex would become:
(.*)/(.*)/(.*)
Given the above expression, the regex engine will match (and save) three values separated by two slashes anywhere in the {REQUEST_URI} string. To solve our specific problem, though, we’ll need to restrict this somewhat. After all, the first and last atoms above could match anything!
To begin with, we can add the start and end anchor characters. The ^ character anchors the match to the start of the string, and the $ character anchors it to the end of the string.
^(.*)/(.*)/(.*)$
This expression specifies that the whole string must be matched by our regex; there cannot be anything else before or after it.
However, this approach still allows too many matches. We’re storing our matches as atoms, and will be passing them to a query string, so we have to be able to trust what we match. Matching anything with (.*) is too much of a potential security hazard, and, when used inappropriately, could even cause mod_rewrite to get stuck in a loop!
To avoid unnecessary problems, let’s change the atoms to specify precisely the characters that we will allow. Because the atoms represent location names, we should limit the matched characters to uppercase and lowercase letters from A to Z, and because we use it to represent spaces in the name, the underscore character (_) should also be allowed. We specify a set using square brackets, and a range using the - character. So the set of allowed characters is written as [a-zA-Z_]. And because we want to avoid matching blank names, we add the + metacharacter, which specifies a match only on one or more of the preceding character. Thus, our regex is now:
^([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$
The {REQUEST_URI} string starts with a / character. Whether a RewriteRule pattern sees that leading slash depends on where the rule lives: in the main server or virtual host configuration the slash is present, while in a .htaccess file Apache strips the directory prefix, leading slash included, before matching. We can satisfy both cases by making the leading slash optional with the expression ^/? (? is the metacharacter for zero or one of the preceding character). So now we have:
^/?([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$
With regex in hand, we can now map the atoms to the query string:
display.php?country=$1&state=$2&city=$3
$1 is the first (country) atom, $2 is the second (state) atom and $3 is the third (city) atom. Note that only nine atoms, $1 ... $9, can be referenced, numbered in the order in which their opening parentheses appear in the regular expression.
We’re almost there! Create a new .htaccess file with the text:
RewriteEngine on
RewriteRule ^/?([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$ display.php?country=$1&state=$2&city=$3 [L]
Save this to the directory in which display.php resides. The rewrite rule must go on one line with one space between the RewriteRule statement, the regex, and the redirection (and before any optional flags). We’ve used the [L], or ‘last’ flag, which is the terminating flag (more on flags later).
Our rewrite rule is now complete! The atom values are being extracted from the request string and added to the query string of our rewritten URI. The display.php script will likely extract these values from the query string and use them in a database query or something similar.
If, however, you have only a short list of allowable countries, it might be best to avoid potential database problems by specifying the acceptable values within the regex. Here’s an example:
^/?(USA|Canada|Mexico)/([a-zA-Z_]+)/([a-zA-Z_]+)$
If you’re concerned about capitalization because the values in your database are strictly lowercase, you can make the regex engine ignore the case by adding the No Case flag, [NC], after the rewritten URI. Just don’t forget to convert the values to lowercase in your script after you obtain the $_GET array.
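As a rough sketch, here is how the country list and the No Case flag might fit together in one rule. It reuses the display.php script and query string names from above; the three country names are only the sample values from the regex, so treat the list as an assumption to adapt.

```apache
RewriteEngine on
# Only three countries are accepted; [NC] lets usa, Usa and USA all match.
# [L] stops processing further rules in this pass once this rule has matched.
RewriteRule ^/?(USA|Canada|Mexico)/([a-zA-Z_]+)/([a-zA-Z_]+)$ display.php?country=$1&state=$2&city=$3 [NC,L]
```

With this rule, a request for /canada/Ontario/Toronto would reach display.php with country=canada exactly as typed, which is why the script still needs to normalize the case itself.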
If you want to use numbers (0, 1, … 9) for, say, Congressional Districts, then you’ll need to change an atom’s specification from ([a-zA-Z_]+) to ([0-9]) to match a single digit, ([0-9]{1,2}) to match one or two digits (0 through 99), or ([0-9]+) for one or more digits, which is useful for database IDs.
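For instance, a rule that accepts a state name followed by a one- or two-digit district number could look like the sketch below. The district.php script and its state and district parameters are hypothetical names invented for illustration.

```apache
RewriteEngine on
# $1 = state name (letters and underscores), $2 = district number (one or two digits)
# district.php and its parameter names are hypothetical
RewriteRule ^/?([a-zA-Z_]+)/([0-9]{1,2})$ district.php?state=$1&district=$2 [L]
```

A request for /California/12 would match, while /California/123 would not, because {1,2} allows at most two digits before the end anchor.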
The RewriteCond Statement
Now that you’ve learned how to use mod_rewrite’s basic RewriteRule statement with the {REQUEST_URI} string, it’s time to see how we can use conditionals to access other variables with the RewriteCond statement. The RewriteCond statement is used to specify the conditions under which a RewriteRule statement should be applied.
RewriteCond is similar in format to RewriteRule in that you have the command name, RewriteCond, a variable to be matched, the regex, and flags. The logical OR flag, [OR], is a useful flag to remember because consecutive RewriteCond statements are combined with a logical AND by default: every condition must be true before the RewriteRule that follows them is applied.
You can test many server variables with a RewriteCond statement. You can find a list in the SitePoint article I mentioned previously, but this is the best list of server variables I’ve found.
As an example, let’s assume that we want to force the www in your domain name. To do this, you’ll need to test the Apache {HTTP_HOST} variable to see if the www. is already there and, if it’s not, redirect to the desired host name:
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]
Here, to denote that {HTTP_HOST} is an Apache variable, we must prepend a % character to it. The regex begins with the ! character, which will cause the condition to be true if it doesn’t match the pattern. We also have to escape the dot character so that it matches a literal dot and not any character, as is the case with the dot metacharacter. We’ve also added the No Case flag to make this operation case-insensitive.
The RewriteRule will match zero or one of any character, which means it matches every request, and will redirect to http://www.example.com plus the original {REQUEST_URI} value. The R=301, or redirect, flag will cause Apache to issue an HTTP 301 response, which indicates that this is a permanent redirection; the Last flag tells mod_rewrite that you’ve completed this block statement.
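To illustrate the [OR] flag mentioned earlier, here is a sketch that redirects two bare host names to the www host. The second domain, example.net, is a hypothetical alias added purely for the example.

```apache
RewriteEngine on
# Either condition may match. Without [OR] both would have to be true,
# which is impossible for a single Host header.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC,OR]
# example.net is a hypothetical second domain
RewriteCond %{HTTP_HOST} ^example\.net$ [NC]
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]
```

The [OR] flag goes on every condition except the last one in the group.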
RewriteCond statements can also create atoms, but these are denoted with %1 ... %9 in the same way that RewriteRule atoms are denoted with $1 ... $9. You’ll see these atom variables in operation in the examples later on.
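As a small preview, the sketch below uses a RewriteCond atom to do the reverse of the earlier example and strip a leading www. from whatever host was requested. The http scheme is an assumption carried over from the example above.

```apache
RewriteEngine on
# The parentheses capture everything after "www." into %1
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
# %1 comes from the RewriteCond above; $1 ... $9 would come from this rule's own regex
RewriteRule .? http://%1%{REQUEST_URI} [R=301,L]
```

Because the host name is captured rather than hard-coded, the same two lines should work unchanged on any domain.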
Frequently Asked Questions (FAQs) about Apache Mod_Rewrite
What is Apache Mod_Rewrite and why is it important?
Apache Mod_Rewrite is a module used in the Apache web server software for rewriting URL requests. It is a powerful tool that allows you to manipulate URLs on the fly, making your site more user and search engine friendly. It can also help improve site security by blocking specific types of requests.
How do I enable Mod_Rewrite in Apache?
To enable Mod_Rewrite in Apache, you need to first ensure that the module is installed. This can be done by checking the httpd.conf file for the line ‘LoadModule rewrite_module modules/mod_rewrite.so’. If it’s there, the module is installed. To enable it, you need to find the line ‘#LoadModule rewrite_module modules/mod_rewrite.so’ and remove the ‘#’ at the beginning. Then, restart your Apache server for the changes to take effect.
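As a sketch, the relevant httpd.conf lines might look like this. The modules/ path and the /var/www/html directory are assumptions that vary between installations; the Directory block is included because rewrite rules in a .htaccess file only work when overrides are permitted for that directory.

```apache
# Load the module (remove any leading "#" from this line)
LoadModule rewrite_module modules/mod_rewrite.so

# Hypothetical document root; adjust to your own layout
<Directory "/var/www/html">
    # Per-directory rewriting requires symbolic links to be followed
    Options +FollowSymLinks
    # Allow .htaccess files to use mod_rewrite directives
    AllowOverride FileInfo
</Directory>
```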
What is the syntax for creating rewrite rules?
The syntax for creating rewrite rules in Apache Mod_Rewrite consists of two main parts: the RewriteRule directive and the pattern and substitution. The RewriteRule directive tells Apache that this line is a rewrite rule. The pattern is what you want to match in the incoming URL, and the substitution is what you want to replace it with.
How can I debug my rewrite rules?
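Debugging rewrite rules can be a bit tricky, but Apache provides logging that can help. In Apache 2.2 and earlier, the ‘RewriteLog’ directive names a log file and ‘RewriteLogLevel’ sets how much detail is written to it; a higher value gives more detailed information about how your rules are being processed. Apache 2.4 removed both directives in favor of the ‘LogLevel’ directive with a per-module setting such as ‘rewrite:trace3’. Either way, the log can help you identify any issues or mistakes in your rules.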
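A minimal sketch of both logging setups follows. The log file path is a hypothetical location, and these directives belong in the main server or virtual host configuration, not in a .htaccess file.

```apache
# Apache 2.4 and later: per-module log level, written to the error log
LogLevel alert rewrite:trace3

# Apache 2.2 and earlier (these directives were removed in 2.4):
# RewriteLog "/var/log/apache2/rewrite.log"
# RewriteLogLevel 3
```

Trace logging is verbose and slows the server down, so it is best switched off again once the rules behave as expected.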
Can I use regular expressions in rewrite rules?
Yes, you can use regular expressions in your rewrite rules. This allows you to create more complex and flexible rules. For example, you could create a rule that matches any URL that contains a certain word or phrase, regardless of where it appears in the URL.
What is the difference between a rewrite rule and a redirect?
A rewrite rule changes the URL that the server processes without the client’s knowledge, while a redirect sends a response back to the client telling it to make a new request at a different URL. Both can be useful, but they serve different purposes and should be used in different situations.
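The two behaviors can be sketched side by side. All four file names below are hypothetical.

```apache
RewriteEngine on
# Internal rewrite: the browser keeps showing /catalog.html while catalog.php runs
RewriteRule ^/?catalog\.html$ catalog.php [L]
# External redirect: the browser is told to request /new-page.html instead
RewriteRule ^/?old-page\.html$ /new-page.html [R=301,L]
```

The only difference in syntax is the R flag; without it, the substitution happens entirely inside the server.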
How can I prevent a rewrite rule from being applied multiple times?
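You can stop rule processing for the current pass by using the ‘L’ flag, which stands for ‘last’. When Apache sees this flag, it will stop processing any further rules for the current URL. Be aware that in a .htaccess file the rewritten URL is passed back through the rule set, so a rule can still match again on the next pass; in Apache 2.4 and later, the ‘END’ flag stops rewrite processing for the request entirely.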
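Here is a minimal sketch of the END flag, reusing the test.html and test.php files from the earlier test and assuming Apache 2.4 or later.

```apache
RewriteEngine on
# [END] (Apache 2.4+) stops all further rewrite processing for this request,
# including the extra pass that [L] allows in per-directory context
RewriteRule ^/?test\.html$ test.php [END]
```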
Can I use Mod_Rewrite to improve my site’s SEO?
Yes, you can use Mod_Rewrite to improve your site’s SEO. By creating more descriptive and user-friendly URLs, you can make your site more appealing to search engines. This can help improve your site’s ranking in search engine results.
What is the difference between ‘REQUEST_FILENAME’ and ‘REQUEST_URI’ in Apache configuration?
‘REQUEST_FILENAME’ and ‘REQUEST_URI’ are both variables that can be used in rewrite rules. ‘REQUEST_FILENAME’ contains the full filesystem path to the file or script that is being accessed, while ‘REQUEST_URI’ contains the part of the URL after the domain name.
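A common use of ‘REQUEST_FILENAME’ is to check whether a request maps to something that really exists on disk. The sketch below assumes a hypothetical index.php script that handles every request that does not.

```apache
RewriteEngine on
# REQUEST_FILENAME is a filesystem path, so it can be tested with -f (file) and -d (directory)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Anything that is not an existing file or directory goes to the hypothetical index.php
RewriteRule .? index.php [L]
```

Real files such as images and stylesheets are still served directly, because the first condition fails for them.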
How can I block specific types of requests with Mod_Rewrite?
You can block specific types of requests by creating a rewrite rule that matches the type of request you want to block and then using the ‘F’ flag, which stands for ‘forbidden’. This will cause Apache to return a 403 Forbidden response for any matching requests.
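As one possible sketch, the rule below refuses requests that use the TRACE or TRACK methods. The choice of methods is an assumption made for the example; any RewriteCond test could take its place.

```apache
RewriteEngine on
# Match TRACE or TRACK requests regardless of case
RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)$ [NC]
# "-" means no substitution; [F] returns 403 Forbidden
RewriteRule .? - [F]
```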
DK Lynn is a former instructor pilot and "rocket scientist" now living in New Zealand where he operates a small business developing and hosting web sites.