Learn Apache mod_rewrite: 13 Real-world Examples
Key Takeaways
- Apache mod_rewrite is a powerful module used to rewrite a visitor's request URI as specified by a set of rules, enhancing site usability, search engine optimization, and security.
- The use of regular expressions in mod_rewrite allows for complex and flexible URL manipulation. For example, the expression (.*) combines the dot, which matches any character, with the asterisk, which specifies zero or more of the preceding character, so it matches everything in the {REQUEST_URI} string.
- RewriteCond statements can be used in conjunction with RewriteRule statements to specify the conditions under which a RewriteRule statement should be applied, allowing further customization of URL rewrites.
This article will lead you through rewrite rules, regular expressions, and rewrite conditions, and provide a great list of examples.
First off, I'm going to assume that you understand the common reasons for wanting a URI rewriting feature for your web site. If you'd like information about this field, there's a good primer in the SitePoint article, mod_rewrite: A Beginner's Guide to URL Rewriting. There, you'll also find instructions on how to enable it on your own server.
Testing Your Server Setup
Some hosts do not have mod_rewrite enabled (by default it is not enabled). You can find out if your server has mod_rewrite enabled by creating a PHP script with one simple line of PHP code:
<?php phpinfo(); ?>
If you load the script with a browser, look in the Apache Modules section. If mod_rewrite isn't listed there, you'll have to ask your host to enable it, or find a "good host". Most hosts will have it enabled, so you'll be good to go.
The Magic of mod_rewrite
Here's a simple example for you: create three text files named test.html, test.php, and .htaccess.
In the test.html file, enter the following:
<h1>This is the HTML file.</h1>
In the test.php file, add this:
<h1>This is the PHP file.</h1>
Create the third file, .htaccess, with the following:
RewriteEngine on
RewriteRule ^/?test\.html$ test.php [L]
Upload all three files (in ASCII mode) to a directory on your server, and type:
http://www.example.com/path/to/test.html
into the location box, using your own domain and directory path of course! If the page shows "This is the PHP file", it's working properly! If it shows "This is the HTML file," something's gone wrong.
If your test worked, you'll notice that the test.html URI has remained in the browser's location box, yet we've seen the contents of the test.php file. You've just witnessed the magic of mod_rewrite!
mod_rewrite Regular Expressions
Now we can begin rewriting your URIs! Let's imagine we have a web site that displays city information. The city is selected via the URI like this:
http://www.example.com/display.php?country=USA&state=California&city=San_Diego
Our problem is that this is way too long and unfriendly to users. We'd much prefer it if visitors could use:
http://www.example.com/USA/California/San_Diego
We need to be able to tell Apache to rewrite the latter URI into the former. In order for the display.php script to read and parse the query string, we'll need to use regular expressions to tell mod_rewrite how to match the two URIs. If you're not familiar with regular expressions (regex), many sites provide excellent tutorials. At the end of this article, I've listed the best pages I've found on the topic. If you're not able to follow my explanations, I recommend reviewing the first two of those links.
A very common approach is to use the expression (.*). This expression combines two metacharacters: the dot character, which means ANY character, and the asterisk character, which specifies zero or more of the preceding character. Thus, (.*) matches everything in the {REQUEST_URI} string. {REQUEST_URI} is that part of the URI which follows the domain up to but not including the ? character of a query string, and is the only Apache variable that a rewrite rule attempts to match.
Wrapping the expression in parentheses stores it in an "atom," which is a variable that allows the matched characters to be reused within the rule. Thus, the expression above would store USA/California/San_Diego in the atom. To solve our problem, we'd need three of these atoms, separated by the subdirectory slashes (/), so the regex would become:
(.*)/(.*)/(.*)
Given the above expression, the regex engine will match (and save) three values separated by two slashes anywhere in the {REQUEST_URI} string. To solve our specific problem, though, we'll need to restrict this somewhat; after all, the first and last atoms above could match anything!
To begin with, we can add the start and end anchor characters. The ^ character anchors the match to the start of a string, and the $ character anchors it to the end of a string.
^(.*)/(.*)/(.*)$
This expression specifies that the whole string must be matched by our regex; there cannot be anything else before or after it.
However, this approach still allows too many matches. We're storing our matches as atoms, and will be passing them to a query string, so we have to be able to trust what we match. Matching anything with (.*) is too much of a potential security hazard, and, when used inappropriately, could even cause mod_rewrite to get stuck in a loop!
To avoid unnecessary problems, let's change the atoms to specify precisely the characters that we will allow. Because the atoms represent location names, we should limit the matched characters to upper and lowercase letters from A to Z, and because we use it to represent spaces in the name, the underscore character (_) should also be allowed. We specify a set using square brackets, and a range using the - character. So the set of allowed characters is written as [a-zA-Z_]. And because we want to avoid matching blank names, we add the + metacharacter, which specifies a match only on one or more of the preceding character. Thus, our regex is now:
^([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$
The {REQUEST_URI} string starts with a / character. Whether that leading slash is part of the string a RewriteRule matches depends on where the rule lives: in the main server or virtual host configuration the slash is present, while in an .htaccess file (per-directory context) the directory prefix, including the slash, is stripped before matching. We can satisfy both cases by making the leading slash optional with the expression ^/? (? is the metacharacter for zero or one of the preceding character). So now we have:
^/?([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$
With regex in hand, we can now map the atoms to the query string:
display.php?country=$1&state=$2&city=$3
$1 is the first (country) atom, $2 is the second (state) atom and $3 is the third (city) atom. Note that there can only be nine atoms created in a regular expression, numbered $1 ... $9 in the order in which the opening parentheses appear.
We're almost there! Create a new .htaccess file with the text:
RewriteEngine on
RewriteRule ^/?([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$ display.php?country=$1&state=$2&city=$3 [L]
Save this to the directory in which display.php resides. The rewrite rule must go on one line with one space between the RewriteRule statement, the regex, and the redirection (and before any optional flags). We've used the [L], or "last" flag, which is the terminating flag (more on flags later).
Our rewrite rule is now complete! The atom values are being extracted from the request string and added to the query string of our rewritten URI. The display.php script will likely extract these values from the query string and use them in a database query or something similar.
If, however, you have only a short list of allowable countries, it might be best to avoid potential database problems by specifying the acceptable values within the regex. Here's an example:
^/?(USA|Canada|Mexico)/([a-zA-Z_]+)/([a-zA-Z_]+)$
If you're concerned about capitalization because the values in your database are strictly lowercase, you can make the regex engine ignore the case by adding the No Case flag, [NC], after the rewritten URI. Just don't forget to convert the values to lowercase in your script after you obtain the $_GET array.
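As a rough sketch of how the country list and the No Case flag might be combined in one rule, assuming the same display.php script and query-string parameters used earlier in this article:

```apache
RewriteEngine on

# Only USA, Canada or Mexico are accepted as the first path segment.
# [NC] makes the whole pattern case-insensitive, so /usa/california/san_diego
# also matches; $1 to $3 keep whatever case the visitor typed.
RewriteRule ^/?(USA|Canada|Mexico)/([a-zA-Z_]+)/([a-zA-Z_]+)$ display.php?country=$1&state=$2&city=$3 [NC,L]
```

With a rule like this, a request for /canada/Ontario/Toronto would be rewritten to display.php?country=canada&state=Ontario&city=Toronto, which is why the script still has to normalize the case itself.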
If you want to use numbers (0, 1, … 9) for, say, Congressional Districts, then you'll need to change an atom's specification from ([a-zA-Z_]+) to ([0-9]) to match a single digit, ([0-9]{1,2}) to match one or two digits (0 through 99), or ([0-9]+) for one or more digits, which is useful for database IDs.
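To illustrate the numeric atoms, here is a minimal sketch. The /district/ and /item/ path prefixes and the district and id query parameters are hypothetical names chosen for this example; only display.php comes from the article.

```apache
RewriteEngine on

# One or two digits only: /district/7 and /district/52 match, /district/123 does not.
RewriteRule ^/?district/([0-9]{1,2})$ display.php?district=$1 [L]

# One or more digits, as you might use for a database ID: /item/4711
RewriteRule ^/?item/([0-9]+)$ display.php?id=$1 [L]
```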
The RewriteCond Statement
Now that you've learned how to use mod_rewrite's basic RewriteRule statement with the {REQUEST_URI} string, it's time to see how we can use conditionals to access other variables with the RewriteCond statement. The RewriteCond statement is used to specify the conditions under which a RewriteRule statement should be applied.
RewriteCond is similar in format to RewriteRule in that you have the command name, RewriteCond, a variable to be matched, the regex, and flags. The logical OR flag, [OR], is a useful flag to remember because consecutive RewriteCond statements are combined with a logical AND by default: every condition must be true before the RewriteRule that follows them is applied.
You can test many server variables with a RewriteCond statement. You can find a list in the SitePoint article I mentioned previously, but this is the best list of server variables I've found.
As an example, let's assume that we want to force the www in your domain name. To do this, you'll need to test the Apache {HTTP_HOST} variable to see if the www. is already there and, if it's not, redirect to the desired host name:
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]
Here, to denote that {HTTP_HOST} is an Apache variable, we must prepend a % character to it. The regex begins with the ! character, which will cause the condition to be true if it doesn't match the pattern. We also have to escape the dot character so that it matches a literal dot and not any character, as is the case with the dot metacharacter. We've also added the No Case flag to make this operation case-insensitive.
The RewriteRule will match zero or one of any character, and will redirect to http://www.example.com plus the original {REQUEST_URI} value. The R=301, or redirect, flag will cause Apache to issue an HTTP 301 response, which indicates that this is a permanent redirection; the Last flag tells mod_rewrite that you've completed this block statement.
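The [OR] flag mentioned earlier can be sketched with a variation on this redirect. The second host name, example.net, is a hypothetical alias invented for illustration.

```apache
RewriteEngine on

# Redirect if the host is example.com OR example.net (both without www).
# Without [OR] on the first line, both conditions would have to be true at once,
# which is impossible for a single host name.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^example\.net$ [NC]
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]
```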
RewriteCond statements can also create atoms, but these are denoted with %1 ... %9 in the same way that RewriteRule atoms are denoted with $1 ... $9. You'll see these atom variables in operation in the examples later on.
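As a small sketch of a RewriteCond atom, the block below does the opposite of the earlier redirect and removes a leading www. from whatever host name was requested. It is an illustration written for this section, not one of the article's own examples.

```apache
RewriteEngine on

# The parentheses in the condition capture everything after "www." as %1.
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
# %1 is the atom from the RewriteCond; %{REQUEST_URI} keeps the original path.
RewriteRule .? http://%1%{REQUEST_URI} [R=301,L]
```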
Frequently Asked Questions (FAQs) about Apache Mod_Rewrite
What is Apache Mod_Rewrite and why is it important?
Apache Mod_Rewrite is a module used in the Apache web server software for rewriting URL requests. It is a powerful tool that allows you to manipulate URLs on the fly, making your site more user and search engine friendly. It can also help improve site security by blocking specific types of requests.
How do I enable Mod_Rewrite in Apache?
To enable Mod_Rewrite in Apache, you need to first ensure that the module is installed. This can be done by checking the httpd.conf file for the line "LoadModule rewrite_module modules/mod_rewrite.so". If the line is present without a leading "#", the module is already loaded. If it appears as "#LoadModule rewrite_module modules/mod_rewrite.so", remove the "#" at the beginning to enable it. Then, restart your Apache server for the changes to take effect.
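For reference, a sketch of what the relevant httpd.conf lines might look like once the module is enabled. The directory path is an assumption and will differ between installations, and the Directory block is only needed if you plan to put rules in .htaccess files.

```apache
# Uncommented, so the module is loaded when Apache starts.
LoadModule rewrite_module modules/mod_rewrite.so

# Hypothetical document root; adjust to your own server layout.
<Directory "/var/www/html">
    # Let .htaccess files in this tree use mod_rewrite directives.
    AllowOverride FileInfo
    # Per-directory rewriting requires symbolic link following to be enabled.
    Options FollowSymLinks
</Directory>
```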
What is the syntax for creating rewrite rules?
The syntax for creating rewrite rules in Apache Mod_Rewrite consists of two main parts: the RewriteRule directive and the pattern and substitution. The RewriteRule directive tells Apache that this line is a rewrite rule. The pattern is what you want to match in the incoming URL, and the substitution is what you want to replace it with.
How can I debug my rewrite rules?
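Debugging rewrite rules can be a bit tricky, but Apache provides logging that can help. In Apache 2.2 and earlier, the "RewriteLog" directive names a log file, and setting "RewriteLogLevel" to a higher value gives you more detailed information about how your rules are being processed. In Apache 2.4, these two directives were replaced by the "LogLevel" directive with a per-module trace level for mod_rewrite. Either way, the log output can help you identify any issues or mistakes in your rules.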
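A sketch of how rewrite logging might be switched on. RewriteLog and RewriteLogLevel exist in Apache 2.2 and earlier, while Apache 2.4 uses the LogLevel directive instead. These lines belong in the main server or virtual host configuration rather than in .htaccess; the log file path is a placeholder and the level of 3 is an arbitrary choice.

```apache
# Use only the lines that match your Apache version.

# Apache 2.2 and earlier (hypothetical log path):
# RewriteLog "/var/log/apache2/rewrite.log"
# RewriteLogLevel 3

# Apache 2.4 and later: rewrite tracing is written to the regular error log.
LogLevel alert rewrite:trace3
```

Higher levels produce a lot of output, so it is usually worth turning logging back down once the rules behave.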
Can I use regular expressions in rewrite rules?
Yes, you can use regular expressions in your rewrite rules. This allows you to create more complex and flexible rules. For example, you could create a rule that matches any URL that contains a certain word or phrase, regardless of where it appears in the URL.
What is the difference between a rewrite rule and a redirect?
A rewrite rule changes the URL that the server processes without the client's knowledge, while a redirect sends a response back to the client telling it to make a new request at a different URL. Both can be useful, but they serve different purposes and should be used in different situations.
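The difference can be sketched with two versions of the same rule. The file names old.html and new.html are hypothetical.

```apache
RewriteEngine on

# Internal rewrite: the browser still shows /old.html but receives new.html.
RewriteRule ^/?old\.html$ new.html [L]

# External redirect: the browser is sent a 301 and requests /new.html itself.
# (Use one rule or the other; with both active, the first one would win.)
# RewriteRule ^/?old\.html$ /new.html [R=301,L]
```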
How can I prevent a rewrite rule from being applied multiple times?
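You can prevent a rewrite rule from being applied multiple times by using the "L" flag, which stands for "last". When Apache sees this flag, it will stop processing any further rules for the current URL. Note that in an .htaccess file the rewritten URL is then passed through the rule set again, so make sure the rewritten URL does not match the same pattern.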
Can I use Mod_Rewrite to improve my site's SEO?
Yes, you can use Mod_Rewrite to improve your site's SEO. By creating more descriptive and user-friendly URLs, you can make your site more appealing to search engines. This can help improve your site's ranking in search engine results.
What is the difference between "REQUEST_FILENAME" and "REQUEST_URI" in Apache configuration?
"REQUEST_FILENAME" and "REQUEST_URI" are both variables that can be used in rewrite rules. "REQUEST_FILENAME" contains the full filesystem path to the file or script that is being accessed, while "REQUEST_URI" contains the part of the URL after the domain name.
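One common use of REQUEST_FILENAME, sketched here with the city rule from earlier in the article and assuming the rules live in an .htaccess file, is to skip rewriting when the request already points at a real file or directory.

```apache
RewriteEngine on

# -f tests for an existing regular file, -d for an existing directory;
# the leading ! negates each test.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^/?([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$ display.php?country=$1&state=$2&city=$3 [L]
```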
How can I block specific types of requests with Mod_Rewrite?
You can block specific types of requests by creating a rewrite rule that matches the type of request you want to block and then using the "F" flag, which stands for "forbidden". This will cause Apache to return a 403 Forbidden response for any matching requests.
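A minimal sketch of such a rule, assuming you want to turn away a crawler whose user agent contains the made-up string badbot:

```apache
RewriteEngine on

# Match the (hypothetical) user agent string, ignoring case.
RewriteCond %{HTTP_USER_AGENT} badbot [NC]
# "-" means no substitution; [F] returns 403 Forbidden and stops rule processing.
RewriteRule .? - [F]
```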
DK Lynn is a former instructor pilot and "rocket scientist" now living in New Zealand where he operates a small business developing and hosting web sites.