I think I understand the first part
Quote:
^\d{4}/\d{2}/\d{2}/([-a-z]+)/
But not sure about
([-a-z]+)/
\d is a digit, and you have a set of digits at the start of your url (the date) and then you’ve got stuff after it which is lowercase letters and -'s.
2010/12/09/terrence-smith-is-a-freak-and-other-things
[-a-z]+ says one or more of either a - or a lowercase letter.
It doesn’t look like this code already converts the - to _, right?
No, it’s matching urls that start with your number pattern that are also followed by -'s and lowercase letters. That means that regex will not match 2010/12/09/ by itself.
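If you want to poke at that pattern outside Apache, here is a rough Python sketch. Python's re module is only a stand-in (mod_rewrite uses PCRE, but a pattern this simple behaves the same way in both). The helper name slug_of is made up, and the trailing slash on the sample path is an assumption, added because the pattern itself ends in a /.

```python
import re

# DK's pattern: a yyyy/mm/dd/ date, then one or more dashes or
# lowercase letters (captured), then a closing slash.
DATE_SLUG = re.compile(r"^\d{4}/\d{2}/\d{2}/([-a-z]+)/")

def slug_of(path):
    """Return the captured title, or None when the pattern does not match."""
    m = DATE_SLUG.match(path)
    return m.group(1) if m else None

# Date plus title plus trailing slash: matches, title is captured.
print(slug_of("2010/12/09/terrence-smith-is-a-freak-and-other-things/"))
# Date only: [-a-z]+ needs at least one character, so no match.
print(slug_of("2010/12/09/"))
```

Note that the same title without the final slash would not match this pattern either, so check whether your real urls end in a slash.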
It seems best to first do the -'s to _'s part, then send the newly modified url (with _'s) to the part that checks if it starts with a date and needs to get rewritten to blog/
Would the code snippet that I suggested work for that?
The one with the angry-flaming-boobies? (.*)? No. Well, it will catch your urls but it might also catch other stuff you DON’T want. Instead of everything-nothing catchers, you’ll want something similar to how DK did the -'s/lowercase letters: state exactly what may be around those -'s.
([a-z]+)-([a-z]+) is one example. If you have numbers in those titles, you’ll need to use numbers. If you have uppercase letters somewhere, you’ll need A-Z. You look at your current urls to see what might possibly be in those titles and add those into your atoms () and nothing else.
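A quick way to see the difference is to run both kinds of pattern over a few strings. This is a Python sketch, not Apache config; the sample strings are invented, and the ^ and $ anchors are my addition so each pattern has to account for the whole string.

```python
import re

CATCH_ALL = re.compile(r"^(.*)$")                # everything, or nothing
EXPLICIT = re.compile(r"^([a-z]+)-([a-z]+)$")    # word, dash, word only

samples = ["terrence-smith", "images/logo.png", ""]
for s in samples:
    # Show which pattern accepts which string.
    print(repr(s), bool(CATCH_ALL.match(s)), bool(EXPLICIT.match(s)))
```

The catch-all accepts all three, including the image path and the empty string. The explicit pattern only accepts the first one.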
From DK’s page:
Too many people just use the (.*) to select (NOTHING OR) EVERYTHING in an “atom” (an Apache variable you can create and use within mod_rewrite) and try to pass that along to the redirection string. In this case, you would need three of these atoms separated by the subdirectory slashes (“/”) so the regex would become:
(.*)/(.*)/(.*)
Note #1: (.*) combines two metacharacters, the dot character (which means ANY character) and the * character (which specifies ZERO or MORE of the preceding character) within an atom (Apache variable created by mod_rewrite). Thus, (.*) matches EVERYTHING in the {REQUEST_URI} string ({REQUEST_URI} is that part of the URL which follows the domain up to but not including the ? of a query string and is the ONLY Apache variable that a RewriteRule can attempt to match). With the above regex, the regex engine will progress to learn that you have required two slashes (anywhere) in the string. For our purposes, though, we need to capture the three values in the {REQUEST_URI} so I’ve used the slashes to separate them.
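To make the "two slashes (anywhere)" point concrete, here is a small Python sketch of the three-atom pattern. The place names and the other sample paths are made up for illustration.

```python
import re

THREE_ATOMS = re.compile(r"^(.*)/(.*)/(.*)$")

# The intended case: three values separated by two slashes.
print(THREE_ATOMS.match("usa/california/fresno").groups())
# → ('usa', 'california', 'fresno')

# (.*) is greedy and will swallow slashes too, so an extra
# directory ends up inside the first atom.
print(THREE_ATOMS.match("a/b/c/d").groups())
# → ('a/b', 'c', 'd')

# Zero characters is also a match, so two bare slashes are accepted.
print(THREE_ATOMS.match("//").groups())
# → ('', '', '')
```

The last two results are the "other stuff you don't want" that explicit character classes would have rejected.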
You can learn the really basic regexen pretty quickly… it’s the complicated stuff that takes more time. I like http://www.regular-expressions.info/reference.html
and how does
blog/$1
fit into the equation?
With the regex in hand, you can now map the atoms to the query string:
display.php?country=$1&state=$2&city=$3
where display.php is the name of the script, $1 is the first (country) atom, $2 is the second (state) atom and $3 is third (city) atom. Note that there can only be nine atoms created, $1 … $9 (the tenth, $0, is the entire target string, the {REQUEST_URI}).
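The same mapping can be tried in Python, where \1, \2 and \3 in the replacement string play the role that $1, $2 and $3 play in a RewriteRule. This is only a sketch: the function name to_query and the sample path are hypothetical, and I have assumed lowercase-only values so the atoms can use [a-z]+ instead of (.*).

```python
import re

# Three atoms, each limited to lowercase letters (an assumption).
THREE_ATOMS = re.compile(r"^([a-z]+)/([a-z]+)/([a-z]+)$")

def to_query(path):
    """Map country/state/city onto the display.php query string."""
    # If the path does not match, sub() hands it back unchanged.
    return THREE_ATOMS.sub(r"display.php?country=\1&state=\2&city=\3", path)

print(to_query("usa/california/fresno"))
# → display.php?country=usa&state=california&city=fresno
```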
So when you put stuff inside ()'s, you are giving the regex engine the opportunity to remember whatever matched that particular pattern, which is called an atom for some reason.
Apache has special variables you can use to call that stuff back.
So DK’s regex started catching your numbers and then had the everything-else-that-follows caught by ([-a-z]+). Whatever matched that (in your case, it would be [_a-z]+ and the title with _'s) is remembered. If it’s the first set of ()'s, then you can call it back with $1.
That way you can tell the browser to go to blog/(whateverwascaught) using $1.
In other programs there are similar variables that call back captured stuff.
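Python is one of those: inside a replacement template, \1 calls back the first set of ()'s. Below is a sketch of the whole idea from this thread, dashes to underscores first and then the date check and the blog/ rewrite. It is not mod_rewrite code; the function name rewrite, the optional trailing slash, and the choice to leave non-matching paths alone are all assumptions of mine.

```python
import re

# After the dashes are converted, the title is underscores and lowercase
# letters. The trailing slash is treated as optional (an assumption).
DATED = re.compile(r"^\d{4}/\d{2}/\d{2}/([_a-z]+)/?$")

def rewrite(path):
    """Return blog/<title_with_underscores>, or the path untouched."""
    underscored = path.replace("-", "_")   # step 1: -'s to _'s
    m = DATED.match(underscored)           # step 2: starts with a date?
    # \1 here is the counterpart of $1 in blog/$1.
    return m.expand(r"blog/\1") if m else path

print(rewrite("2010/12/09/terrence-smith-is-a-freak-and-other-things"))
# → blog/terrence_smith_is_a_freak_and_other_things
```

Paths that do not start with a date, such as an image path with a dash in it, come back exactly as they went in.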