301 rewrite rule

Hi,

I was hoping to get some help with a redirect. I moved a website from WordPress to a new CMS. Now I need to somehow account for backlinks that point to the ‘old’ file structure.

Here are a few examples.

old: 2010/12/09/terrence-smith-is-a-freak-and-other-things/
new: blog/terrence_smith_is_freak_and_other_things

old: 2010/08/28/dogs-land-another-top-20-recruit-for-2010/
new: blog/dogs_land_another_top_20_recruit_for_2010

old: 2010/08/16/a-look-into-the-future-2011-class/
new: blog/a_look_into_the_future_2011_class

So, essentially I need a rule that says replace the first three folders with a folder called ‘blog’ and then for the url title replace all ‘-’ with ‘_’ …

Is something like that possible?

Best
Florian

Yes, it’s doable.

Do you know any regular expressions?

You’ll want to read through this page: http://datakoncepts.com/seo

It should answer most of the basic, common rewriting questions. Are you running Apache? Do you know which version? The article assumes you are editing the .htaccess file, which most people do if they have shared hosting. If you have your very own server, then you may be able to edit the main config file directly.

If you try something based on that article and it doesn’t work and you don’t know why, post back with the code you tried and any errors you got (if any).

Probably instead of using mod_rewrite, you’ll want RedirectMatch. But they’re very similar in how you write them and both use regex which you need for your type of redirect.
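As a rough illustration of the RedirectMatch route, a minimal sketch for the date-to-blog part only might look like the following. It assumes Apache 2.x (where \d is understood), that the old posts sat at the top level of the site, and that titles contain only lowercase letters, digits and hyphens. It leaves the hyphens untouched, since one RedirectMatch line cannot swap every hyphen in a title of arbitrary length.

```apache
# mod_alias: the pattern is matched against the URL-path, including the leading slash.
# /2010/12/09/some-title/  ->  /blog/some-title   (hyphens are NOT converted here)
RedirectMatch 301 ^/\d{4}/\d{2}/\d{2}/([-a-z0-9]+)/?$ /blog/$1
```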

thank you for the link and your reply.
I browsed through the section in the link that you sent. This is pretty much beyond me. I do not know regular expressions.

My server is vps, running on apache.

I need a way to account for the old links… example
old: 2010/12/09/terrence-smith-is-a-freak-and-other-things/
new: blog/terrence_smith_is_freak_and_other_things

{snip}

best
Florian

Florian,

You are NOT permitted to solicit for a quote in these boards.

That article was built from similar requests here and the regex section is simple to follow as it develops correct regex.

While I understand your desire to short circuit the learning process to get your old links back online, I do not believe that a webmaster should use code he/she does not understand. Therefore, IMHO, it would be FAR better for you to make a quick attempt and let me lead you to the correct solution - it IS within your capability!

Regards,

DK

I apologize for asking for a quote. I didn’t realize it wasn’t allowed on this forum. Sorry about that.

I am trying to fight through the documentation. Keep in mind, I am a web designer, not a webmaster, so for me all of this is completely new.

Maybe the following could be a solution to convert the “-” to “_” …


RewriteEngine on
RewriteRule ^(.*)-(.*)$ $1_$2 [N,L]

I didn’t see anything about getting rid of sub-directories. I need to get rid of the date folders and instead show the folder called ‘blog’.

Florian,

Oh, my! The noobie’s :kaioken: EVERYTHING :kaioken: atom. Obviously, you didn’t read the tutorial article.

What your code is doing is merely replacing -'s with _'s. Why would you even think to show that?

Okay, my process is to start by creating a “specification.” In your case, it’s to match then ignore digits and /'s (pattern of ####/##/##/) until you get to the lowercase letters and -'s, capture those but match (to ignore) a trailing /. The regex for that is simply ^\d{4}/\d{2}/\d{2}/([-a-z]+)/. The redirection is obviously blog/$1. Can you get the rest of the way with all that done for you?

Regards,

DK
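Dropped into a rule, that specification might look like the sketch below. The leading slash on /blog/, the optional trailing slash with an end anchor, and an .htaccess file sitting in the document root are all assumptions, and Apache 2.x is assumed so that \d is understood. The hyphens are still untouched at this point, and a title containing digits would not match [-a-z]+.

```apache
RewriteEngine On

# Four digits, two digits, two digits, then the title captured as $1.
# In .htaccess the pattern is tested without the leading slash.
RewriteRule ^\d{4}/\d{2}/\d{2}/([-a-z]+)/?$ /blog/$1 [R=301,L]
```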

What your code is doing is merely replacing -'s with _'s. Why would you even think to show that?

? Because that’s the part he tried: just the -'s to _'s part.

I suppose it would be good to use one of those online Javascripted things where you type in a regex and then some pattern and see if your regex works (I remember seeing such a thing somewhere).

Florian, the specification DK gave you was for the date-to-blog part. Does that make sense? It’s looking for a certain pattern of numbers and once it matches that pattern, it replaces it with the blog/ folder and tacks everything else onto the end.

Also, DK’s tutorial I linked to is very angry and bitter and frothing about the (.*) stuff. It’s dangerous because it catches everything… and nothing, and makes Evil Loops. You never want that. WordPress got away with it (if you were looking at its rules) because WordPress wants to grab everything and nothing and just shove it over to a PHP script, which deals with it and takes care of loops with its own logic.
Now, you’re going to let Apache do everything. Apache’s very good at this. So your rewrite rules will be as specific as possible. Don’t try to match everything/nothing. If all your links follow the same pattern, try to match that pattern as closely as possible.

Regards,

DK

Mallory, that has NOTHING to do with his “specification:”

Am I getting old in my blind age?

old: 2010/12/09/terrence-smith-is-a-freak-and-other-things/
new: blog/terrence_smith_is_freak_and_other_things

Better to practise learning regex on someone’s script than practise on your running server!
I had the luxury of practising on a server, because it was just localhost fun.

Mallory,

:eek2: My glasses are failing me!

With your correction, obviously, there are two parts to this problem and he DID hit on half!

Regards,

DK

Hm, I was thinking more about this


RewriteEngine on
RewriteRule ^(.*)-(.*)$ $1_$2 [N,L]

Even assuming the boobies are replaced with something stricter, there’s still an issue:
either the L has to go so that can be looped through for every - that comes along in the URL… loop loop loop limit reached
or one long one with like 9 atoms has to be written, which
-limits the number of words in the title to 9
-looks gross

I’m wondering if sending those urls to a script with a regular search-replace regex and spitting those back to the browser is a better idea?

Mallory,

Naw, you’re quite correct about the Last flag. The Next flag has mod_rewrite restart the pass with the redirection and should precede the date => blog redirection (with _'s rather than -'s).

Regards,

DK

Hi Guys

So I checked whether mod_rewrite is enabled this morning and uploaded a checkphp.php file. This came back ok.

I then followed the tutorial and created the test.html and test.php as well as the .htaccess file. This worked as it should, my test.html was redirected to test.php

I think I understand the first part

^\d{4}/\d{2}/\d{2}/([-a-z]+)/

But not sure about

([-a-z]+)/

I see it mentioned in the tutorial but can’t figure out what it does. It doesn’t look like this code already converts the - to _, right? Would the code snippet that I suggested work for that?

and how does

blog/$1
fit into the equation?

I did read the tutorial, but I can’t say that I now understand this and know how to write perfect regex code.

I think I understand the first part
Quote:
^\d{4}/\d{2}/\d{2}/([-a-z]+)/
But not sure about
([-a-z]+)/

\d is a digit, and you have a set of digits at the start of your url (the date) and then you’ve got stuff after it which is lowercase letters and -'s.
2010/12/09/terrence-smith-is-a-freak-and-other-things

[-a-z]+ says one or more of either a - or a lowercase letter.

It doesn’t look like this code already converts the - to _, right?

No, it’s matching urls that start with your number pattern that are also followed by -'s and lowercase letters. That means that regex will not match 2010/12/09/ by itself.

It seems best to first do the -'s to _'s part, then send the newly modified url (with _'s) to the part that checks if it starts with a date and needs to get rewritten to blog/

Would the code snippet that I suggested work for that?

The one with the angry-flaming-boobies? (.*)? No. Well, it will catch your urls but it might also catch other stuff you DON’T want. Instead of everything-nothing catchers, you’ll want similar to how DK did the -'s/lowercase letters: state what exactly may be around those -'s.
([a-z]+)-([a-z]+) is one example. If you have numbers in those titles, you’ll need to use numbers. If you have uppercase letters somewhere, you’ll need A-Z. You look at your current urls to see what might possibly be in those titles and add those into your atoms () and nothing else.
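Stretched to cover a whole path, that idea might look something like the sketch below. The character classes (lowercase letters, digits, slashes, underscores and hyphens) are an assumption about what the old URLs contain. Because this version is not tied to the date folders, it would also touch any other hyphenated URL on the site made of those characters, so treat it as an illustration rather than something to paste in as-is.

```apache
RewriteEngine On

# $1 = everything up to the LAST hyphen, $2 = what follows it (no hyphens allowed).
# Each pass swaps one hyphen for an underscore; [N] restarts the rules
# until no hyphen is left and the pattern stops matching.
RewriteRule ^([a-z0-9/_-]*)-([a-z0-9/_]*)$ $1_$2 [N]
```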

From DK’s page:

Too many people just use the (.*) to select (NOTHING OR) EVERYTHING in an “atom” (an Apache variable you can create and use within mod_rewrite) and try to pass that along to the redirection string. In this case, you would need three of these atoms separated by the subdirectory slashes (“/”) so the regex would become:

(.*)/(.*)/(.*)

Note #1: (.*) combines two metacharacters, the dot character (which means ANY character) and the * character (which specifies ZERO or MORE of the preceding character) within an atom (Apache variable created by mod_rewrite). Thus, (.*) matches EVERYTHING in the {REQUEST_URI} string ({REQUEST_URI} is that part of the URL which follows the domain up to but not including the ? of a query string and is the ONLY Apache variable that a RewriteRule can attempt to match). With the above regex, the regex engine will progress to learn that you have required two slashes (anywhere) in the string. For our purposes, though, we need to capture the three values in the {REQUEST_URI} so I’ve used the slashes to separate them.

You can learn the really basic regexen pretty quickly… it’s the complicated stuff that takes more time. I like http://www.regular-expressions.info/reference.html

and how does
blog/$1
fit into the equation?

With the regex in hand, you can now map the atoms to the query string:

display.php?country=$1&state=$2&city=$3

where display.php is the name of the script, $1 is the first (country) atom, $2 is the second (state) atom and $3 is third (city) atom. Note that there can only be nine atoms created, $1 … $9 (the tenth, $0, is the entire target string, the {REQUEST_URI}).
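Put together as a complete rule, the mapping quoted above might look like this sketch. The stricter [a-z]+ atoms are swapped in for (.*) here, and the sample path in the comment is hypothetical.

```apache
RewriteEngine On

# canada/ontario/toronto -> display.php?country=canada&state=ontario&city=toronto
# Three atoms separated by slashes; each one is called back as $1, $2, $3.
RewriteRule ^([a-z]+)/([a-z]+)/([a-z]+)/?$ display.php?country=$1&state=$2&city=$3 [L]
```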

So when you put stuff inside ()'s, you are giving the regex engine the opportunity to remember whatever matched that particular pattern, which is called an atom for some reason.

Apache has special variables you can use to call that stuff back.

So DK’s regex started catching your numbers and then had the everything-else-that-follows caught by the ()'s. Whatever matched that (in your case, it would be [_a-z]+ and the title with _'s) is remembered. If it’s the first set of ()'s then you can call it back with $1.
That way you can tell the browser to go to blog/(whateverwascaught) using $1.

In other programs there are similar variables that call back captured stuff.

now that makes much more sense to me. i was wondering what in the world the ‘atom’ was :slight_smile:

Let me try to parcel this all together and post the code back here for review.

thank you very much for explaining this to me in a basic way.

Florian,

An “atom” is the enclosed parenthetical which CREATES an Apache variable, a VERY valuable thing which empowers mod_rewrite. In other words, RewriteRule ^(.*)$ index.php?url=$1 [L] will create an atom with the (.*) which captures EVERYTHING between the start anchor, the ^, and the end anchor, the $ (the anchors are merely the start or end of the {REQUEST_URI} string). This atom is then referenced in the redirection as $1.

GREAT! Your test to verify the proper function of mod_rewrite is a major step in your learning!

^\d{4}/\d{2}/\d{2}/([-a-z]+)/ does not convert - to _ as I had missed that in the original post (Mallory corrected me later).

The \d is a shortcut for [0-9], in other words, a single digit (in the range from 0 to 9). The {4} says that I want to match exactly four of the previous character; the {2}'s, that I want to match exactly two of the previous character. Had I used {0,4}, {4,} or {4,6}, I would have required no more than 4, 4 or more, or 4 to 6 of the previous character.

Yes, with the exception of the Last flag (for other reasons), your code snippet would have been correct. The only “trick” would have been in combining the two parts of your “specification” (convert all -'s to _'s and replace the date with blog).

“Flaming boobies?” :lol: I thought you liked “dot star?” Oh, well, funny! Thanks!

Actually, the (.*) is necessary in the - to _ segment to catch the digits and slashes, i.e., leave the URI intact while converting one character at a time.

In summary,

RewriteEngine on
RewriteRule ^(.*)-(.*)$ $1_$2 [N]
RewriteRule ^\d{4}/\d{2}/\d{2}/([a-z_]+)/ blog/$1 [R=301,L]

Because the - to _ is repeated so often, it’s better to use the first bit of code. The Next flag is used to tell mod_rewrite to start the next pass through the mod_rewrite code, the R=301 is used to let SE’s and your visitors know that the link has been redirected and the Last flag is to perform the redirection before handling any other mod_rewrite rules you have in your .htaccess.

Regards,

DK
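For comparison, here is a tighter variant of the same two-step idea, offered as a sketch under several assumptions rather than a tested fix: Apache 2.x (so \d is understood), an .htaccess file in the document root, and titles made only of lowercase letters, digits and hyphens. Digits are allowed in the title because one of the examples contains "top-20" and "2010". It cannot reproduce the first example exactly, where the new URL also drops the word "a".

```apache
RewriteEngine On

# Passes 1..n: inside a dated URL, turn the last remaining hyphen into an
# underscore, then restart the rule set ([N]) so the next hyphen is handled.
RewriteRule ^(\d{4}/\d{2}/\d{2}/[-a-z0-9_]*)-([a-z0-9_]*/?)$ $1_$2 [N]

# Final pass: no hyphens are left, so swap the date folders for /blog/
# and send a permanent redirect.
RewriteRule ^\d{4}/\d{2}/\d{2}/([a-z0-9_]+)/?$ /blog/$1 [R=301,L]
```

Anchoring the first rule to the date folders keeps it away from any other hyphenated URLs on the site, and the leading slash on /blog/ avoids relying on how a relative target is resolved inside .htaccess.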

DK beat me to it. I was going to try to combine the stuff that we have talked about into one rule. However, I didn’t know that I had to start each line with ‘RewriteRule’.

I will try out this code and see what it does for me.

This was definitely a learning experience! Next time I have to move a site to a new server and new folder structure, I at least know what to expect :slight_smile:

Florian,

Did you understand the code and the flags? To me, that’s the IMPORTANT part of this exercise.

Regards,

DK

I feel like I get most of it. Stomme explained the ([-a-z]+) really nicely. And also what the $1 means. the L flag tells it to stop it and the N flag tells it not to go through again but to start over at that point.

Is there anything else that I have to do with the .htaccess or otherwise? The page is blank and the URL doesn’t get redirected with the code that I am currently using.