Mod_rewrite help

Hello,

My mod_rewrite skills are very poor, which is why I’m going to have to request help from someone more experienced here.

I ended up massively changing my site’s structure, and need to set up 301 redirects for the old pages (which are still being linked to by Google).

The example is like so:

Had pages that looked like this:
http://www.mysite.com/Gucci-clothes-p-1-c-11.html
http://www.mysite.com/Gucci-clothes-shirts-p-2-c-12.html

Now pages look like this:
http://www.mysite.com/Gucci-p-1-c-11.html
http://www.mysite.com/Gucci-p-2-c-12.html

I’m guessing this is just a case of mass replace, but I’m having trouble building the rewrite statement.

Logically it is something like “%-clothes-%” and “%-clothes-shirts-%” should be replaced with just “-”. And I’m guessing you should start by searching for and replacing the longer string, -clothes-shirts-, so that afterwards you can also search/replace the shorter string, -clothes- (it won’t work the other way round).

Anyone?

EDIT: (There are other brands besides Gucci that need the same rewriting of -clothes-shirts- and -clothes-).

df,

The two examples you’ve shown would indicate that it’s easy (trivial) to make a redirect. Please confirm that the important parts of the OLD URIs are the manufacturer (e.g., Gucci, everything before the first -) and from the p- through to the dot for the extension. THEN, (a) you need to create the NEW URIs and the following mod_rewrite can be put in your .htaccess file:

RewriteEngine on
RewriteRule ^([a-zA-Z]+)-.+(-p-\d+-c-\d+)\.html$ $1$2.html [L]

Where ([a-zA-Z]+) is the $1 which, in the example, is Gucci. This must only contain letters. (-p-\d+-c-\d+) is the $2 atom, which is -p-1-c-11 in your first example and -p-2-c-12 in the second. Please note that the p and c are hard-coded while the \d+'s specify one or more digits each. If p and/or c need to be (p|q) or [a-z], then you can modify the above code as required.
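
To make the two backreferences concrete, here is a sketch of the same pattern traced against the first example URL. It assumes the .htaccess file sits in the document root. The leading slash on the target and the R=301 flag are assumptions added here so that the rule sends an external permanent redirect; they are not part of the rule quoted above.

```apache
RewriteEngine on

# Request:  Gucci-clothes-p-1-c-11.html
#   $1 = "Gucci"       captured by ([a-zA-Z]+)
#   -.+ consumes "-clothes" and is thrown away
#   $2 = "-p-1-c-11"   captured by (-p-\d+-c-\d+)
# Target:   /Gucci-p-1-c-11.html
RewriteRule ^([a-zA-Z]+)-.+(-p-\d+-c-\d+)\.html$ /$1$2.html [R=301,L]
```

The new URL, Gucci-p-1-c-11.html, does not match this pattern, because -.+ needs at least one extra segment between the manufacturer and -p-, so the redirect should not loop.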

Regards,

DK

Thank you very much for your response.

Yes, the p- and c- are hardcoded. They are just the page number and category number.

That is almost correct, however (and forgive me for not pointing this out in the example), there can be manufacturers that are dash delimited (more than one word), such as “Dolce-and-Gabbana”.

So, returning to the example we used to have
http://www.mysite.com/Gucci-clothes-shirts-p-2-c-12.html
http://www.mysite.com/Dolce-and-Gabbana-clothes-shirts-p-3-c-13.html

and now we have (and need to redirect the old page requests to)
http://www.mysite.com/Gucci-p-2-c-12.html
http://www.mysite.com/Dolce-and-Gabbana-p-3-c-13.html

One last thing to note is that product pages were also changed, but they do not have a c- variable. So we also have to deal with the old pages of:
http://www.mysite.com/Gucci-shirt-p-4.html
http://www.mysite.com/Dolce-and-Gabbana-shirt-p-5.html
now changed to
http://www.mysite.com/Gucci-p-4.html
http://www.mysite.com/Dolce-and-Gabbana-p-5.html

As I’m seeing it, it should be a simple “search for match” and replace, where the match that we are looking for is “-clothes-shirts-”, “-clothes-”, “-shirt-” and replacing any of these cases with just “-”.

As far as I searched around about mod_rewrite, I couldn’t find such a “search and replace” function. But I’m guessing it is just my limited knowledge of mod_rewrite’s abilities that led to this conclusion.

EDIT:

Am I following you? Here is my attempt at it:

For the first:
RewriteRule ^(.+)(-clothes-shirts)(-p-\d+-c-\d+)\.html$ $1$3.html [L]

For the second:
RewriteRule ^(.+)(-clothes)(-p-\d+-c-\d+)\.html$ $1$3.html [L]

For the third (product page):
RewriteRule ^(.+)(-shirt)(-p-\d+)\.html$ $1$3.html [L]
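
The three attempts above could also be folded into a single rule. This is only a sketch, assuming that the segments to drop are exactly -clothes-shirts, -clothes and -shirt, that the .htaccess file is in the document root, and that an external 301 is wanted. The (?:...) non-capturing groups, the R=301 flag and the leading slash on the target are additions that do not appear in the attempts themselves.

```apache
RewriteEngine on

# $1 = manufacturer, dashes allowed (e.g. Dolce-and-Gabbana)
# (?:clothes-shirts|clothes|shirt) = the literal segment to drop
# $2 = "-p-<digits>", optionally followed by "-c-<digits>"
RewriteRule ^(.+)-(?:clothes-shirts|clothes|shirt)(-p-\d+(?:-c-\d+)?)\.html$ /$1$2.html [R=301,L]
```

Because the dropped segment must be followed immediately by -p-, the regex engine backtracks until the whole segment is consumed, so the order of the alternatives does not change which URLs match. With this rule, Dolce-and-Gabbana-clothes-shirts-p-3-c-13.html would redirect to /Dolce-and-Gabbana-p-3-c-13.html and Gucci-shirt-p-4.html to /Gucci-p-4.html.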

df,

Okay, the p and c are hardcoded. GREAT!

May I recommend that you replace spaces with _'s rather than -'s? Oh, well, instead of the [a-zA-Z]+, you will need to use [-a-zA-Z]+ or [a-zA-Z_]+. The reason that you should use _'s (other than that’s a better way to do it) is that you use -'s with your clothes, shirts, shorts and yadda-yadda. In other words, what part of the -yadda-yadda-yadda- do you want Apache to discard?

> and now we have (and need to redirect the old page requests to)
> http://www.mysite.com/Gucci-p-2-c-12.html
> http://www.mysite.com/Dolce-and-Gabbana-p-3-c-13.html

Time out! That violates the basic rule of being able to map from the content of one URI to another. How is Apache supposed to determine Gucci => Gucci from Gucci => Dolce_and_Gabbana?

Old URIs without the c-? Horrors! You really are changing the specification on me! Oh, well, just enclose the c- part in parentheses and make it optional, i.e., (-p-\d+(-c-\d+)?).
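
As a sketch of how the optional c- part might look inside a complete rule, the following reuses the letters-only manufacturer pattern from the first reply. The non-capturing (?:...) form, the R=301 flag and the leading slash on the target are assumptions added here, and the .htaccess file is assumed to be in the document root.

```apache
RewriteEngine on

# Gucci-clothes-p-1-c-11.html  ->  $2 = "-p-1-c-11"  (category page)
# Gucci-shirt-p-4.html         ->  $2 = "-p-4"       (product page, no -c-)
RewriteRule ^([a-zA-Z]+)-.+(-p-\d+(?:-c-\d+)?)\.html$ /$1$2.html [R=301,L]
```

If plain parentheses are used for the inner group instead of (?:...), the rule still works, but the inner group then becomes $3 and the outer group remains $2.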

FWIW, it’s NOT a search and replace function. It’s a matching function with certain portions of the search saved as backreferences to be used in the redirection.

Re your “attempt at it”:

NO! If you hardcode clothes-shirts, you are NOT using the POWER of regex. [-a-z]+ matches clothes-shirts-, clothes- AND yadda-yadda! You have a very powerful tool in your hands so learn how to use it (see my signature’s tutorial if you need a place to start).
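
A sketch of what that could look like, under the assumption that manufacturer names use underscores (Dolce_and_Gabbana is a hypothetical spelling here, not one of the existing old URLs) and that the part to discard is made up of lowercase letters and dashes only. The R=301 flag and the leading slash are likewise assumptions for a .htaccess file in the document root.

```apache
RewriteEngine on

# $1       = manufacturer: letters and underscores, no dashes
# [-a-z]+  = whatever lowercase, dash-separated words follow (discarded)
# $2       = "-p-<digits>" with an optional "-c-<digits>"
#
# Dolce_and_Gabbana-clothes-shirts-p-3-c-13.html -> /Dolce_and_Gabbana-p-3-c-13.html
# Gucci-shirt-p-4.html                           -> /Gucci-p-4.html
RewriteRule ^([A-Za-z_]+)-[-a-z]+(-p-\d+(?:-c-\d+)?)\.html$ /$1$2.html [R=301,L]
```

Note that this pattern does not match a dash-delimited name such as Dolce-and-Gabbana-clothes-shirts-p-3-c-13.html at all, because the capital G falls outside [-a-z]; that is the ambiguity the underscore suggestion is meant to avoid.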

Regards,

DK