Unpredictable directory structure mod_rewrite

I am not even sure if this is possible.
I have a gigantic site where about 20,000 of the pages are handled by one processing file. It displays a different template in the body depending on whether the request is for the initial page (site.com/initial/), a sub page (this is the tricky one, explained later), or a product page.

Here’s the deal: each product can appear on several sub pages. Currently they are determined by a string in the db.

The identifiers for the initial page and the product page are standard, so a URL will always begin with ?main=whatever. It will not always end in a product page, but when it does, the product is always &number=

The sub-category parameter names do not matter; I currently have them set up as letters of the alphabet. Here are some examples.

site.com/main=Parts&a=Ship-Parts&number={whatev product}
site.com/main=Parts&a=Ship-Parts&b=Masts&number={whatev product}
site.com/main=Parts&a=Ship-Parts&b=Masts&c=Blue-Masts&number={whatev product}

The only way I see this being possible currently is to constantly check the last $_GET parameter against all 20,000 products. Either that, or write this whole thing a different way.

Is there a way to mod_rewrite this the way I currently have it set up?

dk,

First, WELCOME to SitePoint’s Apache forum!

Second, nice to see another “DK” in the forum!

Third, of course, there is a solution with mod_rewrite (it’s that powerful) but I don’t understand the question. Certainly, your site.com/main=Parts&yadda-yadda is NOT a path/file that can be served. Okay, EVERYTHING goes to a CMS’s index.php file but the “link” is not there. Moreover, because index.php picks and chooses depending upon what was requested, won’t your CMS do that based on the query strings you’ve shown?

Okay, I’m obviously missing something … please help define the problem.

Regards,

DK

Here’s a better explanation. I screwed up in my prior URLs.

First, this isn’t a CMS, it’s a custom PHP script I wrote xD

Second, WASSSUP FELLOW DK?!!?!?! lol

Third:
This is all scraped data (with permission), and the basic hierarchy is determined by a string in the db on each row, like this:
Parts > Ship > Ship Parts > 1982 Mast Blah Blah 20{final product}
or even
Parts > Ship

I set it up to where when you click a link to
site.com/processor.php?main=Parts
it will display the next subcategories within “Parts”, which in this case is Ship. The same happens for the next level under Ship, until we reach the end of the hierarchy string, where it shows the actual product.

At this point it just popped into my head that a possible solution is a mod_rewrite rule that sends anything after site.com/ to processor.php for processing… unless it’s an existing directory.

I’ll continue just in case. The main “issue” I’ve been wrestling with is that I have 20,000 of these listings, and I cannot predict the directory depth. Typically when I do a mod_rewrite I copy it from stuff I did before, where the directory depth is predictable, so it will always rewrite site.com/whatever/ to a category and site.com/whatever/whatevs/ to a main page, product page or whatever.

For this one though, there could be 1 subcategory, or 5 categories prior to the actual product page…

So at this point I’m thinking maybe I need to send everything after site.com/ to processor.php for processing, since I cannot think of a way to guarantee that the last subcategory points to &number={whatev product}…

Currently I’m doing if ($_GET['number'] != "") {show product listing} else {show more subcategories} in processor.php. I may just need to modify my code so that, instead of checking for $_GET['number'], it checks the end of the URL against product numbers to determine whether it should show a product page.

This still leaves me with the question: how do I ensure that everything after site.com/{everything} gets sent to processor.php for processing?

Currently I only know how to rewrite like this:
RewriteRule ^([A-Za-z%&0-9-\s]+)/?$ …/video_search.php?c=$1 [L]
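
One common way to get "everything that is not a real file or directory goes to processor.php" is a catch-all rule. This is only a minimal sketch: it assumes the rules sit in an .htaccess file in the document root, and the parameter name path is hypothetical (not part of the script described above). processor.php would have to split the value on / itself.

```apache
RewriteEngine On

# Leave requests for real files and directories alone
# (images, CSS, processor.php itself, existing folders).
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Hand the whole requested path to the script.
# "path" is a hypothetical parameter name chosen for this sketch.
# QSA keeps any query string that was already on the request.
RewriteRule ^(.+)$ processor.php?path=$1 [L,QSA]
```

With this in place, a request for site.com/Parts/Ship-Parts/Masts would reach processor.php with path set to Parts/Ship-Parts/Masts, whatever the depth, so the depth no longer needs to be known in the rewrite rules.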

site.com/processor.php?main=Parts&a=Ship-Parts&number={whatev product}
site.com/processor.php?main=Parts&a=Ship-Parts&b=Masts&number={whatev product}
site.com/processor.php?main=Parts&a=Ship-Parts&b=Masts&c=Blue-Masts&number={whatev product}

dk,

Custom, okay. That means that you can fiddle with the query string in the script (if necessary).

dk? :eek2:

Scraped, eh?

If you’d read the tutorial linked in my signature, you’d know that you cannot use a space in the URI (it is automatically encoded as %20, which is dirt ugly). If you send a string with spaces in it in the query string, they’ll be converted to +’s. In other words, you should do something with the spaces before creating links with them (I explained why I prefer _’s).

There are many instances of people trying to match EVERYTHING between /’s in a URI and converting to a redirection, and it can be done with a single RewriteRule (using optional atoms). HOWEVER, the best approach for you is to create a series of individual RewriteRules, one for each case. You’ve NEARLY done it with the list of examples at the bottom of your post. I’ll give you an example (using a {placeholder}, but use YOUR character range definition instead!):

RewriteEngine on

# case with all five - if Parts is standard, use Parts instead, etc.
RewriteRule ^{atom#1}/{atom#2}/{atom#3}/{atom#4}/{atom#5} processor.php?main=$1&a=$2&b=$3&c=$4&number=$5 [L]

# case with Parts/ required & the trailing atoms optional
RewriteRule ^{atom#1}(/{atom#3}(/{atom#5}(/{atom#7}(/{atom#9})?)?)?)? processor.php?main=$1&a=$3&b=$5&c=$7&number=$9 [L]

If you’re confused by wrapping each / and atom, together with all the trailing /atom combinations, in nested optional atoms, forget that and go back to single statements for each case!
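
To make the "one statement per case" approach concrete, here is a minimal sketch with the {placeholder} atoms filled in. It rests on assumptions that are not stated in the thread: the rules live in an .htaccess file in the document root, category names use only letters, digits, hyphens and underscores, and product identifiers are digits only. That last assumption is what lets a rule tell a product from one more sub-category.

```apache
RewriteEngine On

# ASSUMPTION: products are identified by digits only, categories by [A-Za-z0-9_-]+.
# The product rules come first, because a numeric last segment would
# otherwise also match the more general category pattern further down.

# /Parts/Ship-Parts/123  ->  main, a, number
RewriteRule ^([A-Za-z0-9_-]+)/([A-Za-z0-9_-]+)/([0-9]+)/?$ processor.php?main=$1&a=$2&number=$3 [L]

# /Parts/Ship-Parts/Masts/123  ->  main, a, b, number
RewriteRule ^([A-Za-z0-9_-]+)/([A-Za-z0-9_-]+)/([A-Za-z0-9_-]+)/([0-9]+)/?$ processor.php?main=$1&a=$2&b=$3&number=$4 [L]

# Category-only pages, one rule per depth
RewriteRule ^([A-Za-z0-9_-]+)/?$ processor.php?main=$1 [L]
RewriteRule ^([A-Za-z0-9_-]+)/([A-Za-z0-9_-]+)/?$ processor.php?main=$1&a=$2 [L]
RewriteRule ^([A-Za-z0-9_-]+)/([A-Za-z0-9_-]+)/([A-Za-z0-9_-]+)/?$ processor.php?main=$1&a=$2&b=$3 [L]
```

Deeper levels would follow the same pattern with one more group each. Two caveats under these assumptions: a category whose name is all digits would be treated as a product, and a real directory with a matching name (for example /images/) would be caught by the one-segment rule unless it is excluded with a RewriteCond placed directly before that rule.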

Regards,

DK