Disguising URLs Without Duplicating Content

SitePoint Members,
Does anyone know how to rewrite a URL in .htaccess to do the following?
I have a web page with the file name page.html. When a visitor requests page.html, I want it to be disguised as page/ in such a way that search engines won’t think there are two separate pages (page.html and page/) and will treat page/ as the only page that exists.

Thanks for any help,

Chris
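
For reference, one common way to get what is being asked for here is a pair of mod_rewrite rules: a 301 redirect so that page.html is never seen as its own URL, plus an internal rewrite so that page/ still serves the file. This is only a minimal sketch. It assumes mod_rewrite is enabled and that the .htaccess sits in the same directory as page.html (the file name from the question):

```apache
RewriteEngine On

# Only act on what the browser actually asked for, so the internal
# rewrite further down cannot trigger this redirect again (no loop).
RewriteCond %{THE_REQUEST} \s/+page\.html[\s?]
# Permanently redirect /page.html to /page/
RewriteRule ^page\.html$ /page/ [R=301,L]

# Quietly serve page.html whenever /page/ is requested
RewriteRule ^page/$ page.html [L]
```

With the 301 in place, the only address a crawler can end up on is page/, so there should be no second copy of the content to index.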

Why not just have “page/”?

Don’t call the file HTML, or store the values in some other format like, say, a database.

My .htaccess generally routes ALL non-image/object file requests to a single central index.php, which decides what should be shown. I then just parse $_SERVER['REQUEST_URI'] to see which page the user requested. The index.php sets up all the values, outputs the proper headers, loads the stuff before the content that’s the same on every page, shows the content, then dumps out all the stuff after the content that’s the same.


RewriteEngine On
RewriteRule !\.(gif|jpg|png|css|js|ico|zip|rar|pdf|xml|mp4|mpg|flv|swf|mkv|ogg|avi|eot|woff|ttf|svg)$ index.php

Basically, any file request that doesn’t end in one of the above extensions gets shoved at index.php, which then does all my grunt-work. That means only one user entry point, which means a single unified security point.

Why not skip all those extensions and only pass along requests for resources that do not exist…? Then you won’t need that long-a** string of extensions.


# Point all non-existing files/folders to index.php
# (index.php can read the requested path from $_SERVER['REQUEST_URI'])
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule .? index.php [QSA,L]

Or just use FallbackResource? (if you’re on Apache 2.2.16+)
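
As a rough illustration of that suggestion, the mod_rewrite block above can be replaced with a single mod_dir directive. This sketch assumes Apache 2.2.16 or later and an index.php in the document root:

```apache
# Any request that does not map to an existing file or directory
# is handed to /index.php; mod_rewrite is not needed for this.
FallbackResource /index.php
```

Like the !-f / !-d version, this still lets Apache serve any file that actually exists, which is exactly the behaviour argued over in the replies that follow.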

Because then existing .html or .php files could actually be called… and I don’t want ANY PHP file OTHER than index.php called. It’s called library security; it’s the same reason properly written library files shouldn’t output anything on their own, as all their code should be wrapped in functions.

I may have files in there that it’s FINE for index.php to include… like /libraries/db.php or /dbSettings.php, but that doesn’t mean Apache has ANY business sending those files client-side. I do not WANT .php, .html, or any other extension not on the ‘approved list’ to even be available.

… as to the ‘huge ass list’ – It’s actually no bigger than the code you posted; in fact, it’s SMALLER. (129 bytes vs. 136 bytes – assuming you strip the comment from yours).

But more importantly, I can whitelist filetypes at a whim, omitting the ones I want blacklisted.

Though it’s funny you mention it, my thumbnail auto-generator uses a similar .htaccess redirector: if a thumbnail size 404s, the generator PHP script is called to see if there’s a “full size” version available, and if so, it makes the requested size (if it’s an ‘approved’ size).
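
A thumbnail fallback along those lines might look something like the sketch below. This is not the poster’s actual setup; the thumbs/ layout, the thumbnail.php script name, and the w, h and file query parameters are all hypothetical:

```apache
RewriteEngine On

# Only step in when the requested thumbnail is not already on disk
RewriteCond %{REQUEST_FILENAME} !-f
# Hypothetical layout: /thumbs/<width>x<height>/<image file>
# Hand the size and file name to a hypothetical generator script,
# which would still have to check that the size is an approved one.
RewriteRule ^thumbs/(\d+)x(\d+)/([^/]+\.(?:gif|jpg|png))$ thumbnail.php?w=$1&h=$2&file=$3 [L]
```

Once the generator has written the file, later requests for the same size never reach PHP because the !-f condition no longer matches.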

What would Google’s search engine think if I were to change my current pages from /page/ to /page? Maybe that would alleviate the need to make a file look like a folder when it really is a file (something CMSs do for some reason).

Thanks,

Chris
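
If the URLs do change from /page/ to /page, the usual approach is to 301 the old form to the new one so that only one version stays reachable. A minimal sketch, assuming mod_rewrite is enabled and the .htaccess is in the document root:

```apache
RewriteEngine On

# Leave real directories alone; mod_dir adds their trailing slash itself
RewriteCond %{REQUEST_FILENAME} !-d
# Permanently redirect /page/ to /page
RewriteRule ^(.+)/$ /$1 [R=301,L]
```

A 301 tells crawlers the old address has moved permanently, so the slash and no-slash forms should not end up competing as separate pages.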

Let me guess, you are throwing everything into the root of the web directory? There are much easier ways to do this. Either way, if you cared about security, you wouldn’t keep files you don’t want accessed in the public web directory, much less try to use pseudo-security.

Now, if you actually put your resources (images, CSS, JS) into a folder structure, say store/, you could do the following:
(e.g. /store/css, /store/image, /store/script, etc.)


RewriteCond %{REQUEST_URI} !^/store
RewriteRule .? index.php [QSA,L]

Btw, I never said it was smaller. If you like trying to maintain a giant-ass list of extensions, then be my guest. But you should not have stuff you don’t want accessed in the public structure to begin with.

Agreed. The only PHP files that should be under the site root are front-end controllers. All source files should be kept outside the site root.

I find it funny when people call tried and true methods from two decades ago “pseudo-security”, and/or have never heard of them… but then, given the way things are pwned these days, I should hardly be shocked.

No, I’m not placing them in the root. The approach you suggest involves blacklisting on a per-directory basis. Remember, .htaccess settings are inherited by subdirectories… so by using the one I am, I can ‘set it and forget it’.

You’re basically talking about taking something simple (only open these files in this directory and any subdirectories) and making it needlessly complex.

As to:

So what are you saying, up-tree link it above www where PHP doesn’t even HAVE permission to do anything? Oh yeah, GREAT improvement. EVERY MAJOR CMS has files in subdirectories that the end user shouldn’t be able to call, and my approach gets rid of their ability to call those files by whitelisting. WHAT’S SO HARD ABOUT THAT?!?

You’re talking about using needlessly complex and convoluted files all over the place… Miserable /FAIL/ IMHO.

When the “giant ass list” is SMALLER than the code you posted, HOW does calling it a giant ass list even apply in the first place? Lemme guess, the same way people call Opera “bloated” when it does more than any other browser out of the box and still maintains a smaller distribution and memory footprint (once you realize FF “lies” about its memory use by hiding it in virtual memory instead of private working memory)… Yeah, more features in the same executable size… that’s bloated, sure it is.

I swear, do people just WANT this stuff to be needlessly convoluted and overcomplicated or something?!?

Oh wait, people still program in C or Ruby… so yeah… needlessly cryptic and convoluted for the win… I swear the only reason half this stuff is as “difficult” as it is, is that it’s that way by design to keep out the “normals”.

Then you are wrong: what you are doing is not a tried and true method. Keeping files out of the public web space is the tried and true method. Really, it’s not that hard, convoluted or complex. Here, let me break it down for you.


website-directory/
  public/ <-- web accessible
  library/ <-- not web accessible

My god, that is convoluted! I think my brain exploded. Then all you need is “FallbackResource” for Apache, none of that mod_rewrite crap. And if your server were properly set up, which I doubt, it would only serve the files you wanted served. But it’s still better to keep the files you DO NOT want served off the server, or at least out of the public web space.

My directory structure:


website-directory/
  public/
    store/ <-- images, css, resources
    index.php
  system/ <-- actual website functionality.
    application/
    .../

Hardly complex or convoluted. index.php is the only file in public aside from resources.
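
For a layout like the one above, the server-side piece could be sketched roughly as follows. The /var/www/website-directory path and the example.com host name are placeholders, and the access-control line assumes Apache 2.4 syntax (2.2 uses Order/Allow instead):

```apache
<VirtualHost *:80>
    # Placeholder host name
    ServerName example.com

    # Only public/ is exposed; system/ sits outside the document root
    # and can only be reached through PHP includes from index.php.
    DocumentRoot /var/www/website-directory/public

    <Directory /var/www/website-directory/public>
        Require all granted
        # Anything that is not an existing file (e.g. under store/)
        # falls through to the front controller.
        FallbackResource /index.php
    </Directory>
</VirtualHost>
```

Because nothing under system/ has a URL at all, there is no list of extensions to maintain in this arrangement.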