Restrict access to files from a certain page/directory?

Couldn’t quite find a specific answer to this. Imagine a web page that has links to some PDF files: mysite.com/folder/page-with-links.html. The files that are linked to are in a directory at mysite.com/files/.

Is it possible—via .htaccess—to prevent anyone viewing files in the file directory unless they do so by clicking a link on the page-with-links.html page? (For example, is there a way to prevent someone accessing a file by entering a URL in their browser like mysite.com/files/file1.pdf?)

Or perhaps there’s a better way. A client has a member-only section of their site that links to sensitive files, and I’m trying to make sure a non-member (such as Google) can’t get hold of a file URL and use it to access files stored in publicly accessible directories.

If the files are web accessible…they are accessible. You need to put them outside of the public web space and use a proxy (like a PHP script) that reads the file and sends it to the client after verifying authentication.

You can password protect individual files. Password protect the “file” using htpasswd.
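
For reference, a minimal sketch of what password protection with htpasswd might look like in the .htaccess file of the files directory. The password file path, the realm name and the file name are placeholders, not details from this site. The password file should live outside the web root and can be created with the htpasswd utility (for example, htpasswd -c /path/to/.htpasswd username).

```apache
# .htaccess in /files/ (sketch; paths and names are placeholders)
# Protect a single file with HTTP Basic authentication
<Files "file1.pdf">
    AuthType Basic
    AuthName "Members only"
    AuthUserFile /path/to/.htpasswd
    Require valid-user
</Files>
```

Dropping the <Files> wrapper applies the same protection to everything in the directory. The server needs AllowOverride AuthConfig enabled for these directives to work in a .htaccess file.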

Hi Ralph,

Sure you can do this.
I have a similar setup on one of my sites - it runs on a CMS and I use a “members only” type plugin to restrict access to certain parts of it.
The plugin, however, only restricts access to the database, so I use a .htaccess file to restrict access to anything on the file system in the restricted area (e.g. PDFs).

Here’s the code:

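# Hide all files from auto-generated directory listings
IndexIgnore *
Options +FollowSymlinks
RewriteEngine On
# Request did not come from a page on this site (a blank referer also counts)
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mysite\.com [NC]
# Let any file named hotlink.<ext> through
RewriteCond %{REQUEST_URI} !hotlink\.(gif|png|jpg|doc|xls|pdf|html|htm|xlsx|docx|mp4|m4v|ogv|webm) [NC]
# Send requests for the listed extensions to the login page
RewriteRule .*\.(gif|png|jpg|doc|xls|pdf|html|htm|xlsx|docx|mp4|m4v|ogv|webm)$ http://mysite.com/login.php?redirect_to=%{REQUEST_URI} [NC]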

If you place this in the directory you are trying to protect, it will route any requests that don’t come directly from your site to your login page.
You will, of course, need to swap out mysite.com with the appropriate name, as well as modifying the list of extensions you need to protect.

Be warned, this code won’t stop a bright spark who spoofs their referer to that of your site, but for me this was an adequate layer of protection.

The code was copied from somewhere, then modified to suit, so I can’t guarantee that it’s watertight, but I hope it helps anyway.

Thanks Pullo. That looks like it should be perfect for this situation. Can I just ask about the redirect to the login page, though: I’m not totally sure how to modify that URL for this site. If the site’s login page is, say, http://mysite.com/member/login, does that mean this:

http://mysite.com/login.php?redirect_to=%{REQUEST_URI}

should end up at this:

http://mysite.com/member/login?redirect_to=%{REQUEST_URI}

or something else? I’m not sure which bits I need to retain there, if any. (I have tried to read up on this stuff many times, but it doesn’t sink in, I’m afraid!)

After a bit of experimenting, I found that it worked perfectly to change

http://mysite.com/login.php?redirect_to=%{REQUEST_URI}

to

http://mysite.com/member/login

(which is the site’s login page). Seems to work perfectly. Does this seem like the right way to go, or are there hidden pitfalls?

Edit:

Hm, seemed to work perfectly, for a while, but now not … so I’ll keep experimenting.

Edit:

OK, I think I get it now: the ?redirect_to=%{REQUEST_URI} bit redirects the user to the requested file after login. Seems to be working again now. When I navigated to the file directly in the browser I got redirected, but at first I was finding that if I clicked a link to the file (say, in an email), it would still load. But now it’s redirecting to the login page. And thus I bumble along … :slight_smile:

A RewriteCond on HTTP_REFERER can also catch requests that arrive with a blank referrer.

# Check whether the referer is blank
RewriteCond %{HTTP_REFERER} ^$
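
A RewriteCond does nothing on its own, so here is a rough sketch of how the blank-referer check might sit in a complete rule. The https? prefix and the restriction to PDFs are assumptions for illustration. Note that a negated pattern such as !^http://(www\.)?mysite\.com is already true for a blank referer, so the explicit ^$ line mainly documents the intent.

```apache
# .htaccess in the protected directory (sketch)
RewriteEngine On
# Block when the referer is blank (typed URL, bookmark, some email clients) ...
RewriteCond %{HTTP_REFERER} ^$ [OR]
# ... or when it is present but does not point at this site
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?mysite\.com/ [NC]
# "-" leaves the URL alone; F answers with 403 Forbidden
RewriteRule \.pdf$ - [F,NC]
```

Written the other way round (!^$), the same check is what people use when they want to let blank referers through rather than block them.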

Ralph,

You’ve received several good half-answers around the {HTTP_REFERER} variable.

# IN THE protected_folder directory!!!
RewriteEngine on
RewriteCond %{HTTP_REFERER} !page-with-links\.html$
RewriteRule .? - [F]
# Fail any attempt to access a file in the protected_folder unless it came from page-with-links.html

Personally, I’d use TWO directories for this with one being the linked directory and the other with the files. The linked directory would only have the .htaccess file which would forward ONLY if the referrer was page-with-links.html (without displaying the real directory) and the protected_with_files directory would have the same code as above.

# Intermediary directory
RewriteEngine on
RewriteCond %{HTTP_REFERER} page-with-links\.html$
RewriteRule .? /folder_with_files%{REQUEST_URI} [L]
# Redirect ONLY if the referrer was page-with-links.html
# Without R=301 OR an http://yadda.yadda/link, the actual location of your files will remain hidden.

Having gone through all that in .htaccess files, though, I’d recommend that you protect your pdfs outside your webspace and serve them up via a PHP redirection if your conditions are met (because {HTTP_REFERER} is so unreliable [easily spoofed]). Well, that OR I’d password protect your page-with-links.html file and let it be the “key” to open the convoluted path to the hidden pdf files.

Regards,

DK

Thanks David. I understand a bit of that, at least in principle.

One difficulty here is that the page with links—though password protected—is created by a CMS, so isn’t a real directory as such (that is, one that I can access as a directory containing files). The directory containing the files is a normal directory, though.

I’ve considered storing the files outside the web root, but am not sure how to use PHP (along with the CMS) to serve them up. I’ll give your first code block a try, though, as I guess that’s better than nothing. It’s not likely that anyone (like Google) would ever find these files, as there are no public links to them, but any other levels of protection that can be provided are worth adding in. At the moment, the code that Pullo offered certainly gives a significant level of protection that might at least ward off Google, should anyone post a link to the files. I’ll try your code too. :slight_smile:
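
If moving the files out of the web root is awkward, a rough equivalent on the Apache side is to refuse all direct web requests for the files directory, so that only a server-side script reading from the filesystem can deliver them. A minimal sketch, assuming the .htaccess sits in the files directory; the script that checks the login and streams the file is not shown and would depend on the CMS.

```apache
# .htaccess in /files/ (sketch): no file here can be requested directly
<IfModule mod_authz_core.c>
    # Apache 2.4 and later
    Require all denied
</IfModule>
<IfModule !mod_authz_core.c>
    # Apache 2.2
    Order allow,deny
    Deny from all
</IfModule>
```

With this in place the existing direct links to the PDFs stop working, so it only makes sense once a script is serving the files.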

Hi Ralph,

Glad I could help :slight_smile:

Anyway, I was just wondering what code you ended up with.
Did you modify what I posted?
As I mentioned before, it was copied from elsewhere and altered to suit, so if you managed to improve it in any way, it’d be great if you could share it.

Hi Pullo. No, I didn’t modify it at all, except to remove the redirect at the very end, as it wasn’t needed in this case. If anyone finds a link to these files, I don’t want them going anywhere near them. I might even change the URL to the login page to an offsite URL, or at least a 404 page, if that’s possible, just to send people right off the scent.
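
To make protected files look as if they don’t exist, rather than sending visitors to the login page, the rule can return an error status instead of a redirect. A sketch, assuming the same referer condition as Pullo’s code; the extension list is only an example.

```apache
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mysite\.com [NC]
# "-" means no substitution; R=404 answers with 404 Not Found and stops rewriting
RewriteRule \.(pdf|doc|docx|xls|xlsx)$ - [R=404,NC]
```

Using [F] in place of [R=404] would return 403 Forbidden, which tells the visitor that something is there but off limits.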

If you’re feeling nasty, you could send them here: http://www.dokimos.org/ajff/ (warning: may induce epilepsy) :)

Even though I do have a mean streak, I’m not that cruel. :lol:

Hi Ralph!

Well, your password-protected page alleviates the need for the bogus intermediate directory.

As for the blank referrer, you do not want to allow that (because others can open a new tab and enter the link directly, by typing it or from a bookmark), so I’d stick with requiring the {HTTP_REFERER} variable to match your link page. You should use the entire URL with both start and end anchors; I don’t know your domain, nor whether you require www, require non-www or accept both, so adjust the pattern to suit.

# IN THE protected_folder directory!!!
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://(www\.)example\.com/path/to/page-with-links\.html$
RewriteRule .? - [F]
# Fail any attempt to access a file in the protected_folder unless it came from page-with-links.html

Regards,

DK

Thanks David. That works nicely. This particular site is at www.mysite.com.

There was a slight issue, in that some of the PDF files didn’t want to load from the proper links page after I used that code, but I have no idea if it was the htaccess code causing the problem or not. Switching back to Pullo’s code, they loaded instantly again. Could just be a server/internet glitch, though. Not sure. I’ll test it again later.

Ralph,

:tup:

Regards,

DK

I found this thread very helpful and joined the forum just to say that.

David’s code works for me except that I’ve used the following RewriteCond

RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com/path/to/page-with-links\.html$

Hi simon.m,

Any relation to ralph.m? :smiley:

Thanks for joining up to share that.
I hope you find SitePoint forums a nice place and decide to stick around.

No, sorry, not in the same league, but on the same continent :cool:.

So what happens when someone figures out that they can just stuff the right referrer in and get at the goods?