Results 1 to 6 of 6
  1. #1
    SitePoint Enthusiast
    Join Date
    May 2007
    Posts
    40

    PHP security issue: how to sanitize URL input by user

    I'm using PHP (with a MySQL db) to build a little gift registry system for the impending new addition to our family. Although this is a purely personal/fun project (that will all be in a password-protected directory only accessed by our friends and family), I am using the experience to try to improve my skills and knowledge with PHP security issues.

    The site has an admin section where new items can be added to the gift registry and existing items can be edited. Among other things, the add (or edit) forms contain an input for a URL and an input for the display text for the URL. The URL and the display text get stored in the db and then later output on the registry page where people can browse to see what gifts we would like. Then they can click on the link to actually see the product and possibly order it.

    So, pretend for a moment that this isn't my little password-protected personal project, but instead it's on a site with a mass audience. Is there any good way to "sanitize" a URL that was input by a user? How on earth would you make sure that someone isn't going to enter a link to some offensive website or some malicious script and then give it an innocuous name like "Amazon" or "Target", giving people a big surprise when they click on it!

    Just curious if anyone has any thoughts on this... like I said, I'm just learning about a lot of the security issues with PHP and trying to think them through on a "safe" project before I am someday faced with doing it for real! Thanks!

  2. #2
    Dan Grossman
    Join Date
    Aug 2000
    Location
    Philadelphia, PA
    Posts
    20,578
    You can make sure the input is a valid URL and only a valid URL, but that's as far as you can go programmatically. No amount of website code can tell if what's on the other end of the URL is porn or exploits some security vulnerability to install a virus.

    It's not feasible to do what that would require: following arbitrarily nested iframes, decoding obfuscated JavaScript, and so on.

    Google can mark URLs as "unsafe" by running thousands of virtual PCs, having them visit URLs with real browsers one at a time, then checking whether any new software was installed on the virtual machine's drive... followed by shutting down and relaunching the virtual PC fresh for the next site.

    No individual site owner has that kind of capacity, so real sites that allow URLs have to deal with malicious users on a case-by-case basis: either by allowing users to report malicious activity once they see it (like the report icon on each post of these forums) or by employing people to check submissions every day.

  3. #3
    SitePoint Enthusiast
    Join Date
    May 2007
    Posts
    40
    Thanks for the reply, Dan! I wasn't sure if I was missing something because I'm new to this, or if it really is a situation where there isn't a great solution; glad to know it is the latter!

    I guess I was starting to think along the lines of using regular expressions to look at the URL for things that would be likely to cause problems (such as ".js" for JavaScript). But I wasn't really sure what else I might look for if I took that approach... and you're right, it still wouldn't eliminate offensive sites with "normal" URLs.

  4. #4
    SitePoint Wizard
    Join Date
    Jul 2008
    Posts
    5,757
    filter_var() has a FILTER_VALIDATE_URL option, but I haven't seen any documentation as to its specifics.
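
    For what it's worth, an untested sketch of how you might combine it with a scheme check, so things like javascript: links get rejected too (validate_gift_url is just a made-up name):

    PHP Code:
    <?php
    // Untested sketch: validate a user-submitted URL before storing it.
    // Only http and https are accepted, so javascript:, data:, etc. are rejected.
    function validate_gift_url($url)
    {
        $url = trim($url);

        // filter_var() returns the URL on success, false on failure
        if (filter_var($url, FILTER_VALIDATE_URL) === false) {
            return false;
        }

        $scheme = strtolower(parse_url($url, PHP_URL_SCHEME));
        if (!in_array($scheme, array('http', 'https'), true)) {
            return false;
        }

        return $url;
    }

    $url = validate_gift_url($_POST['url']);
    if ($url === false) {
        die('Please enter a valid http:// or https:// link.');
    }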

    I wouldn't be too concerned about the file extension of the URL. The extension is nothing more than a convention and does not determine what type of content you will get from the URL. You can have a .gif extension serve an HTML page if you like, although you probably won't encounter that in the wild aside from someone trying to defeat someone else's attempt to restrict URLs by file extension.

    Probably your biggest concern is making sure you don't allow any HTML or script injection when you place the URL into the page. If you've got some time, this page gives a pretty good idea of what you're up against in terms of XSS and can help you defend against it: http://ha.ckers.org/xss.html
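
    To put that in concrete terms, when the link is output it's worth running both the stored URL and the display text through htmlspecialchars() so neither can break out of the markup. Something like this (the column names are just examples):

    PHP Code:
    <?php
    // Escape both values when building the link so quotes or angle brackets
    // in the stored data can't inject markup or script.
    $href = htmlspecialchars($row['url'], ENT_QUOTES, 'UTF-8');
    $text = htmlspecialchars($row['display_text'], ENT_QUOTES, 'UTF-8');

    echo '<a href="' . $href . '">' . $text . '</a>';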

  5. #5
    SitePoint Zealot adam.jimenez
    Join Date
    May 2009
    Location
    Ware, UK
    Posts
    136
    I like how Slashdot automatically extracts the domain name and puts it next to the link, e.g. "click here to go to amazon! [hackersite.com]"
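
    In PHP you could get the same effect with parse_url(). Untested, and the column names are made up:

    PHP Code:
    <?php
    // Show the real host next to the link text, Slashdot-style.
    $host = parse_url($row['url'], PHP_URL_HOST);

    echo '<a href="' . htmlspecialchars($row['url'], ENT_QUOTES, 'UTF-8') . '">'
       . htmlspecialchars($row['display_text'], ENT_QUOTES, 'UTF-8')
       . '</a> [' . htmlspecialchars($host, ENT_QUOTES, 'UTF-8') . ']';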

  6. #6
    SitePoint Zealot
    Join Date
    Jun 2008
    Posts
    126
    You could create a list of shopping sites and only allow links to those. That might be overly restrictive.
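
    If you did go that route, it could be as simple as checking the host against the list (the store domains below are just placeholders):

    PHP Code:
    <?php
    // Only accept links whose host is on a known-good list.
    $allowed_hosts = array('www.amazon.com', 'amazon.com', 'www.target.com', 'target.com');

    $host = strtolower(parse_url($url, PHP_URL_HOST));

    if (!in_array($host, $allowed_hosts, true)) {
        die('Sorry, links are only allowed to stores on our list.');
    }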

    Google has something called the Safe Browsing API. It looks like it might allow checking input against Google's list of known malicious or phishing sites: http://code.google.com/apis/safebrowsing/

    To catch submissions that are inappropriate but not malicious, it might be possible to write code that would fetch the submitted page from the remote site, look for ICRA or other content labeling, interpret it, and reject any site that isn't general-audience-suitable or isn't rated at all. Or to scan its text and reject the submission if the page contains words that are likely indicators of unsuitable content. Following on Dan Grossman's idea, you couldn't follow and analyze iframes, but you could reject (or flag for review) any page that has any iframes in it.
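
    A crude, untested version of that iframe check might look like the sketch below. (Fetching arbitrary user-supplied URLs from your own server has its own risks and needs timeouts, so this is only an illustration.)

    PHP Code:
    <?php
    // Fetch the submitted page and hold it for review if it can't be fetched
    // or if it contains any iframes.
    $html = @file_get_contents($url);

    if ($html === false) {
        $needs_review = true;   // couldn't fetch the page at all
    } elseif (stripos($html, '<iframe') !== false) {
        $needs_review = true;   // contains at least one iframe
    } else {
        $needs_review = false;
    }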

    It might be possible to use something like GeoIP to do some IP filtering: look up the IP address of the site being referred to, find its geographical location, and reject it if it's outside the region where legitimate links are likely to point. For example, considering shipping costs, how many people in the US are likely to be buying baby gifts from a .ru or .cn website?
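
    With the PECL geoip extension installed, that could be roughly the following (the allowed-country list is just an example):

    PHP Code:
    <?php
    // Look up the country for the link's host and hold anything unexpected for review.
    // Requires the PECL geoip extension.
    $host    = parse_url($url, PHP_URL_HOST);
    $country = @geoip_country_code_by_name($host);   // e.g. 'US', 'RU', or false

    $allowed_countries = array('US', 'CA', 'GB');

    if ($country === false || !in_array($country, $allowed_countries, true)) {
        $needs_review = true;
    }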

    Most time-consuming would be to require administrator review and approval of new submissions before they're posted to the site.
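
    In code that can be as little as an 'approved' flag on each item (table and column names here are invented, and this assumes mysqli):

    PHP Code:
    <?php
    // New submissions go in unapproved; the public page only shows approved items.
    $stmt = $mysqli->prepare(
        'INSERT INTO registry_items (url, display_text, approved) VALUES (?, ?, 0)'
    );
    $stmt->bind_param('ss', $url, $display_text);
    $stmt->execute();

    // Public registry page:
    $result = $mysqli->query('SELECT url, display_text FROM registry_items WHERE approved = 1');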

    I want to commend you for thinking about this issue in advance and doing the experiments on a safe test site. You might not be aware how incredibly rare that is.

