Safety from bad URLs

Hello,
I want to provide a feedback form on my website. How can I protect my database from bad URLs? I don’t want to block URLs entirely, as in some cases it is necessary to post them.

Thanks
Shail

There is a discussion similar to your question in the Security forum. Here is the link.

If you’re looking to block only certain URLs, that will require a list of bad URLs (which will be eternally evolving) in addition to what is outlined above.

I didn’t find anything helpful there. Is it possible for a bad guy to attack the database through URLs?

Yes… that’s covered in the SANITIZE USER INPUT part. You have to neutralize any SQL in the input to render SQL injection (via URL parameters or form POST/GET) ineffective.
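For example, with parameterized queries the database driver escapes the input for you. Here’s a minimal sketch in Node (assuming the `mysql` npm package; the table and column names are made up for illustration):

    var mysql = require('mysql');

    var connection = mysql.createConnection({
      host: 'localhost',
      user: 'app_user',
      password: 'secret',
      database: 'feedback_db'
    });

    // Hostile input that would break a naively concatenated query
    var userComment = "'; DROP TABLE feedback; --";

    // The ? placeholder is filled with the escaped value by the driver,
    // so the input is stored as plain text instead of executing as SQL.
    connection.query(
      'INSERT INTO feedback (comment) VALUES (?)',
      [userComment],
      function (err, result) {
        if (err) throw err;
        // comment saved safely, even though it contained SQL keywords
      }
    );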

Suppose someone has posted http://www.google.com. How should I check it?

That depends. If you are using a database to store “bad” URLs, then you would have to parse the entered text, find anything that resembles a URL (with or without ‘http’ or ‘https’), run that value through a SELECT query against the database, and compare.
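A rough sketch of that flow in Node (again assuming the `mysql` package; the `bad_urls` table and the regex are illustrative, not definitive):

    var mysql = require('mysql');

    var connection = mysql.createConnection({
      host: 'localhost',
      user: 'app_user',
      password: 'secret',
      database: 'feedback_db'
    });

    var comment = 'Check out http://www.gooddomain.com/ sometime';

    // Deliberately loose pattern: catches http(s) URLs and bare www. hosts.
    var urlPattern = /(https?:\/\/|www\.)[^\s]+/gi;
    var candidates = comment.match(urlPattern) || [];

    candidates.forEach(function (candidate) {
      connection.query(
        'SELECT 1 FROM bad_urls WHERE url = ?',
        [candidate],
        function (err, rows) {
          if (err) throw err;
          if (rows.length > 0) {
            // the URL is on the blocklist - reject or flag the comment
          }
        }
      );
    });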

If you are using a flat file (like a .txt file), then put each “bad” URL on a separate line, read the file and convert it to an array, then iterate through all the array elements and compare.
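The flat-file variant might look something like this (the badurls.txt filename is made up; only Node built-ins are used):

    var fs = require('fs');

    // One "bad" URL per line; trim whitespace and drop empty lines.
    var badUrls = fs.readFileSync('badurls.txt', 'utf8')
      .split('\n')
      .map(function (line) { return line.trim(); })
      .filter(function (line) { return line.length > 0; });

    function containsBadUrl(comment) {
      // True if any blocklisted URL appears in the comment text.
      return badUrls.some(function (bad) {
        return comment.indexOf(bad) !== -1;
      });
    }

    if (containsBadUrl('Visit http://www.baddomain.com/ now!')) {
      // reject the feedback or queue it for moderation
    }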

There are other ways to do it, but these are just two off the top of my head.

Y’know… this isn’t really a good idea. Considering that there are unethical people who will constantly register new domains for nefarious purposes, you’d never be able to keep the list up to date.

Your best bet is to just strip out ALL HTML and script code from user input. Allow nothing (see the sketch after the next paragraph).

Or, conversely, provide a separate field for URLs, and if the field has content, mark the submission as “under moderation” so that the admin (you?) can approve or delete it on a case-by-case basis.
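For the strip-everything route, here’s a minimal sketch in Node using plain string replacement (a vetted sanitization library would be the safer choice in a real project):

    // Escape the characters HTML treats as markup, so any tags or scripts
    // a user submits are rendered as inert text instead of executing.
    function escapeHtml(input) {
      return String(input)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
    }

    var hostile = '<script>alert("xss")</script>';
    console.log(escapeHtml(hostile));
    // -> &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;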

OK, suppose someone has pasted a link to his profile, like https://www.facebook.com/xxxxx?fref=nf. If I remove the / when sanitizing, then how will his friends reach his profile when they click it?

Not trying to sound harsh, but: your original post stated that you wanted to provide a feedback form for your site. Why would you allow people to post their Facebook profile to a feedback form?

As I stated earlier, if you want to allow people to do such things, you can always put another field in the form for a URL (and another field for the text that will be the link) so that you can vet the link and make sure it isn’t something malicious before approving it, if that’s an aspect of the feedback form that you really want to include. But just allowing any old HTML/JS markup puts your site (and your users) at great risk. Plus, even if you were to create a script that would “root out and eliminate” malicious URLs, you’d be spending a LOT of time updating the bad-URL list - more than it’s worth, IMHO.

I will be using this for a feedback form only… Facebook was just an example to dig deeper and understand the issue. :slight_smile:

Fair enough. I still think it’s best to strip out all HTML/JS from the feedback form. If someone absolutely, positively MUST provide a link to something, provide separate form fields for that and queue the feedback for moderation so you can approve it.

I have a similar problem, but I guess I need to move my forum to another forum platform. I am currently using SMF, and I’m afraid it might be the cause of having too many non-unique links, which hurts my SEO.

The easiest way to validate that a URL is real is to:

  1. Use a URL parsing library to make sure the URL is valid. Here’s an example of how to do this in Node.js:

         var url = require('url'),
             inputUrl = 'http://www.gooddomain.com/',
             parsed = url.parse(inputUrl);

         if (parsed.protocol && parsed.host && parsed.path) {
           // we have a valid URL here - save it
         }
  2. Make an HTTP request and check for a 2xx status code. This can be done from any server-side language before you insert the URL string into a database. Another Node example:

         var request = require('request'),
             goog = 'http://www.google.com/';

         request(goog, function (err, response) {
           if (!err && response.statusCode >= 200 && response.statusCode < 300) {
             // we have a good response - save the URL
           }
         });

However, doing both of these (especially making the HTTP request and checking the response) during the usual request/response cycle can kill performance, so this is probably best left to a background job or something similar.
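Putting the two checks together, a combined helper might look like the sketch below; the idea would be to call it from a background worker rather than inline in the request handler:

    var url = require('url'),
        request = require('request');

    // Structural check first (cheap), then a live HTTP probe (slow).
    function validateUrl(inputUrl, callback) {
      var parsed = url.parse(inputUrl);
      if (!(parsed.protocol && parsed.host && parsed.path)) {
        return callback(null, false); // malformed URL
      }
      request(inputUrl, function (err, response) {
        if (err) return callback(err);
        callback(null, response.statusCode >= 200 && response.statusCode < 300);
      });
    }

    validateUrl('http://www.google.com/', function (err, ok) {
      // ok === true means the URL parsed cleanly and responded with 2xx
    });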

edit: posting code on discourse is terrible