Spam is no longer limited to email. If you run a Website on which you allow users to leave comments, you have undoubtedly faced the problem of comment spam.
The spammers’ aim is not to redirect some of your traffic to their site, which is the obvious initial conclusion; it is to increase their (or their clients’) ranking in search engines. Most search engines now count in a site’s ranking how many other Websites have linked to it. By leaving comments on your site, the spammers’ sites can achieve a slightly higher search engine ranking.
The spammers’ job is to get around spam-blockers and target the security of individual Websites; though occasionally they do so on a manual basis, by far the most common forms of comment spam are achieved with spam "bots" or scripts. Unfortunately, many site owners don’t focus on their Websites as their day job, which can make adapting to spam bots difficult.
Rules of Thumb
When you find that your site is the victim of comment spam, it’s easy to react strongly, on a per-case basis, rather than look at the bigger picture. These Rules of Thumb should help you keep things in perspective.
The most important of these rules is: don’t take it personally. Spammers don’t want to degrade your site. They simply want to get people to their sites and make a larger profit.
1. Don’t Ban Specific IP Addresses
Don’t bother banning IP addresses. Although this is the most logical thing to do, it rarely helps much. Most comment-spammers bounce requests off other computers and servers, so you’ll likely never be able to eradicate them from your site entirely.
As an interview with a comment spammer at The Register explains, "So Sam (a comment spammer), like other link spammers, uses the thousands of ‘open proxies’ on the net. These are machines which, by accident (read: clueless sysadmins) or design (read: clueless managers) are set up so that anyone, anywhere, can access another Website through them. Usually intended for internal use, so a company only needs one machine facing the net, they’re actually hard to lock down completely."
2. Don’t Allow HTML
Allowing HTML in comments lets spammers embed whatever links and markup they like, so accept plain text only. If you feel the need to let users include links, there are a number of ways to provide that functionality without making your site vulnerable to attack. The most common method is to inform users that all URLs will be converted to links automatically, then convert any text that starts with http:// into a link.
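As a minimal sketch of that conversion, the Python function below escapes everything the commenter typed and then wraps anything that looks like a URL in a link. The function name linkify_comment and the URL pattern are assumptions made for this illustration (the pattern also accepts https://), and a production site would probably want stricter URL validation, for example to keep trailing punctuation out of the link.

```python
import html
import re

# Assumed pattern: anything starting with http:// or https:// up to the
# next whitespace, quote, or angle bracket is treated as a URL.
_URL_RE = re.compile(r"https?://[^\s<>\"']+")


def linkify_comment(text):
    """Escape all HTML in a comment, then turn bare URLs into links."""
    parts = []
    last = 0
    for match in _URL_RE.finditer(text):
        # Escape the plain text that precedes this URL.
        parts.append(html.escape(text[last:match.start()]))
        # Escape the URL too, so it is safe inside the href attribute.
        url = html.escape(match.group(0), quote=True)
        parts.append(f'<a href="{url}">{url}</a>')
        last = match.end()
    # Escape whatever follows the final URL.
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(linkify_comment("See http://www.sitepoint.com <b>now</b>"))
```

Because the comment is escaped piece by piece, any tags the commenter types are displayed as text rather than rendered, and the only markup in the output is the markup the function itself generates.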
3. Use Non-Descriptive Form Names
Good programming requires the use of descriptive names, but in avoiding comment spam, you should stay away from names that describe a form’s fields. Form element names like "Comment" make it too easy for spammers to access your comment system.
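One way to keep the code readable while still exposing non-descriptive names is to hold the mapping in a single place on the server. The sketch below assumes the submitted form arrives as a plain dictionary; the opaque names (f_a81 and so on) and the function read_comment_form are hypothetical and exist only for this illustration.

```python
# Hypothetical mapping from the descriptive names used in the code
# to the opaque names that actually appear in the HTML form.
FIELD_NAMES = {
    "author": "f_a81",
    "email": "f_c27",
    "comment": "f_x93",
}


def read_comment_form(form):
    """Translate opaque form field names back into descriptive keys."""
    return {
        logical: form.get(opaque, "").strip()
        for logical, opaque in FIELD_NAMES.items()
    }


# A bot that posts to the obvious field name "comment" is ignored,
# because the server only reads the opaque field "f_x93".
submitted = {"f_a81": "Alice", "f_x93": " Nice article! ", "comment": "spam"}
print(read_comment_form(submitted))
```

Changing the opaque names from time to time only requires editing the mapping and the form template, not the rest of the comment-handling code.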
4. Use rel="nofollow" for All Links
If you allow site users to include links in their comments, add rel="nofollow" to the tag, as shown below:
<a href="http://www.sitepoint.com" rel="nofollow">SitePoint</a>
This technique tells search engine bots to ignore the link, so the spammer gains no benefit from adding links to your comments.
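If comments are already stored as HTML, the attribute can be added when the comment is rendered. The following Python sketch forces rel="nofollow" onto every link in a fragment and discards any rel value the commenter supplied. The function name force_nofollow is an assumption for this example, and the regular expressions are a simplification: they assume no attribute value contains a > character, so a real HTML parser would be more robust.

```python
import re

# Matches an existing rel attribute (double-quoted, single-quoted, or bare).
_REL_ATTR = re.compile(r"\srel\s*=\s*(\"[^\"]*\"|'[^']*'|[^\s>]+)", re.IGNORECASE)
# Matches the opening tag of a link and captures its attributes.
_ANCHOR_OPEN = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)


def force_nofollow(html_fragment):
    """Return the fragment with rel="nofollow" set on every <a> tag."""

    def rewrite(match):
        # Drop any rel attribute the commenter supplied, then add ours.
        attrs = _REL_ATTR.sub("", match.group(1))
        return f'<a{attrs} rel="nofollow">'

    return _ANCHOR_OPEN.sub(rewrite, html_fragment)


print(force_nofollow('<a href="http://www.sitepoint.com">SitePoint</a>'))
```

Replacing an existing rel value, rather than skipping links that already have one, prevents a spammer from avoiding the rewrite by supplying a rel attribute of their own.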
You can stop comment spam via two approaches. The first tackles the problem before the comment is posted; the second addresses spamming after the fact.
1. Differentiate Between Spammers and Regular Users
Differentiating between spammers and regular users involves requiring your human posters to identify themselves as such through an extra step inserted in the commenting process. This is possibly the most widely used approach to avoiding comment spam, and includes two options.
The Turing Test
The most commonly used Turing Test (named after computer scientist Alan Turing) is called CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart).
It involves adding to your site’s comments area an image that contains a random piece of text. The text must be distorted or blurred enough that a human can still read it, but a computer cannot. The commenter is asked to copy this text exactly into a form field before they submit the form.
This process makes it easy to guarantee that the commenter is a person, not a bot. It does not solve the problem of a human being spamming your comments section manually. However, as most spamming is carried out automatically by bots, this technique should stop most comment spam.
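The server-side half of this check can be sketched in a few lines of Python. The example below only generates the random text and verifies the answer; drawing the distorted image requires an imaging library and is left out. The names new_challenge and check_answer, the six-character length, and the alphabet are all assumptions made for illustration, and the expected text is assumed to be kept in the visitor's server-side session, never in the page itself.

```python
import hmac
import secrets

# Assumed alphabet: easily confused characters (0/O, 1/I) are left out.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def new_challenge(length=6):
    """Return a random string to render into the CAPTCHA image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_answer(expected, submitted):
    """Compare the typed text with the stored challenge, exactly."""
    return hmac.compare_digest(
        expected.encode("utf-8"),
        submitted.strip().encode("utf-8"),
    )


challenge = new_challenge()          # store this in the user's session
print(check_answer(challenge, challenge))        # a correct answer
print(check_answer(challenge, challenge + "X"))  # an incorrect answer
```

A fresh challenge should be generated for every form view and discarded after one attempt, so that a solved CAPTCHA cannot be replayed by a bot.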
An advantage of this technique is that it is not code-reliant, so constant updating of your code is not necessary.
The disadvantage is that users are forced to perform an extra step, which does not benefit them, in order to submit their comments. This can be a serious drawback if your Website is just starting out and you’re trying to encourage people to comment.
More information on CAPTCHAs can be found at Captcha.net. A free resource that explains how to use CAPTCHAs on your site is available at Human Verify.
User Authentication
Using this method, the site owner requires all users to set up a username and password before they can comment on the site. If the administrator then finds a user spamming the site, he or she can ban that username or email address.
This approach works in two ways. Firstly, spammers don’t want to be identified and therefore will be unlikely to sign up. Secondly, even if they do, the hurdle of having to take the time to sign up in order to spam (and be banned immediately) can be a strong deterrent. After all, there are many easier targets online than a site that has a user authentication system in place.
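The banning step might look something like the Python sketch below. It keeps the ban lists in memory purely for illustration; a real site would store them in its user database. The names ban_user and may_comment are hypothetical.

```python
# In-memory ban lists, standing in for tables in the user database.
banned_usernames = set()
banned_emails = set()


def _normalize(value):
    """Ignore case and surrounding whitespace when comparing identities."""
    return value.strip().casefold()


def ban_user(username, email):
    """Record a spammer's username and email address as banned."""
    banned_usernames.add(_normalize(username))
    banned_emails.add(_normalize(email))


def may_comment(username, email):
    """Return True unless the username or the email has been banned."""
    return (
        _normalize(username) not in banned_usernames
        and _normalize(email) not in banned_emails
    )


ban_user("Sam", "sam@example.com")
print(may_comment("sam", "another@example.com"))   # False: banned username
print(may_comment("alice", "alice@example.com"))   # True
```

Checking the email address as well as the username means a banned spammer cannot return simply by registering a new username with the same address.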
An advantage of this solution is that, though code-driven, it involves only a database of users and simplified user management, so it’s not too time-consuming. The banning of offending usernames may take some time, however.
The disadvantage is that, before they can post, users are forced to perform numerous extra steps, for which they may see little benefit. This can be a serious drawback if your site is just starting out and you are trying to encourage usage.
More information on creating a user authentication system can be found at Developer Fusion, and, of course, through a search here at SitePoint.com.
2. Catch Comment Spam After it has Been Added
Catching comment spam will be necessary if you decide not to differentiate between spammers and human users. It may also be necessary if you have taken the steps above, as some comment spam is almost inevitable.
This approach involves the creation of a check that occurs after the comment is submitted, to identify whether it is spam or a legitimate post. Of course, you can go through posts manually before they’re made live, checking to ensure they’re not spam. But you can also automate the process: create a list of keywords that are common to spam, and check each post against this list. You can then weed out any comments that contain the offending words (which might include terms like Viagra, gambling, poker, meds, etc.).
This comparison can be done in various ways, and at a number of points during comment processing. Most programming languages make it very easy to check a string for given keywords. Normalize the case first, by converting both the comment and the keywords to lowercase or uppercase, so that the comparison is not case-sensitive. If the comment is found to contain any of the keywords, the spammer can be warned and the comment deleted.
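A minimal sketch of such a check in Python follows. The keyword list is taken from the example terms above, and the function name find_spam_keywords is an assumption for this illustration. The comment is lowercased and split into whole words, so the match ignores case and does not fire on a keyword that merely appears inside a longer word.

```python
import re

# Example keyword list; keep every entry in lowercase.
SPAM_KEYWORDS = {"viagra", "gambling", "poker", "meds"}


def find_spam_keywords(comment):
    """Return the sorted spam keywords that occur in the comment."""
    # Lowercase first so that "VIAGRA" and "Viagra" both match.
    words = set(re.findall(r"[a-z0-9]+", comment.lower()))
    return sorted(words & SPAM_KEYWORDS)


print(find_spam_keywords("Cheap VIAGRA and poker tips"))  # ['poker', 'viagra']
print(find_spam_keywords("Great article, thanks!"))       # []
```

Returning the matched words, rather than a simple yes or no, makes it easier to tell the commenter which terms triggered the filter or to log them for review.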
An advantage of this approach is that it does not require the commenter to take any extra steps, so the comments section remains simple and easy to use.
The disadvantage is that, as the spammer changes the words used by the spam bot, your keyword list must also be updated. This technique will also be difficult to implement if the spammer advertises products that are relevant to your Website; in that case, your list of banned words might stop legitimate comments from being posted.
Managing Comment Spam
Spam will always be a problem. However, a well-designed site that takes the common spamming techniques into consideration will be able to avoid most spam. The techniques we’ve explored here should help site owners battle comment spam effectively.
Ultimately, the Webmaster needs to adapt his or her techniques to deal with spam on an ongoing basis. The secret to success, then, is continual monitoring and adaptation to spammers’ changing tactics.