Automated Blog Comment Spam?

Via Simon: MT Plus Comment Spam Equals Dead Site. The subject of blog comment spam bothers me, not so much as a problem in itself but because there’s a lot of people talking about it (and suffering from it) while, at the same time, there’s little real technical analysis.

Have to say I don’t have first-hand experience of dealing with blog comment spam (and SitePoint administers these blogs), so perhaps I’m the wrong person to make suggestions, but I’m going to do so anyway, from the standpoint of someone who knows the technologies involved. Shoot me if I’m wrong – preferably with technical reasons.

First, I have yet to fully answer for myself whether the problem is primarily human beings manually posting spam, or automated processes (scripts). I assume the answer is both, but that the bigger problem is the latter, given the volumes being generated in some places – to the point of denial of service against Movable Type.

In the former case – armies of sad gnomes paid to post links for PageRank – the only decent technical solution would seem to be “blacklisting”: maintaining lists of patterns (URLs/words) which should be blocked from posts.

From a quick scan of what people are doing so far, no one seems yet (correct me if I’m wrong) to have established some kind of live blacklisting service, open to all to read and updateable (automatically) by trusted bloggers. There’s already plenty of experience with XML-RPC based services in the blogosphere, so it shouldn’t be a giant leap. It should be possible to build a service where an attempted comment spam on a single blog results in a blacklisting which propagates almost immediately to all other blogs subscribed to the service. Under that kind of scheme it may also be possible to age out old entries which spammers have lost interest in, reducing the processing overhead of searching huge lists.
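To make the aging idea concrete, here’s a rough sketch of the subscriber side of such a service – a Python toy, with invented names and time limits, not any real blacklisting service’s API:

```python
import time


class Blacklist:
    """Toy model of a shared blacklist: each entry carries the time it
    was last reported, so stale patterns can be aged out and lookups
    stay cheap even as spammers churn through domains."""

    def __init__(self, max_age_seconds=30 * 24 * 3600):
        self.entries = {}          # pattern -> time it was last reported
        self.max_age = max_age_seconds

    def report(self, pattern, now=None):
        # A trusted subscriber reports a spam pattern; refresh its timestamp.
        self.entries[pattern] = time.time() if now is None else now

    def purge(self, now=None):
        # Age out patterns spammers have lost interest in.
        now = time.time() if now is None else now
        self.entries = {p: t for p, t in self.entries.items()
                        if now - t <= self.max_age}

    def is_blocked(self, text, now=None):
        self.purge(now)
        return any(pattern in text for pattern in self.entries)
```

A propagation layer (XML-RPC or otherwise) would just call `report()` on every subscriber when one blog sees an attempt.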

In the latter case – automated spam via scripts – it strikes me there’s room to make life very difficult for spam script developers (to the point of it not being worth the effort) by considering the nature of the scripts themselves and what’s involved in writing them.

The most likely tools, from where I stand, for writing spam scripts are Perl plus LWP::UserAgent (or something similar like LWP::Simple), PHP plus PEAR::HTTP_Request (or possibly Snoopy), and Python plus httplib. Perhaps Internet Explorer via COM is also being used?

To be able to write a spam script using these tools requires at least some knowledge of programming and the HTTP protocol, plus the time to write it. Sure, you don’t have to be a genius and a simple script doesn’t take long to write, but it still requires a little more talent and effort than “Hello World”.

I’d be reasonably willing to hazard a bet that the number of people actually using spam scripts is much higher than the number writing them (what skilled developer wants to waste much time on this?). In other words, someone writes the script then distributes it to a group who lack the skill to make significant modifications to it.

It’s also worth noting that spammers are focusing on blogs running apps like Movable Type, which offer a standard HTTP API for posting comments. What that suggests to me is that the spamming scripts are primitive, probably containing hard-coded form field names and perhaps hard-coded (relative) URLs to POST to. In other words, varying the URLs/form fields on the server will break the scripts.
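To show how brittle such a script is, here’s a hypothetical sketch in Python (the endpoint path and field names are invented for illustration) – everything is baked in, so renaming a single field or moving the URL on the server silently breaks it:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_canned_comment_post(blog_url):
    # Both the endpoint path and the field names are hard-coded; change
    # either on the server and every request this builds is rejected.
    fields = {
        "author": "casino-fan",
        "email": "fan@example.com",
        "url": "http://casino.example/",
        "text": "Great post! Check out my site.",
    }
    return Request(blog_url + "/mt-comments.cgi",
                   data=urlencode(fields).encode("ascii"))
```

That’s the entire sophistication level the per-installation API change has to defeat.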

So suggestion number one would be for blog app vendors to make the comment API unique to a given installation of their application (e.g. generated during the setup process).

Also, giving the server side the ability to vary the form API on a per-request basis would present a moving target. For a browser, which fetches a fresh copy of the comment form each time, this should be no problem, but the basic script now needs modification.

The implementation could be as simple as having a list of different comment field sets, each set with a unique identifier (sent to the browser in a hidden form field), and making the list individual to the installation of the blogging app. Each time the form is displayed to a browser, the names of the form fields are different, selected by the server from the list. When the form is submitted, the unique identifier tells the server which form field names to expect.
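A minimal sketch of that scheme (Python for illustration; the identifiers and field names are invented – a real installation would generate its own list at setup time):

```python
import secrets

# Hypothetical per-installation table, generated during setup: each set
# of field names has an identifier that travels in a hidden form field.
FIELD_SETS = {
    "a7": {"author": "fx1", "email": "fx2", "comment": "fx3"},
    "b3": {"author": "qz9", "email": "qz8", "comment": "qz7"},
}


def render_form():
    # Pick a field set at random for each page view.
    set_id = secrets.choice(list(FIELD_SETS))
    return set_id, FIELD_SETS[set_id]


def decode_submission(post_data):
    # The hidden identifier tells the server which names to expect.
    mapping = FIELD_SETS.get(post_data.get("set_id"))
    if mapping is None:
        return None  # unknown set: reject as a probable script
    try:
        return {logical: post_data[name] for logical, name in mapping.items()}
    except KeyError:
        return None  # wrong field names: the hard-coded script broke
```

A browser that fetched the form fresh submits the right names automatically; a script with yesterday’s names gets `None`.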

The spam script now needs to start parsing the web page to extract the field names, increasing its complexity by an order of magnitude.

And to catch out scripts which are parsing the page, some random dummy form fields, visually hidden from a browser using CSS, could be used to identify and block the scripts.
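The server-side check is trivial (the dummy field names below are arbitrary examples): a human never sees, and so never fills, fields hidden with CSS, while a naive form-parsing script tends to fill everything it finds.

```python
def passes_honeypot(post_data, dummy_fields=("website2", "fax")):
    # Reject the post if any CSS-hidden dummy field came back non-empty;
    # a browser user leaves them blank because they are invisible.
    return all(not post_data.get(field) for field in dummy_fields)
```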

The script now has to parse both HTML and CSS, then work out how the CSS relates to the HTML – not something you can do with five minutes’ hacking.

There’s more that can be done by exploiting capabilities that a browser has but a script hasn’t, perhaps the first place to look being JavaScript. There basically isn’t a scripting language capable of fully interpreting JavaScript and providing all the native JavaScript objects a browser has. For starters, setting a cookie with JavaScript, which the server requires before allowing a POST, forces the script to both extract this information from the JavaScript and send the correct cookie header (more complexity required). And at the extreme end of the scale, XMLHttpRequest could be used to fetch some further critical pieces of information, needed before a comment may be posted, once the page has already loaded.
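The cookie idea could be sketched like this (Python standing in for the server side; the secret and token names are invented). The page’s inline JavaScript writes the token into document.cookie, so a bare HTTP script that never runs the JavaScript fails the check:

```python
import hashlib
import hmac

SECRET = b"per-installation secret"  # hypothetical, generated at setup time


def js_token(session_id):
    # The value the page's inline JavaScript writes into document.cookie.
    return hmac.new(SECRET, session_id.encode(), hashlib.sha256).hexdigest()[:16]


def post_allowed(session_id, cookies):
    # A script that never executed the JavaScript has no cookie to send,
    # so it must now parse the script source to recover the value.
    return hmac.compare_digest(cookies.get("js_token", ""), js_token(session_id))
```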

Those are just some specific ideas. The situation right now seems to be that we’re looking for unbreakable solutions. Unfortunately, coming up with something which is both unbreakable and user-friendly is unlikely to happen.

Seems to me the easier path is simply to get into a development arms race with spammers, one which will be “invisible” to a normal visitor with a browser. Take it to the point where so much development time and skill is required to write a spamming tool that it’s no longer worth the effort. If someone does manage to write a spamming tool for your blog, at least you’ll know they were one of the core Mozilla development team.

Anyway – that’s the view as I see it from afar. Say the word if it’s wrong.


  • http://www.myriadintellect.com LetterJ

    There’s also the approach of taking what’s worked fairly well in email spam: bayesian filters.

    There’s a PHP implementation of the basic Bayesian algorithm. The original is in French and I’ve done a translation to English at:

    http://www.phpgeek.com/pragmacms/index.php?layout=main&cslot_1=14

    If the ability to train the filter was added to most of these commenting systems, you could prevent it from being posted regardless of the source or posting method.
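    The core of such a filter fits in a few lines. Here’s a toy naive-Bayes sketch in Python (not the PHP implementation linked above, just an illustration of the idea): train on known spam and ham, then score new comments by the log-probability ratio of their words.

```python
import math
from collections import Counter


class BayesFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}

    def train(self, label, text):
        # Tally word frequencies for the given class.
        self.counts[label].update(text.lower().split())

    def spam_score(self, text):
        # Sum of log(P(word|spam) / P(word|ham)) with add-one smoothing;
        # positive means the comment looks more like known spam.
        score = 0.0
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        for word in text.lower().split():
            p_spam = (self.counts["spam"][word] + 1) / (spam_total + 1)
            p_ham = (self.counts["ham"][word] + 1) / (ham_total + 1)
            score += math.log(p_spam / p_ham)
        return score
```

    Training it on deleted spam as the blogger moderates would let the threshold adapt to each blog’s traffic.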

  • ronanmagee

    What about deploying the already well-used obscure graphic that contains a word/number covered by a messed-up graphic? This provides added functionality and security?? Has this method been exploited yet?

  • http://simon.incutio.com/ Skunk

    I’ve been publishing my own blacklist since I started blacklisting, the idea being that people can pull it from me once a day and use it as part of their blacklisting efforts. My original suggestion was that people subscribe to the blacklists of others that they trust – that way, if someone starts blacklisting sites that you don’t think should be blacklisted you just unsubscribe from their own list. I don’t know if anyone ever ran with this idea though.

    I think you’re seriously under-estimating both the technical abilities and the motivation of comment spammers. My personal blog is a hand-rolled system with its own API, yet it still gets comment spam from people spamming by hand. The economic incentive to spam is extremely high.

    I’ve eliminated all Google PageRank from links embedded in comments on my sites, which I think has discouraged manual spammers (at least the ones that bother to read the notice). Google adapted the same technique for Blogger – I don’t know how successful it has been for them, but I haven’t heard many complaints about comment spam on Blogger sites.

    Personally I still think that eliminating PageRank is the best solution simply because it battles the economics of comment spam. As e-mail spam has shown, as long as there’s an economic incentive spammers will take more and more advanced steps to avoid filters and counter-measures.

  • php_man

    Adding hoops for spammers to jump through is unlikely to help if you use popular software. Whatever you do – whether it be using a token, some extra request to the server, or setting a cookie – can be replicated by a script.

    The only way to prevent automatic bots is either requiring registration (and validation of the registration) or, as ronanmagee suggests, some type of Turing test.

  • ronanmagee

    As php_man suggests, registration is another way to stop the spam, incorporated with the Turing test.

    But many users are unwilling to fill in registration details for just a blog. How about requiring an email address associated with the post? Once a post has been made, it must be confirmed by the owner of the email address.

    This means:
    1) the spammer, if they fake an email address, won’t/shouldn’t have their post verified.
    2) if it turns out to be spam, you have an address that you can block/submit to a blacklist

    For members that can be trusted, you could have a flag in the database to say they are trusted, so that after 10 posts they don’t require email verification.

    This incorporated with the turing test might prove successful?

  • Dangermouse

    “How about requiring an email address associated with the post. Once a post has been made the blog must be confirmed by the owner of the email address”

    I’d rather enter a few characters from an image on a form I’m already filling in than enter my email address, check my email and click a link.

  • http://jystewart.net jystewart

    The ‘code hidden in image’ option is available in Movable Type through this plugin. There are, however, legitimate concerns about the accessibility of this method, since it excludes the partially sighted.

    The Bayesian filtering approach can be good for blocking spam once posted but is unlikely to help when the volume of spam approaches DoS levels. It’ll help keep the site clean, but not reduce the server load.

    At this point, registration is probably the best way to fight spam over the short term (and typekey is good in the sense that people aren’t registering for just one blog), perhaps even dynamically generating form field names that are associated with the session.

  • ronanmagee

    “I’d rather enter a few characters from an image on a form I’m already filling in than enter my email address, check my email and click a link.”

    I was thinking of using both methods … preferring associating an email address with a post over registering on a site and having another set of login details to remember.

    Using both methods helps to eliminate the problems noted above … i.e. scripts populating the blog and personal spammers adding posts anonymously.

  • http://jdk.phpkid.org jdk

    I like Peter Bowyer’s simple approach [ http://peter.mapledesign.co.uk/weblog/archives/basic_comment_spam_protection.html ] which should eliminate most of the automated comment spam.

    JD

  • Jon B

    A main problem with any technique used to counteract blog spam is that if/when it becomes successful and widely used, the coders of the spamming scripts will quickly jump on it and beat it. The best way to beat the spammers is to think hard, come up with a unique method for testing or authorising posts, and then not tell anyone how it works and generally keep a low profile – that will give you the most time before someone beats it, though it’s not necessarily that desirable. Jesse Warden came up with a cool Flash form that posts to MT and has a built-in Turing test that requires no human interaction. I haven’t quite figured it out exactly, but it could be a good solution, since Flash is harder to parse than HTML and is very widely spread.

  • http://www.hostetler-family.net/mike/ escape164

    I’ve played with MT, WordPress and Drupal and have fought spam with each. Recently, Jeremy over at KernelTrap.org offered up his solution in the form of a Drupal plugin. I won’t duplicate his article here, so here’s the link:

    http://drupal.org/node/14193

  • http://www.lastcraft.com/ lastcraft

    Hi.

    Looks like everybody is having this problem right now, so hopefully solutions will start to appear. The PHPLondon wiki has been brought down several times because spam reached DoS levels. The only effective way to deal with that is an .htaccess deny directive against the IP. Anything else uses too much processing, and the ISP will drop your site rather than fix it themselves.

    There is no doubt that blogs and wikis are insufficiently spam hardened at the moment. You need at least some form of email check before sending the message live.

    Also, most of the publishing tools force you to go to an admin page to remove junk. This is too much work. What’s needed is a filter that automatically blocks IPs without intervention.

    I am also about to publish a blacklist as a blog section. That way it can be picked up by RSS. Hopefully someone will come up with plug-ins to cross-read trusted blacklists. Most of this spam seems to come from North American interests (gambling these days) via Chinese servers. If enough IPs get blocked then hopefully these ISPs will start to vet applicants.

    Blacklisting is harsh on ISPs, so an automated abuse system that allowed the ISPs to take immediate action would win friends for that community. I doubt this will happen until they feel the pain of blacklisting though.

    Anyway, you have blogged a call to arms just before I could. We need community blacklist tools, identity tracking and abuse tools (email confirmation) and robot hardened publishing tools. We need them fast.

    yours, Marcus

  • php_man

    Thought more about this, and my all encompassing approach would be:

    a) Offer registration and email validation; once registered, no other checks (almost).

    b) If not registered, do a Turing test. We solve the problem for those who are partially sighted by offering registration. You can also have a sound-based Turing test pretty easily (just have the letters read out in an MP3 file).

    c) For extra protection from human spammers, you have a Bayesian filter on any comments that have passed the checks above.

    d) DoS protection should not be at the script level; it should happen before then. A PHP script that tries to detect a DoS and deal with it is unlikely to ever work.

    Stopping comment spam does not seem like a major technological hurdle to me at all.

  • Ren

    I’ve been toying with some code that does ‘enter the code’ type captcha.

    But instead of using images, it just uses regular html+css to display the code.

    The idea is to obfuscate the code within the HTML, so it can’t simply be scraped from the page.

    It does this via various CSS methods: firstly, the code is output in a random order and CSS positioning gets it to display in the correct order; additional random characters are also intermixed, which are then prevented from displaying via several methods (offscreen positioning, z-index stacking, white text on a white background, CSS display, CSS visibility, etc.).

  • jrconlin

    I’ve got a pretty effective method I use for anti-spam, even with the comments sitting on the page directly beneath the post and unmoderated postings.

    My blog is WordPress, and while I do have a number of its anti-spam tools turned on, my additional measures have proven far more effective.

    The first thing I did was change the comment posting page. In place of the original, I left a honeypot that looks for spambots and then adds the ip address to my blacklist. This prevents folks from hacking the site since they’re flatly shut out after their first attempt. I’ve already got a rather nice list of zombie IPs.

    I’ve also added a time token to the comment form which prevents folks from scraping my pages and looking for the new post form address.

    As an added-super-special bonus, I pass various domains through a whitelist table, and if yours doesn’t match, it gets routed through the Google redirector.

    To put it simply, I’ve raised the bar so high for my crappy blog that it just doesn’t make sense to try and spam me. You’d literally have to go to my site and hand-enter it, and even then, you’d have to know how to defeat the internal spam detection and anti-DoS stuff. For a spammer/script-kiddie, it just t’aint worth it.

  • http://www.realityedge.com.au mrsmiley

    Collating all the trusted blacklists into a central repository shouldn’t be difficult. It’s the small matter of finding a location to run it from that has enough funding to cover the bandwidth.

    To me the first port of call is defining the infrastructure requirements of the list servers, e.g. required bandwidth, etc. Then it can be costed out and sponsorships can be obtained to cover the running costs.

  • http://www.lastcraft.com/ lastcraft

    Hi.

    Any central location will come under attack, particularly attempts to pollute it. Spamcop suffers such attacks as well as DoS attacks. RSS would provide a distributed solution.

    yours, Marcus

  • evolve

    “What about deploying the already well-used obscure graphic that contains a word/number covered by a messed-up graphic? This provides added functionality and security?? Has this method been exploited yet?”

    I really have a problem with this method, given that assistive technology exists so that people with disabilities can use the Internet. Unless you have a sound clip of the characters within the image, you really shouldn’t use this method.

    I find it extremely unfair that even extremely large websites use this image security method without any alternative for people with disabilities.

  • Paul Reinheimer

    While I think the blacklisting approach has some advantages over a ‘bad word’ list approach, I doubt any significant long-term gains can be made with an IP-based approach. Since the economic incentive is there (otherwise we wouldn’t be having this conversation), the comment spam systems could be split into two parts: one high-bandwidth system searching for blogs to spam, and a second low-bandwidth, dynamic-IP system actually doing the spamming. The transfer of text required to send an individual piece of spam is minimal, and someone will have finally found a use for their 3000 free hours of AOL. The prevalence of other dirt-cheap ISPs can only further assist those who might use this method. Blacklisting individual IPs will do little, as a reconnect will change the IP; blacklisting ranges will end up throwing the baby out with the bathwater.

    I think the final solution may lie in a two-pronged approach. Firstly, using a combination of those captcha images/sounds. Yes, they can be defeated, but doing so requires both effort and CPU power. Secondly, profiling the comments in question. Examine the correlation between the words used in the post and the comment, as well as the number of comments made by an individual across different posts in a given timeframe. A single comment posted from an individual IP in one day, with a decent correlation of words to the post, is likely not comment spam. Meanwhile, several identical comments posted to several different posts, with little to no correlation, in a very short period of time likely is.

  • http://www.akatombo.com ultrabob

    Some of the new features of the new version of MT Blacklist sound like they are working quite well. Those of us who aren’t planning to upgrade though are kind of stuck.

    The idea of a centralized, community-run blacklist introduces the problem of over-eager blacklisting. I imported a blacklist listing .biz the other day.

    I really think the most promising idea is a set of rotating form field names: if each field had a random name each time, which there was no way to predict, it would make it nearly impossible for an outside script to post comments.

    Finally, to provide some possible fodder for people trying to attack the problem I have been collecting details of entities that hit my old comment script. The only entities that should be hitting this script are spambots that follow MT’s setup defaults or had cached the page location OR robots who don’t respect robots.txt and are visiting due to cached information on the page location. There are no links to the old script on the site. When something hits the old script their details are recorded and a 404 error is returned. I’ve been considering writing their IP addresses to a .htaccess reject list automatically so that even if they get wise and go to the real script they’ll still have no luck.

    Anyway, here is the list http://www.dynamicduo.info/spammers.txt

  • Anil

    We (at Six Apart) will be communicating a lot more soon about the techniques and tricks that comment spammers use, along with what we’ve done to fight their efforts. Keep an eye on our site, and we’ll have details shortly. There has definitely been a dearth of information about the technical nuances of comment spamming.

  • http://www.andrewloe.com/ WALoeIII

    I’m thinking that we should just change the order of the requirements randomly based on time. Or the random field name idea works equally well – it would make it so difficult that it would stop being worth it!

  • http://www.ajohnstone.com Andrew-J2000

    In the past 3-4 weeks, I have had roughly 800 comments, all advertising online casinos and random porn sites. This was quickly eradicated by enabling some minimal filtering.

    WordPress has comment filtering and I now moderate all comments. Filtering for specific keywords alleviates the annoyances of this type of spam.

  • lsmith

    How about mouse gestures? Should be possible to write a javascript that picks up the mouse gesture. Or maybe easier for the user is to click on several points in a specific order.

  • KJ

    The method of typing in a code that you see is a terrible one: not only is it not accessible, but it can be really poorly implemented. How far do you go to obscure the code?

    Microsoft Passport’s signup page is the worst I have seen. They have added a link for people who can’t see the image so that a sound plays, which is an interesting idea, but I cannot make out the code in the image and the sound quality is awful. Overall this is the worst implementation of this method; however, it’s a good educational tool for teaching people the downsides.

  • http://blog.phpdoc.info/ scoates

    Computationally-expensive JavaScript.
    Yes, sure, it eliminates anyone without JavaScript – perhaps non-JavaScript browsers could be presented with a CAPTCHA-style test.

    Consider an approach similar to those recently discussed in the email world, where the client must perform some “computationally expensive” task before posting. Given that it normally takes 10+ seconds to write and post a comment, why not have a JavaScript function perform this task while the user is typing? The task would have to be completable in ~10 seconds on a mid-powered machine (say, a PIII 700MHz). The form would then be keyed off the results of this calculation (kind of like Slashcode’s formkeys).

    Non-intrusive to (most) users, and would discourage high-volume automated spam attacks.

    This does not, however, reduce “manual” spam.

    S
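    A hashcash-style sketch of that idea (the hash choice and difficulty are arbitrary illustrations; in practice the solving loop would run in the visitor’s JavaScript, with the server only doing the cheap verify step):

```python
import hashlib
import secrets


def make_challenge():
    # Server issues a fresh random challenge with each comment form.
    return secrets.token_hex(8)


def solve(challenge, difficulty_bits=16):
    # The expensive part: find a nonce whose hash starts with enough zero
    # hex digits. Expected work grows as roughly 2**difficulty_bits tries.
    target = "0" * (difficulty_bits // 4)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1


def verify(challenge, nonce, difficulty_bits=16):
    # The cheap part: a single hash confirms the work was done.
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * (difficulty_bits // 4))
```

    One comment costs a visitor a few seconds of idle CPU; a thousand comments a minute costs a spammer real hardware.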

  • http://blog.casey-sweat.us/ sweatje

    Using WordPress, I added a one-line include in wp-comments-post.php pointing to this file:


    <?php
    if ( 'sunny@moonlightshadow.us' == $email
        || preg_match('/poker|casino|loans|gambling/i', $url)
        || preg_match('/phentermine|consolidation|play-texas/i', $url)
        || preg_match('/debt consolidation/i', $author)
        || preg_match('/viagra/i', $comment)
        || preg_match('/ownthis\.com/i', $url)
        || preg_match('/texas-hold/i', $url)
    ) {
        die( __('I do not accept comments from you. Buzz off.') );
    }

    I have the default comment moderation stuff turned on. Each time I get annoyed by someone, I throw the URL they are posting or their garbage content into my "kill file". Seems to keep the worst offenders out.

    HTH

  • seratonin

    Marcus, I like your idea of a distributed blacklist solution using RSS/RDF. A central solution would be a single point of failure.

    JT

  • synace

    Since everyone’s main concern is ‘popular’ software (including the spammer-script author, who wouldn’t want to write a script for single use), why not just set up an internet-wide registration, kind of like ‘yahoo groups’ or that one message board (can’t remember its name) did. You register ONCE for the ‘blogging system’; then, when you come across a blog that utilizes this system, you’re cookied in, or you have to log in with your system-wide login/pass. This could be done on a third-party (vendor) SSL popup with APIs back & forth if password security is a concern. Basically, then, if someone is reported enough times for spamming, they get added to the blacklist (any form of blacklist will do, to be debated later), their posts can be ‘hidden’, and they can be prevented from posting AT THE DISCRETION of the blog maintainer. How’s that for power to the people? You could even set up thresholds (# of reports from ACTUAL blog maintainers), and rate the blog maintainers by some other metric so as to prevent a malicious user from signing up for a blog and trying to blacklist everyone who posts on their blog.

    anyway… a game of monopoly anyone? ;)

    ~synace~
    http://www.synace.com

  • yosoyminero

    When serving a link to post comments… what about having a master file outside the httpd root and copying it to a predefined path (preferably outside the usual location for the particular software used) with a very long and difficult-to-remember name?

    For example, the name could be generated by md5'ing a randomly generated text pattern.

    The pattern could be changed daily, hourly, etc…
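    A sketch of how that rotating name might be derived (Python for illustration; the secret and rotation period are invented examples):

```python
import hashlib


def comment_script_path(install_secret, period):
    # period could be today's date string; changing it daily (or hourly)
    # moves the target URL, so a cached path goes stale quickly.
    digest = hashlib.md5((install_secret + ":" + period).encode()).hexdigest()
    return "/c-" + digest + ".cgi"
```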

  • http://www.realityedge.com.au mrsmiley

    Distributing blacklists still has the problem that the list is subject to junk being added to it, but this time from an infinite number of locations rather than a single source.

    Ideally, borrowing some concepts from BIND, you could set up a distributed blacklist network for any arbitrary list (e.g. email spam, blog spam, bad servers, bad IPs, bad words, etc.) that works between trusted servers. If there were enough trusted servers, people could download a copy of the list from a local server. Then it would just be a matter of resyncing the lists between the servers at some defined interval.

    The only other solution that has been mentioned here that has any real benefit is pre-registration of users. If you guys really want anonymous posts, then no matter what you do, you’re asking for trouble. That’s why open mail relays and anonymous FTP servers are considered bad now: the whole point of them is that you have no idea who is giving you the data, good or bad.

    Pre-registration and authentication seem to me like they would cut out something like 90% of any spam you are likely to get, providing you make the registration process difficult enough that scripts like Snoopy can’t just create users on the fly to circumvent it.

  • http://www.lastcraft.com/ lastcraft

    Hi.

    There is a third party involved here that could do a lot to help. If we had a simple way of reporting spam links to Google, the incentive could be destroyed at source. Google could drop any spam-promoted website. The post could be double-vetted as well: your blog/wiki filters it first and automatically forwards the offending post to Google; Google then passes it through their own filters, perhaps banning sites on multiple hits.

    As it’s the Google PageRank system that is shafting site owners right now, you could say Google has a responsibility to do this.

    yours, Marcus

  • seratonin

    What about syndicating the blacklist so that other blogs which “subscribe” to your blog are automatically “notified” when someone attempts a spam? The blacklist could be part of the standard RSS or Atom syndication stream. Distributing the blacklist seems preferable to having a central server blacklist.

    JT

  • DerekMartin.ca

    Why not go one step further and make it super-easy for users? What I’ve been planning is: on page view, have a script choose a random word from the page. That word is the required value for the additional field. It’s never the same word, and it’s never in the same position on the page. Then, instead of having the user type it in, simply ask them to click a button (which uses JavaScript to prefill the correct value into the hidden required field) before they can submit. The new process would be: 1) type comment; 2) press the “PreventSpam” button; 3) submit.

    My $0.02

  • http://3dwargamer.net wwb_99

    I think the following is the best, most workable combination:

    1) Set up a registration system.
    2) Anonymous users must pass a Turing test. Or they can register, thereby covering those with sight issues.
    3) Set up your blogware to not convert URLs into hyperlinks. This would take away the impetus to spam blogs in the first place, as Google will just read past the text. People are capable of copying and pasting links.

  • Ren

    Is there a way to redirect URLs but prevent googlebot/spiders from following, via a meta refresh or something? (I know it’s possible using JS, but that’s unusable IMO.)

  • Tor Bjornrud

    A plugin for WordPress, Spam Karma, uses a lot of different techniques to very good effect. I’ve been using it for some time now and have had only one spam comment get through. All variant comments of that spam have since been blocked automatically.

    http://unknowngenius.com/blog/archives/2004/11/19/spam-karma-merciless-spam-killing-machine/

  • http://mgaps.highsidecafe.com BDKR

    Man it’s good to hear some of this stuff. After I realized that a turing test didn’t stop the comment spam on my site, I got a little dejected and ignored it. But that’s got to change. I just started to dig into it last night. Good post and great comments.

  • http://www.dvd-software.info hurricane_sh

    I have a MT blog and struggled with the spamming for a while, here is my experience:

    1. Because MT generates static HTML pages, deleting spam involves rebuilding the blog and the process is very slow; 100 spams every day will keep you busy deleting.

    2. Use the add-on tool MT Blacklist; it’s a little tedious to install, but works great. The most important usage is adding block strings: when you see a spam that wasn’t blocked directly (past, or waiting for approval), find a word in it which should not be used by normal visitors and add it to your blocked-string list. Now I only see one or two spams every week.

  • http://mgaps.highsidecafe.com BDKR

    Just took a look at Snoopy. It’s a pretty capable and sophisticated piece of work – rather surprising when you get right down to it. Fighting this stuff really looks like an uphill battle.

  • php_man

    I am sort of lost as to why people think comment spam is such a problem. Combine a registration option (for those with disabilities) with an image/sound Turing test and you have everything covered. What exactly is the problem?

  • Andy B

    I think some form of black-list / white-list system is probably the best in the long run; the former can be circulated widely to others.

    Andy from http://www.lakedistrictdesktops.com

  • http://www.igeek.info asp_funda

    “From a quick scan of what people are doing so far, no one seems yet (correct me if I’m wrong) to have established some kind of live blacklisting service, which is open to all to read and updateable (automated) by trusted bloggers.”

    Well, you are indeed wrong here Harry. Mark Gosh has a project named CSPAM running which will do precisely what you said about live blacklisting etc., & he announced it on weblogtoolscollection.com in October.
    Though I’d say that centralised blacklisting can be abused pretty heavily (not everyone’s a saint) & this sort of thing should be avoided as far as possible.

    Also, as some suggested, I’m one of the crowd who are not in favour of captcha authentication. Though it’s effective, the downfalls are there. The code tends to expire after some minutes. Like some people, I do take a bit of time composing & posting comments on thoughtful & lengthy posts. So when I submit my comment with the correct code (the one I saw), I get a message saying that the code has expired & I need to key in the new one. That really puts me off, & if I hadn’t copied my message before posting, I’d have to recompose it. The trouble’s not worth it when you are just posting some remarks on a post you came across while surfing. This also fails if the post itself is long: the user takes some time to read it, & by the time he makes a quick comment, the code has expired. That’s just ridiculous.

    Then you talk about using JavaScript to do some validation while the user is composing the comment, maybe using XMLHttp. I’ll just say that if you can use a tool, it doesn’t necessarily mean that you should use it. ;) Many users turn off JavaScript while surfing, remember?

    Also, the whole point in fighting SPAM is that the SPAMMERS should face the heat, not the genuine users. With captcha, JS authentication, etc., the users face the heat. For them the system should be as transparent as possible. Otherwise we could fall back on that old remedy, email authentication: an email is dispatched to the user so that he can approve his own comment. No, that’s not feasible; you’ll be lucky to get a single comment on your blog.

    That’s why I like the approach that 2 of the hot comment-spam plugins for WordPress take: Spam Karma & Spaminator. Spam Karma essentially runs a comment through an array of filters & either approves it, discards it, or places it in a moderation queue, where either the admin approves it or the user can approve it himself via captcha or email approval.

    The Spaminator is a combination of two cool plugins, the Spammer TarPit & Three Strikes Plugin.

    From what I’ve read from the response of users on the WordPress Hackers list, both Spam Karma & Spaminator have been pretty effective in out-doing comment spam.

    Needless to say, I like the approach these two plugins take to fighting spam, keeping the system transparent to the users so that they don’t have to bear the brunt. Now if both these plugins mate, the result’ll be fabulous. ;)
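    The filter-array idea asp_funda describes (score a comment through several filters, then approve, discard, or moderate) can be sketched in a few lines. This is not Spam Karma’s actual code; the filters, weights, and thresholds here are invented for illustration:

    ```python
    APPROVE, MODERATE, DISCARD = "approve", "moderate", "discard"

    def link_count_filter(comment: str) -> int:
        # Each link costs points: many links is a strong spam signal.
        return -5 * comment.count("http://")

    def short_comment_filter(comment: str) -> int:
        # A very short comment that still contains a link is suspicious.
        return -2 if len(comment) < 20 and "http://" in comment else 0

    FILTERS = [link_count_filter, short_comment_filter]

    def judge(comment: str, approve_above: int = -1, discard_below: int = -10) -> str:
        """Sum every filter's score, then route the comment accordingly."""
        score = sum(f(comment) for f in FILTERS)
        if score > approve_above:
            return APPROVE
        if score < discard_below:
            return DISCARD
        return MODERATE  # borderline: fall back to captcha or email approval
    ```

    The appeal of this design is exactly the transparency asp_funda wants: ordinary comments never see a captcha, and only borderline cases are asked to prove themselves.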

  • drumdance

    My semi-low-tech and easily crackable approach has been to rename the script that processes comments. Eventually spambots will read the comment form’s action and I’ll need a better solution, but since I did this, all spam has stopped.

  • Anonymous

    I did not read all of the comments, so excuse me if this has been said.

    What about a blacklist webservice? The service would allow you to register to be part of it, and then if you did not want a particular domain to be blacklisted for one reason or another, the webservice would track that preference against your user information.
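    The idea above amounts to a shared blacklist with per-user overrides. A minimal sketch in Python, with made-up domains and user names (in a real service the sets would live in a database behind an API):

    ```python
    # Hypothetical shared blacklist: blocks a domain for everyone...
    GLOBAL_BLACKLIST = {"cheap-pills.example", "casino.example"}

    # ...except for users who have explicitly whitelisted it for their own blog.
    USER_WHITELISTS = {
        "harry": {"casino.example"},
    }

    def is_blocked(user: str, domain: str) -> bool:
        """A user's own whitelist overrides the shared blacklist."""
        if domain in USER_WHITELISTS.get(user, set()):
            return False
        return domain in GLOBAL_BLACKLIST
    ```

    This addresses the abuse worry raised elsewhere in the thread: a spiteful blacklisting hurts you only until you override it for your own blog.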

  • http://www.thecatsite.com Anat

    My two WordPress blogs have been hit with spam. I think it’s script-generated; it’s promoting some dubious poker site.
    Once I added the right words to the spam filter (built into WordPress), the problem was solved.

  • http://www.dustindiaz.com polvero

    just jumped on here from the newsletter.
    Oh my goodness does comment spam bother me so much.

    I recently wrote this small rant on Being infested with Spam.

    I, probably like many others, have turned on the moderating of comments as a temporary solution.

    But really, no blogger wants to do that. It takes the fun out of having a beautiful commenting community.

    MT’s filters aren’t good enough.
    They only allow blocking of IPs. But even if they allowed us to block by email or user, even those are getting randomized.

    How does someone send out a random Referer IP on every visit?

    Anyway, when someone comes up with a viable solution, I’ll be all over it. I’m very angry with these stupid spammers.

    Who knows, it could be just one guy sending out hundreds of thousands every hour.

    We’ll catch the criminal eventually.

  • Anonymous

    Blacklisting is not such a hot idea. You have a wide range of people, including those who think they are computer savvy but are easily duped. E-mails can be faked, and e-mail forms on sites can be abused to send spam across the internet, because not everyone knows how to build mail forms with decent security. Programs can be run from local libraries.

    So the blacklist would need heavy moderation, especially since someone could just list a site for pure spite.

    Maybe it would be wiser to just start a group effort of hunting down spammers and making examples out of them and their sites, but following some kind of guidelines in doing so.

    If people can band together to work on opensource projects and produce some decent stuff along the way, I am sure that people can pool resources against spammers and just hunt them down one at a time.

    It’s not like you can wait for Microsoft or the government to do something about it. Microsoft is too busy charging high prices for its monopoly, while the government is too busy helping the RIAA hunt down college kids downloading music, because the RIAA was too stupid to sell MP3s cheaply in the first place.

    It’s our internet; let’s take it back! Unless you guys actually like having a mailbox full of spam, or porn all over your blog so that cyber nanny blocks you, or exposing innocent people who came to see your thoughts, ideas, or opinions to someone trying to sell sugar pills as Viagra on your site.

  • Justin

    I’ve never seen any of these scripts, so I’m not sure how they work, but what if you had randomly generated input names that had to be verified against the database before posting? These names could be changed to whatever the site owner wanted, and also whenever he/she wanted. That’s just an idea… I haven’t really thought it through completely. I guess it depends on how the scripts work.
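    Justin’s idea of randomized, server-verified input names can be sketched as follows. This is a toy illustration in Python, with an in-memory dict standing in for the database Justin mentions; all names here are invented:

    ```python
    import secrets

    # Server-side store mapping a session token to the randomized field name
    # that was issued with the form. (In a real app this lives in a database.)
    _issued_fields = {}

    def render_form():
        """Issue a form whose comment field gets a one-time random name."""
        token = secrets.token_hex(8)
        field_name = "f_" + secrets.token_hex(8)
        _issued_fields[token] = field_name
        return token, field_name

    def accept_post(token, submitted_fields):
        """Accept the post only if it uses the field name issued for this token."""
        expected = _issued_fields.pop(token, None)  # one-time use
        return expected is not None and expected in submitted_fields
    ```

    A script that blindly POSTs to a hard-coded field name fails, because it never fetched the form to learn the name issued for its session. A scraper that reads the form first still gets through, which is probably what Justin means by “it depends on how the scripts work.”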

  • Diana C.

    I want to share a web log spam-fighting success story.

    A poster spammed on my web log with the user name “diet-pills”. They posted a ridiculous nonsensical post with several links to their discount pill supplier web site. I took a few seconds to look at the URLs posted, and do a whois on the URL. After I found out who had registered their URL – some sort of privacy-registrar specializing in keeping the registrant’s contact info secret – I looked into their spam policy, and found that they were very strictly anti-spam oriented. I sent a detailed email to the domain registration company telling them about the spam and linking the location, and CC’d the email to the diet-pills web site.

    Within 24 hours, I got a response from a wholesale pill supplier, who explained that they received copies of the diet-pills web site’s emailed feedback, and they apologized for the spam, and told me that they were immediately discontinuing their wholesale relationship with the diet-pills web site because they have a strict anti-spam policy.

    I never heard back from the registrar or the diet-pills people, and I didn’t bother to follow up on whether the registrar took action. I was fairly satisfied with one of their main suppliers dropping them.

    15 minutes can make a bit of an impact!

  • c. s.

    Your thoughts are in the right direction (transparent for legitimate users, hell for spammers), but you’re under the misguided impression that only web browsers can understand JavaScript. There are open source JS interpreters out there for the spam software writers. It’s just a matter of time before they realize this. You’d be safe today, but vulnerable tomorrow.

    “The best way to beat the spammers is to think hard, come up with a unique method for testing or authorising posts and then not tell anyone how it works and generally keep a low profile – that will give you the most time before someone beats it, however it’s not necessarily that desirable.”

    What would be the point of designing SuperSpamStopper 5000 if you were the only one benefiting from it? This is a widespread problem, and keeping the best solutions to yourself is near useless in the fight against spam.

  • IO ERROR

    I’ve been working on the comment spam problem on my WordPress site for a short while now, since being hit with 764 of them over New Year’s Eve. In short order I whipped up a SpamAssassin plugin which filters every comment, trackback and ping through SpamAssassin. Since that time only two spams have gotten through, and no false positives. The users don’t even know it’s there, except that I told them in big 24-point letters. :)

  • DVS007

    This may sound stupid or it may sound profound. I’ve noticed when submitting my URLs to directories and whatnot that they have made me view an image with letters on it and enter those letters into a text field before hitting the submit button. Why don’t blogs use this? Or is this too easy to beat? Seems to me a script wouldn’t have image-recognition capabilities…

  • j-man

    I prefer to fight comment spam by using a regular expression to disallow any post that contains a full URL (http or https included); legit users can still post a simplified “www” version of the URL. It works quite well in simple applications if you don’t require that your users have the ability to post a linked URL.
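    j-man’s regex approach fits in a couple of lines. A minimal sketch in Python (the exact pattern he uses isn’t given, so this is one plausible version):

    ```python
    import re

    # Reject any comment containing a full URL with a scheme (http:// or https://).
    # Bare "www." links are still allowed, as j-man suggests.
    FULL_URL = re.compile(r"https?://", re.IGNORECASE)

    def allow_comment(text: str) -> bool:
        return not FULL_URL.search(text)
    ```

    The trade-off is the one he names himself: legitimate users lose clickable links, which is fine for simple applications and a deal-breaker elsewhere.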
