10 Things to Check Before Using a CAPTCHA

    Craig Buckler

    All CAPTCHA systems are doomed to fail. Unfortunately, this has not prevented eager developers from using CAPTCHAs in even the most basic web-to-email forms.

    No one likes CAPTCHAs. They are not fun. Not everyone can use them: people with impaired vision or with graphics disabled are locked out. They slow down the sign-up process and, ultimately, lead to fewer real registrations.

    The worst problem with CAPTCHAs is that they put the onus on the user. Users do not care if you are receiving thousands of spam messages or bogus accounts: that’s your problem. CAPTCHAs should be the last barrier of defence – not the first.

    The vast majority of hacking attempts and bots can be prevented without resorting to CAPTCHAs. If you make it moderately difficult, spammers will simply move on to the next easier target. Here are some basic techniques that will stop the majority of spoofing attempts.

    1. Validate everything server-side

    You need to validate every field using server-side code – even if you have strong client-side validation. Be especially careful with fields that are placed in email headers. Email addresses are probably the most important values to check: use a good regular expression and watch out for HTML tags, SQL injection attempts, or line-break characters (\n and \r in PHP).
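
A minimal Python sketch of the email check described above. The pattern is deliberately conservative and is only an assumption about what counts as "a good regular expression"; the function names are hypothetical.

```python
import re

# Conservative shape check: text, one "@", text, a dot, text.
# Whitespace, extra "@" signs and angle brackets (HTML tags) are excluded.
EMAIL_RE = re.compile(r"[^@\s<>]+@[^@\s<>]+\.[^@\s<>]+")


def has_header_injection(value: str) -> bool:
    """True if the value contains CR or LF, which could add extra email headers."""
    return "\r" in value or "\n" in value


def is_valid_email(value: str) -> bool:
    """Reject line breaks first, then require the whole value to match the pattern."""
    if has_header_injection(value):
        return False
    return EMAIL_RE.fullmatch(value) is not None
```

The line-break check matters most for any value copied into a mail header, such as a subject or reply-to address, so it is worth applying to those fields as well.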

    2. Check for spam-like content

    Most spammers post links to websites. If that’s not something you are expecting, it could indicate a spam bot. A third-party tool such as Akismet could help.
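
As a rough illustration, the following Python sketch counts link-like strings in a message. The pattern and the default threshold of zero are assumptions; a form that legitimately accepts links would need a higher limit or a different test.

```python
import re

# Matches anything starting with http://, https:// or www. up to the next space.
LINK_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def count_links(text: str) -> int:
    """Return the number of link-like strings found in the text."""
    return len(LINK_RE.findall(text))


def looks_like_link_spam(text: str, max_links: int = 0) -> bool:
    """True if the text contains more links than the form expects."""
    return count_links(text) > max_links
```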

    3. Check for rogue POST and GET values

    If your form expects three POSTed fields, the existence of a fourth could indicate a hacking attempt. Similarly, check that no additional GET values have been passed.
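
A short Python sketch of this check, assuming the submitted values arrive as dictionaries. The three field names are hypothetical, and the form is assumed to expect no GET values at all.

```python
# Hypothetical field names for a three-field contact form.
EXPECTED_POST_FIELDS = frozenset({"name", "email", "message"})
EXPECTED_GET_FIELDS = frozenset()


def unexpected_fields(submitted, expected):
    """Return the set of submitted field names that the form never defined."""
    return set(submitted) - set(expected)


def is_suspicious_request(post_fields, get_fields):
    """True if either the POST or the GET data carries an unknown field."""
    extra_post = unexpected_fields(post_fields, EXPECTED_POST_FIELDS)
    extra_get = unexpected_fields(get_fields, EXPECTED_GET_FIELDS)
    return bool(extra_post or extra_get)
```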

    4. Check the HTTP header

    Simpler spam bots will rarely set a user agent (HTTP_USER_AGENT) or a referring page (HTTP_REFERER). You should certainly ensure the referrer is the page where your form is located.
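
The header check could look something like this in Python. It assumes the request headers are available as a dictionary keyed by "User-Agent" and "Referer"; the form URL is a hypothetical placeholder.

```python
from urllib.parse import urlsplit

FORM_URL = "https://www.example.com/contact"  # hypothetical form location


def headers_look_human(headers, form_url=FORM_URL):
    """True if a user agent is present and the referrer is the form page."""
    user_agent = headers.get("User-Agent", "").strip()
    referer = headers.get("Referer", "").strip()
    if not user_agent or not referer:
        return False
    expected = urlsplit(form_url)
    actual = urlsplit(referer)
    # Compare scheme, host and path; ignore any query string or fragment.
    return (actual.scheme, actual.netloc, actual.path) == (
        expected.scheme,
        expected.netloc,
        expected.path,
    )
```

Some browsers and proxies strip the referrer, so a failure here is better treated as a warning sign than as proof of a bot.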

    5. Use a honeypot field

    Spambots normally attempt to complete every form field so they pass basic validation. A honeypot field is one that is hidden from the user (CSS display set to none), so any value passed back is likely to come from a bot. The field should be labelled “Please leave this blank” or similar to account for those with CSS disabled or using custom stylesheets.
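
On the server, the honeypot test is a one-liner. This Python sketch assumes the form data is a dictionary and that the hidden field is named "website", a hypothetical name chosen to look tempting to a bot.

```python
HONEYPOT_FIELD = "website"  # hypothetical name of the hidden field


def honeypot_triggered(form, field=HONEYPOT_FIELD):
    """True if the hidden field came back with any non-blank value."""
    return form.get(field, "").strip() != ""
```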

    6. Detect the presence of JavaScript

    If your page can run JavaScript, you can be almost certain it has been loaded in a browser by a human user. A simple in-page dynamically generated JavaScript function could perform a simple calculation or create a checksum for the posted data. This can be passed back in a form value for verification.

    An estimated 10% of people have JavaScript disabled, so further checks will be necessary in those situations.
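
One possible server-side half of the checksum idea, sketched in Python. It assumes the page's JavaScript adds up the character codes of every field value, starting from a per-page salt, and posts the result in a field named "js_check". The salt, the field name and the checksum formula are all illustrative assumptions; the browser script would have to implement the same arithmetic, iterating over code points rather than UTF-16 units.

```python
def expected_checksum(fields, salt):
    """Sum the code points of every field value, starting from the salt."""
    total = salt
    for name in sorted(fields):
        for ch in fields[name]:
            total = (total + ord(ch)) % 65521
    return total


def javascript_check_passed(form, salt, checksum_field="js_check"):
    """True if the posted checksum matches the one recomputed on the server."""
    data = {k: v for k, v in form.items() if k != checksum_field}
    try:
        submitted = int(form.get(checksum_field, ""))
    except ValueError:
        # Missing or non-numeric value: JavaScript did not run, or a bot guessed.
        return False
    return submitted == expected_checksum(data, salt)
```

The salt would need to be remembered between generating the page and receiving the post, for example in the user's session.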

    7. Show a verification page or fail the first posting attempt

    Bots have a tough time reacting to a server response. If you are in any doubt about the validity of a post, show an intermediary page asking the user to confirm their data and press submit again.

    8. Time the user response

    Accounting for human behaviour is one of the best ways to spot the bots. Users will take a little time to complete forms whereas bots are almost instantaneous. I use the following method in many forms and it has been effective:

    1. The current server time is recorded when the form page is generated.
    2. The time value is encoded into a string. The actual encoding algorithm is up to you, but it must be one that is not obvious and allows decoding back to the original value. I would also recommend using unique user data, such as the IP address, as an encryption key.
    3. The encoded time is put in a hidden form value.
    4. When the form is posted back, the field is checked and decoded back to a time. This can now be compared with the current server time to ensure the response time falls within a specific window, e.g. between 20 seconds and 20 minutes.

    There are several benefits to this process: it does not rely on client-side technology, the time value must be in the returned data and, even if your form is spoofed, it limits the number of bogus submissions that can be sent.
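
The four steps can be sketched in Python as follows. Note the substitution: instead of an obscure reversible encoding, this version leaves the timestamp readable and signs it with HMAC, using a private server key plus the user's IP address, so any altered value is rejected. The key, the token format and the function names are assumptions for illustration.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-private-random-value"  # hypothetical server secret
MIN_SECONDS = 20        # faster than this is probably a bot
MAX_SECONDS = 20 * 60   # older than this and the form has expired


def _signature(timestamp: int, ip: str) -> str:
    """Sign the timestamp together with the user's IP address."""
    message = f"{timestamp}|{ip}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_token(ip, now=None):
    """Steps 1 to 3: record the server time and build the hidden field value."""
    timestamp = int(time.time() if now is None else now)
    return f"{timestamp}:{_signature(timestamp, ip)}"


def token_is_valid(token, ip, now=None):
    """Step 4: verify the signature, then check the response time window."""
    current = time.time() if now is None else now
    timestamp_text, _, signature = token.partition(":")
    if not (timestamp_text.isascii() and timestamp_text.isdigit()):
        return False
    timestamp = int(timestamp_text)
    expected = _signature(timestamp, ip)
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return False
    return MIN_SECONDS <= current - timestamp <= MAX_SECONDS
```

The value from `make_token` goes into the hidden field when the page is generated, and `token_is_valid` runs when the form is posted back. The optional `now` argument exists only to make the functions easy to test.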

    9. Log everything

    Keep a log of everything that occurs during a form submission process. This need not be an elegant solution; writing to a file will be adequate. The information you gather will be invaluable when spotting hacking attempts and implementing solutions.
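
A file-based log can be as simple as the Python sketch below, which appends one JSON line per submission. The choice of fields is an assumption; record whatever your checks produce.

```python
import json
import time


def format_log_entry(ip, user_agent, outcome, failed_checks, now=None):
    """Build one JSON line describing a form submission."""
    entry = {
        # gmtime(None) uses the current time; pass `now` to fix it in tests.
        "time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now)),
        "ip": ip,
        "user_agent": user_agent,
        "outcome": outcome,
        "failed_checks": list(failed_checks),
    }
    return json.dumps(entry, sort_keys=True)


def log_submission(path, ip, user_agent, outcome, failed_checks):
    """Append the entry to a plain text log file."""
    line = format_log_entry(ip, user_agent, outcome, failed_checks)
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(line + "\n")
```

Because JSON escapes line breaks, a bot cannot forge extra log lines by sending a user agent that contains them.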

    10. Handling the extreme cases

    Some of the techniques above will fail for legitimate users, e.g. checking for JavaScript or the HTTP header. These failures are likely to affect only a small number of users, so a CAPTCHA could be used in those circumstances.
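
One way to express that fallback in Python: treat the checks that can fail for real users as "soft" and only show a CAPTCHA when nothing else has failed. The check names and the three outcomes are hypothetical labels, not part of any library.

```python
# Checks that legitimate users can fail through no fault of their own.
SOFT_CHECKS = frozenset({"javascript", "http_headers"})


def choose_action(failed_checks):
    """Return "accept", "captcha" or "reject" for a set of failed check names."""
    failed = set(failed_checks)
    if not failed:
        return "accept"
    if failed <= SOFT_CHECKS:
        # Only soft checks failed: give the user a chance to prove they are human.
        return "captcha"
    return "reject"
```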

    Alternatively, if there is any doubt about the data validity for a small number of users, you could add human verification to your process. Ensure it is simple to operate, e.g. email an administrator and only accept the post once a reply is received.

    CAPTCHAs can be essential for sites that could incur significant monetary loss or are obvious targets for illegal activities, such as online banking and webmail. However, they are overkill for most forms: a combination of techniques will stop the majority of bots without making sign-ups difficult for real users.