The End is Nigh for CAPTCHAs

CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) are the squiggly letters or words you are asked to type when you complete a web form. Systems use them to check that you are a human rather than a bot. Whilst the tests are effective at thwarting spam, they will never be a long-term solution.

1. Accessibility
The majority of CAPTCHAs ask you to visually recognise and identify a series of symbols. It is hard enough for people with perfect sight, but it can be impossible for those with impaired vision. Some systems do provide an audio alternative, but this simply gives hackers two methods of attack.

2. It’s not a Turing test
Humans can normally spot a machine masquerading as a person – that is the standard definition of the Turing test. CAPTCHAs, however, depend on a machine differentiating between a human and another machine. That’s a far more difficult proposition, especially since Optical Character Recognition (OCR) software gets better every day.

3. All CAPTCHAs can be cracked
Computers will become faster and software will become more sophisticated. It is inevitable that all CAPTCHAs will eventually be cracked.

CAPTCHA-cracking is already a lucrative hobby for many hackers. However, human effort can be just as effective: why spend thousands on complex software when the task can be outsourced to hundreds of workers in India?

4. CAPTCHAs are getting more difficult
The simple solution to cracked CAPTCHAs is to make the test more difficult. How many times have you failed a CAPTCHA test? Some have become ridiculously hard and many of the alternatives are worse, e.g.

  • the totally indecipherable cats or other animals on letters (yes, rapidshare.com, I’m referring to you!)
  • draggable objects that do not work without a mouse and can still be spoofed
  • simple questions that are even easier to hack than CAPTCHAs, e.g. “what is the total of 1 plus three?”
  • or Google’s new image rotation CAPTCHA, which requires client-side code and which hackers could probably pass about 1 time in 10 by rotating the image at random.
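
To show why such questions offer so little protection, here is a minimal Python sketch of what a bot might do with the “1 plus three” example. The word table and the `solve_question` name are illustrative assumptions, not taken from any real spam tool; it only understands “plus” and “minus” with small numbers, yet that already covers questions of this type.

```python
import re

# Hypothetical word-to-number table; a real bot would simply use a bigger one.
WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

def solve_question(question: str) -> int:
    """Answer simple 'X plus Y' / 'X minus Y' questions written in digits or words."""
    tokens = re.findall(r"[a-z]+|\d+", question.lower())
    total = 0
    sign = 1
    for token in tokens:
        if token.isdigit():
            total += sign * int(token)
        elif token in WORDS:
            total += sign * WORDS[token]
        elif token == "plus":
            sign = 1
        elif token == "minus":
            sign = -1
        # Any other word ("what", "is", "the", "total", "of") is ignored.
    return total

print(solve_question("what is the total of 1 plus three?"))  # 4
```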

5. CAPTCHAs measure ability
The fundamental problem with CAPTCHAs is that they measure ability: your effectiveness at interpreting a fairly unreadable set of letters. However, computers are already effective at synthesising some human abilities and will improve.

Would it be better to detect human behaviour instead? When most people complete an online form, they scroll down the page, click boxes, add text, pause, highlight segments, and delete and retype sections. Could random page interaction be a better indicator of human activity?
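
A minimal Python sketch of the idea, assuming the browser has already reported the visitor’s interactions to the server as a list of timestamped events. The `Event` format, the event names, the 1.5-second pause and the threshold of 3 are all hypothetical choices made for illustration:

```python
from dataclasses import dataclass

@dataclass
class Event:
    t_ms: int   # milliseconds since the form loaded (hypothetical log format)
    kind: str   # e.g. "scroll", "click", "key", "delete", "select"

HUMAN_KINDS = {"scroll", "click", "key", "delete", "select"}

def human_score(events: list[Event], min_pause_ms: int = 1500) -> int:
    """+1 for each distinct kind of interaction seen, +1 if the visitor ever paused."""
    score = len({e.kind for e in events if e.kind in HUMAN_KINDS})
    times = sorted(e.t_ms for e in events)
    if any(later - earlier >= min_pause_ms for earlier, later in zip(times, times[1:])):
        score += 1
    return score

def looks_human(events: list[Event], threshold: int = 3) -> bool:
    return human_score(events) >= threshold

typed_by_person = [Event(0, "scroll"), Event(400, "click"), Event(900, "key"),
                   Event(3200, "delete"), Event(3300, "key")]
filled_by_script = [Event(0, "key"), Event(5, "key"), Event(10, "key")]
print(looks_human(typed_by_person), looks_human(filled_by_script))  # True False
```

As the comments below point out, anything reported from the browser can be forged by a determined bot, so a score like this could only ever be one signal among several.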

Despite all the problems, CAPTCHAs are often used as the first line of defence in even the most basic web forms. They should be the last.

See also: 10 Things to Check Before Using a CAPTCHA

  • http://benbalbo.com/ benbalbo

    Anyone that is using, or thinking of using CAPTCHA systems might want to take a look at reCAPTCHA for two reasons.

    Firstly, you’ll reduce the chances that the text can be read by machines, as the scribbles your visitors are interpreting are those that some machines could not identify.

    Secondly, visitors will be helping to “digitize books, newspapers and old time radio”.

    If you’re going to use CAPTCHA, you might as well help out while you’re at it ;)

    I look forward to your follow up article Craig!

  • nachenko

    The idea of detecting page scrolling and other typical human behaviours sounds promising.

  • Tarh

    Detecting human behaviour like page scrolling is probably even more futile than the simple “1 plus three” question.
     
    How are you going to tell that they’re scrolling the page and clicking boxes, etc.?
     
    JavaScript? Bots will just disable your code for detecting fraud.
    AJAX? Bots will just spoof the HTTP requests.
    Some kind of freaky CSS? Good luck coming up with something that the bots can’t just ignore.
     
    Concerning Google’s CAPTCHA idea, please read the actual whitepaper (not just Sitepoint’s summary), since the majority of questions / assertions made on Sitepoint’s blog and blog responses are directly answered / refuted in the paper itself. This “1 in 10 chance” (or “1 in 9”, as another commenter also randomly made up) is completely incorrect. Random guessing yielded a 0.009% success rate with the suggested 3-image, 16-degree variance system. It’s amazing how much ignorance there is surrounding the very paper that these postings are based on, even when Google anticipated this ignorance and addressed it directly.
     
    Also, as a strong advocate for disabling JavaScript by default (and a user of 6 Firefox addons just to accomplish this task without detection) I’d feel completely fine enabling client-side code just for these new CAPTCHAs. It sure beats trying to figure out if that inkblot is a cat ;-)
     
    The End is Nigh for CAPTCHAs as we know them, but they will always be around in some form. Google’s form presently seems the most promising.

  • annedougherty

    A vendor I worked with implemented reCAPTCHA because the users complained the in-house CAPTCHA wasn’t disability friendly and couldn’t be reloaded if site visitors couldn’t read it. According to the network admin it took the spam bots about 26 hours to crack reCAPTCHA so…better than nothing but not by much.

  • P jam

    I don’t get why you can’t use ImageMagick to create CAPTCHA-like images on the fly. If they don’t get it, it reloads with a new randomly created image, again produced on the fly. I’ve not yet been told why this method won’t work; it’d be easy enough to set up in Ruby using RMagick and a Ruby script. Also, why not add a three-strikes-you’re-out policy on comment creation when using CAPTCHAs? I assume these bots try to guess ‘stuff’ over and over until they get it. Lastly, a scoring system like acts_as_snook sounds the best in my opinion. Chances are, if you are commenting with words like poker/gambling etc., I don’t care about your comment anyway, on my site.
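
    P jam’s three-strikes idea might be sketched in Python as follows. The `StrikeLimiter` class, the use of an arbitrary client identifier (such as an IP address) and the one-hour window are assumptions for illustration; a real site would also need to keep the counts somewhere persistent:

```python
import time
from collections import defaultdict
from typing import Optional

class StrikeLimiter:
    """Block a client after max_strikes failed CAPTCHA attempts within window_s seconds."""

    def __init__(self, max_strikes: int = 3, window_s: float = 3600.0):
        self.max_strikes = max_strikes
        self.window_s = window_s
        self._failures = defaultdict(list)  # client id -> failure timestamps

    def _recent(self, client_id: str, now: float) -> list:
        # Forget failures older than the window and return the survivors.
        recent = [t for t in self._failures[client_id] if now - t < self.window_s]
        self._failures[client_id] = recent
        return recent

    def is_blocked(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return len(self._recent(client_id, now)) >= self.max_strikes

    def record_failure(self, client_id: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._recent(client_id, now).append(now)
```

    A bot that rotates through many addresses would still get three fresh attempts per address, so this slows guessing down rather than stopping it.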

  • thegamecat

    Anything that involves activity on a browser is not viable as it can easily be reproduced by things like autoit.

  • bleh

    annedougherty, are you completely sure they cracked reCAPTCHA?

    To my knowledge, it hasn’t been cracked – although it is extensively outsourced to India for manual typing, or proxied to other sites.

    These are flaws of any CAPTCHA system, and cannot be called a crack of reCAPTCHA.

  • turb

    An easy and readable method could be to tweak the CAPTCHA so that, instead of asking people to copy the characters they see, it asks them to do some simple arithmetic like 5 + 3 – 1.

    I remember when I was young there were questions like that in chips and gum contests.
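
    One way to generate and check turb’s “5 + 3 – 1” style of question, as a Python sketch. The function names and the choice of single-digit numbers with only addition and subtraction are assumptions:

```python
import operator
import random

OPS = [("+", operator.add), ("-", operator.sub)]

def make_challenge(rng: random.Random, terms: int = 3) -> tuple[str, int]:
    """Build a question such as '5 + 3 - 1' and return (question, answer)."""
    answer = rng.randint(1, 9)
    parts = [str(answer)]
    for _ in range(terms - 1):
        symbol, fn = rng.choice(OPS)
        n = rng.randint(1, 9)
        answer = fn(answer, n)  # evaluated left to right, as a reader would
        parts.extend([symbol, str(n)])
    return " ".join(parts), answer

def check(expected: int, submitted: str) -> bool:
    """Accept the visitor's reply only if it is the right whole number."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False
```

    Because the question is plain text, a script can parse it just as easily as a person can read it, which is the same weakness the article raises for the “1 plus three” question.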

  • http://www.optimalworks.net/ Craig Buckler

    Random guessing yielded a 0.009% success rate with the suggested 3-image, 16-degree variance system.

    Assuming a 16-degree angle of error with 3 images gives a random-guess success rate of 0.009% ((16/360)^3 * 100). However, a 16-degree margin is very small and three images is irritating. It would certainly account for the human failure rate of 16%.

    And of course, this all assumes completely random guessing of the image rotation. Let’s assume software can analyse the image and get it right 1 in 10 times. You’re then down to 1 success every 1,000 attempts. And the software will improve…
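
    The arithmetic in the two paragraphs above can be checked with a few lines of Python. Like the comment, it assumes each image is answered independently and that the 16-degree tolerance is the full width of the accepted window; `pass_rate` is a hypothetical helper name:

```python
def pass_rate(p_single: float, images: int) -> float:
    """Chance of passing when every image must be answered correctly."""
    # Images are treated as independent, so the per-image chances multiply.
    return p_single ** images

random_guess = pass_rate(16 / 360, 3)  # blind rotation, 16-degree window
software = pass_rate(1 / 10, 3)        # hypothetical software right 1 time in 10

print(f"random guessing: {random_guess * 100:.4f}%")      # random guessing: 0.0088%
print(f"1-in-10 software: 1 in {round(1 / software):,}")  # 1-in-10 software: 1 in 1,000
```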

  • andy

    It’s a problem, and we’ve seen a ton of forms filled out by REAL people, not by computers (mostly from India).
    Which brings up another problem with the “math type” CAPTCHA – they are better at math than we are.

    The funny thing about this article, to me, is that you can post comments at the bottom of it without any CAPTCHA.

    One last note: one thing that has worked in reducing the number of spam comments on forms etc. for our business is not allowing clickable links in the comment areas. It reduces the incentive to spam a form and, in some cases, makes your site less of a target as well!
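
    A minimal Python sketch of andy’s no-clickable-links approach, using only the standard library: the comment is HTML-escaped so any tags a spammer typed are shown as plain text, and bare URLs are replaced with a placeholder. The `render_comment` name and the placeholder text are assumptions, and the URL pattern is deliberately rough:

```python
import html
import re

# After html.escape there are no raw < > " ' characters left, so stopping at
# whitespace or '&' (the start of an entity) keeps the match inside the URL.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s&]+", re.IGNORECASE)

def render_comment(text: str) -> str:
    """Return comment text as inert HTML: no tags, no clickable links."""
    # Escaping turns any <a href=...> the spammer typed into visible text.
    safe = html.escape(text)
    # Blank out bare URLs too, so nothing later can auto-link them.
    return URL_PATTERN.sub("[link removed]", safe)

print(render_comment("Visit www.example.com now"))  # Visit [link removed] now
```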

  • My220x

    CAPTCHAs are a pain in the butt. Some of them are very hard to identify, and I even end up leaving the site if I can’t fill in the CAPTCHA in three tries.

    I do, however, like the idea of detecting page interaction, though it would most likely be easy to crack.

  • boyter

    CAPTCHAs are actually pretty easy to decode. My university thesis was on using CAPTCHA decoding techniques as a method of identifying text inside normal web images. I was astounded by how quickly someone can write a CAPTCHA breaker. Once you know the techniques, you can write a simple program that breaks a specific CAPTCHA with a high level of success in under an hour, and most of that time is spent building training data.

  • rover3500

    Hi, I’m not sure if this is the same as P Jam, but why can’t they just use a random image which you have to identify, like a train or a wheelbarrow? Just nothing that has an obvious shape that a computer can identify. As long as the angle of the picture is not side-on, I can’t think why it wouldn’t work. And also, like P Jam says, you get 3 goes at it, so a bot can’t have hundreds of guesses.

  • Tarh

    Let’s assume software can analyze the image and get it right 1 in 10 times. You’re then down to 1 success every 1,000 attempts. And the software will improve…

     
    In which case, so will the code that rejects easily guessable images. In theory, eventually both packages will improve to the point where Google declares all images easily guessable, and the CAPTCHA will be defeated. But, that’s theory — nobody has actually tried to break this yet. All of this is just as you wrote — assumption. Let’s give it a chance before we both make up numbers and scenarios to make judgments ;-)
     
    Presumably, if/when it is broken, we’ll be in the same place that we are now, and someone will come up with something new. It’s a game of cat and mouse, much like the malware industry. Hence,
     

    The End is Nigh for CAPTCHAs as we know them, but they will always be around in some form.

     
    Logic tells us that this reverse Turing test can never be successful on this medium (at least not without some sort of global system which would render the final blow to privacy and free speech), so all we can do is keep trying new tests. In fact, anyone who has played a serious video game recently (that doesn’t include you, Flash game and Wii Sports players) will know that a similar battle against cheaters has been going on for over a decade. There is no single solution. We can’t win, but if we keep mixing up the field, we can’t lose either. The only way to lose is to give up and declare all CAPTCHAs worthless.

  • http://www.optimalworks.net/ Craig Buckler

    Google’s testing isn’t an assumption – they have a 16% human failure rate – probably because of the small angle of error.

    I still think image rotation is too easy to crack. A single symbol in a standard CAPTCHA has 62 possibilities (if you use numbers, lower and uppercase English characters). A single rotational image with a 16 degree angle of error has 22.5 possibilities. How can that be better?

    CAPTCHAs are a necessary evil in some situations, but they’re probably overused and certainly used in dumb ways. There are lots of factors you can check before resorting to a CAPTCHA.
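
    Craig’s comparison of 62 options per character against 22.5 options per rotated image (360 ÷ 16) can be extended to several characters or images with a short Python sketch. It counts only the answers a blind guesser must choose between; it says nothing about how much better OCR or image analysis does than random guessing. `guess_space` is a hypothetical helper name:

```python
def guess_space(options_per_item: float, items: int) -> float:
    """How many equally likely answers a blind guesser must choose between."""
    # Each character or image is independent, so the options multiply.
    return options_per_item ** items

TEXT_OPTIONS = 62            # 0-9, a-z and A-Z
ROTATION_OPTIONS = 360 / 16  # 22.5 windows of 16 degrees

for n in range(1, 7):
    text = guess_space(TEXT_OPTIONS, n)
    rotation = guess_space(ROTATION_OPTIONS, n)
    print(f"{n} item(s): text 1 in {text:,.0f}, rotation 1 in {rotation:,.1f}")
```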

  • Captchas. Advertising Methods

    Captchas. Advertising Methods: some Captchas must break the law….

    Remember to allow for disabled users who may find that such Captchas are difficult to use. This could lead to legal action against you for discrimination.

    http://www.acomputerportal.com/advertising_methods/captchas.html

  • Anonymous

    I thought this article would contain useful information, but it was a waste of time.

  • Grunties

    @bleh: ReCAPTCHA has been ‘hacked’ rather than ‘cracked’, but the end result is the same. Search “moot times poll hack” for the gory details.

  • bleh

    @Grunties

    I’ve read the story behind the Times poll, which, after the reCAPTCHA implementation, was all manual labour. That’s something no CAPTCHA can protect itself from.

  • mmm

    I think CAPTCHAs would be far more effective if everybody made their own system. And by that I do not mean their own variation of “distorted letters in an image”.

    Personally, on my (smaller) sites, I’ve made a CAPTCHA system along the lines of “tell us what day of the month it is today”. It’s easy, both for the users and for me to change to something similarly simple when somebody eventually circumvents it.
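
    mmm’s day-of-month question could be checked server-side roughly as follows, in Python. Accepting every date that is “today” somewhere between UTC-12 and UTC+14 is a design assumption, so that visitors in other time zones are not rejected; the function names are hypothetical:

```python
from datetime import datetime, timedelta, timezone

def valid_day_answers(now_utc: datetime) -> set[int]:
    """Days of the month that a visitor somewhere in the world could call 'today'."""
    # Local times run from roughly UTC-12 to UTC+14, so accept every date in that span.
    earliest = (now_utc - timedelta(hours=12)).date()
    latest = (now_utc + timedelta(hours=14)).date()
    days = set()
    d = earliest
    while d <= latest:
        days.add(d.day)
        d += timedelta(days=1)
    return days

def check_day_answer(answer: str, now_utc: datetime) -> bool:
    try:
        return int(answer.strip()) in valid_day_answers(now_utc)
    except ValueError:
        return False

print(check_day_answer("15", datetime.now(timezone.utc)))
```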

    My point is: diversity is one of the best ways to prevent spam. The more sites that use e.g. reCAPTCHA, the larger the incentive for spammers to circumvent that specific system.

  • Jonathon Hill

    Not very accessible, but very difficult to spoof:

    http://derekallard.com/blog/post/not-so-useless-image-to-text-as-a-captcha/

  • Jordan

    THE END IS NIGH