The End of CAPTCHA?

Aside: It’s good to be back at the wheel after a short interlude — having gained a baby daughter but minus a little surplus sleep ;)

Most of us have used a ‘CAPTCHA’ at some point, even if we didn’t know that’s what it was called. According to Wikipedia, ‘CAPTCHA’ is an acronym for ‘Completely Automated Public Turing test to tell Computers and Humans Apart’. OK, they’ve flouted the ‘rules of acronym’, but you have to admit it’s catchier than the more technically correct ‘CAPTTTTCAHA’.

CAPTCHAs are used to filter ‘flesh and blood’ users from the various bots, crawlers and spiders designed to exploit web-based feedback channels. They’re particularly common in areas like email and comment spam filtering systems.

The idea is to set the sort of test that a human will pass easily but bots will struggle with. Although there are alternatives, if you’re a web developer building a site with unmoderated comments, a CAPTCHA is usually your first option. But for how much longer?

The reason I’m thinking about CAPTCHAs this morning is that last Friday I sent out the Design View, which always means I spend the following morning responding to comments, questions and, most commonly, antispam verification systems. While some are still mindlessly easy (just hit reply), others are getting so difficult that I find myself taking three or four attempts to pass.

UOL.com.br’s ‘challenge/response’ system seems to be the worst offender I’ve seen. Here are some examples of the ‘simple tests’ I failed.

You aren’t even given the chance to learn from your wrong answers. Often your mistake is simply choosing an uppercase letter over its lowercase equivalent, but it’s ‘back to the drawing board’, with a new test set each time you get it wrong.

I would imagine that if I’m having to work this hard to get through the system, many others would lose confidence much more quickly. In fact, I’d tip that both my parents would assume they were doing something wrong after the second rejected attempt.

So, is it just a matter of simplifying the tests?

Possibly, but unfortunately the bots are fighting back. According to CAPTCHA.com, researchers at the University of California at Berkeley have developed AI software capable of 83% accuracy with Yahoo’s CAPTCHA system. This will only improve.

When you throw in the accessibility drawbacks (vision impaired users have enough trouble reading normal type), the CAPTCHA method looks to be in a spot of trouble.

  • http://simon.incutio.com/ Skunk

    All CAPTCHAs are eventually doomed because they can all be defeated using the brilliant “free porn” attack. Here’s how it works: Spammers set up a system that scrapes CAPTCHAs from the target site (the Hotmail account creation page for example) and serve them up somewhere else on a “free porn” website. Porn surfers are told to solve the CAPTCHA in order to get their fix. Their solution is passed back to the original site.

    This isn’t just a theory – it’s an attack that’s being used in the wild. As far as I can see, it renders any and every CAPTCHA system irrelevant. How do you tell the difference between a human who wants to sign up for an account with you and a human who wants free porn from somewhere else?

  • http://www.sitepoint.com/ mmj

    In my opinion, one of the biggest flaws with a CAPTCHA system is the assumption that humans can be trusted more than machines. Obviously, that is not the case.

  • Bryan Price

    CAPTCHAs can be done well, or they can suck badly. And most of them suck very, very badly. Case should never be an issue. There should be reasonable limits on how badly to stretch the characters, and color combinations need to be human-readable (as if a machine would really have more difficulty separating dark maroon from dark brown! Or how ’bout fading the background color right into the foreground color!) And then there’s the font choice. Make sure the font allows the human to differentiate between 1 and l, 5 and S, O and 0.

    It might also help to use real words, so people know what they should be typing.

    I prefer the easy CAPTCHAs that say something like ‘enter my first name’. :D Those are the best!
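
Bryan's two rules (never make case matter, and avoid look-alike characters) are easy to enforce on the server side. The Python sketch below is a minimal illustration under assumptions, not any particular CAPTCHA library's API: the set of "ambiguous" characters and the five-character default length are illustrative choices, and rendering the code into an image is left out entirely.

```python
import hmac
import secrets

# Characters that are easy to confuse in many fonts (1/l/I, 5/S, 0/O, 2/Z, 8/B).
# Which ones count as "ambiguous" is a judgment call; this set is an assumption.
AMBIGUOUS = set("1LI5S0O2Z8B")
ALPHABET = "".join(c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" if c not in AMBIGUOUS)


def make_code(length: int = 5) -> str:
    """Return a random challenge drawn only from the unambiguous characters."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_answer(expected: str, submitted: str) -> bool:
    """Case-insensitive comparison that also ignores surrounding whitespace."""
    return hmac.compare_digest(
        expected.upper().encode(), submitted.strip().upper().encode()
    )
```

Because both sides are upper-cased before comparing, the image can render the code in either case without ever failing a user who picks the "wrong" one.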

  • http://www.sitepoint.com AlexW

    Some good points, Bryan. I have to clear my newsletter through Spam Arrest, and I must say their tests are pretty easy — always lowercase and always a relatively common dictionary word. I doubt a bot would find it much easier.

  • http://www.peterakkies.com/ Kilroy

    These kinds of CAPTCHAs may not last much longer, but the simple-mathematics type might. The idea is that you ask the user to solve a really, really simple mathematics question. The user will have it solved in no time, but a complex algorithm has to be written to let a computer solve it. Once the algorithm has been written, you slightly alter the test, so another new complex algorithm has to be written.

    While this method still means staying ahead of the programmers, it will always be like that. People will never stop trying to get around CAPTCHAs.
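
Kilroy's simple-mathematics challenge needs no image at all. Here is a minimal Python sketch, assuming the expected answer is kept server-side (for example in the user's session) and never sent to the browser; the function names, number ranges and the digits-or-words variation are illustrative choices. A bot that parses both formats would still beat it, which is exactly the "slightly alter the test" arms race he describes.

```python
import random

NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]


def make_question(rng: random.Random) -> tuple[str, int]:
    """Return (question text, expected answer) for a tiny add/subtract problem."""
    a, b = rng.randint(1, 10), rng.randint(0, 10)
    op = rng.choice(["plus", "minus"])
    if op == "minus" and b > a:
        a, b = b, a  # keep the answer non-negative
    answer = a + b if op == "plus" else a - b
    # Vary the presentation so one fixed pattern does not parse every question.
    if rng.random() < 0.5:
        text = f"What is {a} {op} {b}?"
    else:
        text = f"What is {NUMBER_WORDS[a]} {op} {NUMBER_WORDS[b]}?"
    return text, answer


def check(answer: int, submitted: str) -> bool:
    """Accept the stored answer typed as digits; anything unparseable fails."""
    try:
        return int(submitted.strip()) == answer
    except ValueError:
        return False
```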

  • http://www.deanclatworthy.com Dean C

    I think the best point raised here is with accessibility. How on earth is a blind person meant to set up an account on a site if they’re presented with an image verification system like above :)

  • http://www.eclecticdreams.com Matt_Machell

    Here’s what the WAI has to say on this style of Turing test:

    http://www.w3.org/TR/turingtest/

  • chris ward

    I’m sure I found an audio-equivalent to a CAPTCHA somewhere on the msn network last month, which is great for blind people!

    But surely voice recognition is easier to utilize than OCR stuff!

  • aneitlich

    Congrats on the new addition!

  • http://www.igeek.info asp_funda

    I’m sure I found an audio-equivalent to a CAPTCHA somewhere on the msn network last month, which is great for blind people!

    It’s on Hotmail: when you try to send too many emails in a minute or so, it gives you an image CAPTCHA to verify that you are human. It also offers an alternative sound CAPTCHA, which I prefer, as the image is almost unreadable.

  • COMALite J

    If the choice is between an image and a sound, what would Helen Keller have to do? Remember, she wasn’t famous for being both blind and deaf. Lots of people before and after her, and today, are both blind and deaf. She was famous for being the first deaf-blind person to graduate from college.

    Today, deaf-blind people do use computers, thanks to Braille readers and the like. But if the choice in a CAPTCHA is between an image and a sound, neither is going to work for them. That alone should make all current CAPTCHA systems violate the intent, if not the letter, of Section 508 and other accessibility standards.

    But all of this is moot, for the reason that Skunk brought up and of which I was already aware: the Free Porn attack. There is NO KNOWN WAY to block it, especially if the attacks come through proxy servers and so obscure their IP origin! And it defeats ALL CURRENT AND IMAGINABLE CAPTCHA systems.

    With this in mind, I hereby state that ANY Web developer who is aware of this (and everyone reading this thread now is), and continues to use CAPTCHA systems anyway (at least without informing the client of this vulnerability that makes them utterly useless and only a hindrance to the honest user) is guilty of misleading and deceiving his or her client, which is tantamount to fraud.

    Anyone selling a CAPTCHA system to or for Web developers who is aware of this is guilty of outright fraud, IMO. Anyone selling a CAPTCHA system who is NOT aware of this is guilty of incompetence or at least of not keeping up with industry news, since this attack has been known about, publicly, for well over a year now, if not longer.

  • http://www.sitepoint.com AlexW

    COMALite J, although you make some great points, I noticed you didn’t actually attempt to post an alternative solution to CAPTCHA.

    As clever as the Free Porn Attack (FPA) is, I’m not sure it’s as scalable and as easy-to-implement as most of the current ‘set-and-forget’ comment spam bots.

    The FPA:
    * requires a valuable, non-renewable commodity (the porn) to drive it.
    * requires you to reach a market — I know ‘free porn’ is a handy marketing line, but you’re probably not the only guy in the world pushing that line.
    * requires more expertise and infrastructure than previous methods.

    No doubt, hardcore spammers will run with it and cause problems, but I can’t see it filtering down the chain very quickly.

  • Chas

    Wouldn’t it make sense to check the referring page, to prevent FPA? Or am I reading this wrong?

  • David

    Using a captcha with a sort of “station identifier” may help. You can at least intrinsically inform unsuspecting porn viewers that the CAPTCHA is from a different site.

  • http://www.sitepoint.com AlexW

    Using a captcha with a sort of “station identifier” may help. You can at least intrinsically inform unsuspecting porn viewers that the CAPTCHA is from a different site.

    David, that’s a nice idea. Perhaps you could use your site’s URL as the watermark that obscures the code word?

    Wouldn’t it make sense to check the referring page, to prevent FPA? Or am I reading this wrong?

    It’s a start, but I’m pretty sure the FPA could request the CAPTCHA page, display the CAPTCHA image from its own cache in a new page to the ‘mule’, and then pass the answer back to the original CAPTCHA page. The mule never comes into direct contact with the CAPTCHA page.

  • Bruce

    The FPA scheme is not foolproof by any means. There are tons of ways to defeat someone who wanted to use that method to bypass CAPTCHA.

    The most obvious was already brought up: check the HTTP_REFERER to make sure the image is only being served in the proper context. It is possible to defeat that by forging the page referer, but there are still other ways of thwarting the free-porn method.

    One approach would be to include the time of day or the IP address of the calling system as part of the checksum for the image value. The FPA effort might decode the image, but part of the checksum could depend on a certain time frame, so the code changes over time; or, even better, on the IP address of the calling system, so that a particular code would only work if the same HTTP client made the request. I’d like to see an FPA work in that scenario. It would be quite a feat.
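
Bruce's checksum idea can be sketched as a signed token: the server hashes the answer together with the client's IP address and the issue time under a secret key, puts the timestamp and signature in hidden form fields, and recomputes the hash when the form comes back. Below is a minimal Python sketch using the standard `hmac` module; `SECRET_KEY`, the five-minute window and the function names are assumptions. Note that it only ties the answer to whichever machine fetched the challenge: a relay bot that fetches the page and submits the answer itself, as AlexW describes in the reply that follows, uses the same IP for both and passes. A token can also be replayed within its window unless the server records which ones have been used.

```python
import hashlib
import hmac
import time
from typing import Optional, Tuple

SECRET_KEY = b"replace-with-a-long-random-secret"  # hypothetical; load from config in practice
MAX_AGE_SECONDS = 300  # assumption: a challenge is only valid for five minutes


def _sign(answer: str, client_ip: str, issued_at: int) -> str:
    """HMAC-SHA256 over the (case-folded) answer, the client's IP and the issue time."""
    message = f"{answer.strip().upper()}|{client_ip}|{issued_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_token(answer: str, client_ip: str, now: Optional[float] = None) -> Tuple[int, str]:
    """Return (issued_at, signature) to place in hidden form fields next to the image."""
    issued_at = int(time.time() if now is None else now)
    return issued_at, _sign(answer, client_ip, issued_at)


def verify(submitted: str, client_ip: str, issued_at: int, signature: str,
           now: Optional[float] = None) -> bool:
    """Accept only if the answer is right, the IP matches and the token is fresh."""
    current = time.time() if now is None else now
    if not 0 <= current - issued_at <= MAX_AGE_SECONDS:
        return False
    expected = _sign(submitted, client_ip, issued_at)
    return hmac.compare_digest(expected.encode(), signature.encode())
```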

  • http://www.sitepoint.com AlexW

    Bruce, if I (as an ‘FPA-Bot’) request a CAPTCHA page, I download it and all its images to my cache. If I then serve a brand new page that pulls the CAPTCHA image from that cache, there’s nothing the real CAPTCHA page can do about it.

    When the person submits the code back to me, I can simply pass the parcel back to the original page.

    As far as time stamps and limitations go, the process could conceivably be driven by the presence of the ‘porn guy’. It’s like a light bulb powered by a mouse running in a wheel. When the mouse feels like running, the light flickers to life.

  • Anon

    I believe there are many excellent methods of reducing abuse on a CAPTCHA form stage, some of which have been mentioned here. Research into web visitor fingerprinting is a good start, and building a profile of the user as they move through the pages of a form can also provide clues when hunting for bots. There is nothing foolproof on either side of the fence.

  • Robert

    Check out the audio captcha on notonebit.com http://www.notonebit.com/projects/killbot/kbaudio.php

  • DemonX

    I know I am very late to this, but I agree 100% with AlexW. There is no possible way to detect an FPA-BOT. It is foolproof. There is no way for the original page to detect the process that’s going on behind the scenes.

  • Tom

    CAPTCHA at Yahoo is driving me crazy. Unless they find a less intrusive approach and address my complaints, they will soon lose me as a paying customer. –Tom

  • GigoIt

    Thought you guys might like this.

    GigoIt’s HumanAuth is based on the ideas presented by KittenAuth.com. HumanAuth supports ADA and Section 508 requirements, increased security and includes watermarked images with random positioning. HumanAuth ensures that an actual human is using your site without forcing them to read distorted CAPTCHA text.

    http://www.gigoit.org/humanauth/

  • Steve

    Be careful about using image-only CAPTCHAs if you and your server are based in the US. There are already greedy-ass losers suing websites that cannot afford a legal defense left and right for violating the ADA. Math equation strings are no better either, as they discriminate against the feeble-minded.

  • Jason DAngelo

    OK, this bugs the heck out of me. Having the ability to read mangled letters does not define human vs. robot. All these idiots are doing is wasting bandwidth. How? Humans are retrying, and robots are retrying ten times more often.

    Even worse, any moron can sit there and type 10,000 correct CAPTCHAs in an eight-hour period, so they are a waste. They just use a robot to set up 10,000 pages, and the user simply enters the CAPTCHA info. (While the advanced robots just OCR the image.)

    Here is a great trick: make all the CAPTCHA images animated GIFs, with the first frame showing wrong data and the second frame, clear to read, showing the correct data. Why, you ask? Because OCR reads still images, and a GIF read as a still image only shows frame 1. The bots will quickly read the wrong info and be lost. Alternatively, you could make the image twice as large and display it with an offset, set as a table background, with the wrong letters on top or scattered about the borders. The image is displayed for humans with an offset of -100, -50, which only shows the correct letters to the user.

    Here is another novel idea: no images required. Simply ask them two math problems. What is (11+3=?), type your answer in the yellow box. What is (8-2=?), type your answer in the green box. Use a mixture of WHITE on WHITE fake numbers and HTML character codes. You can even use hidden fake answers in unseen table cells. Robots read the HTML text; they do not look at the actual displayed page. If you changed your format on every confirm, they would have to program for eons to handle your changing format (CSS CellA Display Hidden). If you created the page, you know what data to expect. That is 10,000 times faster than transferring a dynamically created image.

    For extra validation, use one image (the word “ONE”, a playing card “SEVEN”, a die “FIVE”, a group of ducks “THREE”) and ask them what number the image represents. I could go on for days… CAPTCHA died ten years before it was born. I would not be surprised if the people who created it did so for the sole purpose of selling the OCR bots that read them. OCR has been around for 20 years; twisted-text OCR is rather new, at least to the public.

    The best form of anti-spam is a simple pre-counter for any page. Pretend to load a page with a serial, with all images hotlink-protected… The delay should be approximately 2 minutes, only because “new accounts” are not a daily visit, nor are postings. If they want it, they will wait. Show them the TOS and PRIVACY rules to kill time (recorded in a database for that page serial). Give the guests a pretend “LOADING” screen, and show an expected “FINISHED LOADING” time (progress bar). After that time, the submit button shows. If it is a robot-created page off-site, they cannot see the hotlinked images. If the page has an invalid serial, or a serial that was created only 1.5 minutes ago (before the delay has run out), the page fails. You also show a PAGE EXPIRE time, let’s say 5 minutes… Pages older than five minutes have to be reloaded. (That stops robots from preloading pages for “BOB” to sit and enter CAPTCHA codes all day.)

    Do not blink, do not pass go… send me $200.00.

    Hehe… Thank you, hope I did not waste your time. Jason DAngelo
    (Feel free to remove my site link, if it offends.)
    http://www.MYeTAG.com
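
The server-side timing part of Jason's pre-counter scheme (a serial issued with the page, a minimum wait before submitting, a hard expiry, and a decoy field that humans never see) can be sketched in Python. This is a minimal illustration under assumptions: the 90-second minimum and 300-second expiry are taken from his 1.5-minute and five-minute figures, `SerialStore` is a hypothetical in-memory stand-in for the database table he mentions, and the hotlink-protected images and progress bar are not modeled. As the FPA discussion above points out, a patient human ‘mule’ still gets through; the delay only slows the process down.

```python
import secrets
import time
from typing import Optional

MIN_DELAY = 90   # seconds; assumption based on the 1.5-2 minute delay described above
EXPIRY = 300     # seconds; pages older than five minutes must be reloaded


class SerialStore:
    """Hypothetical in-memory stand-in for the table of issued page serials."""

    def __init__(self) -> None:
        self._issued = {}  # serial -> time the page was served

    def issue(self, now: Optional[float] = None) -> str:
        """Create a serial to embed in the page and remember when it was served."""
        serial = secrets.token_urlsafe(16)
        self._issued[serial] = time.time() if now is None else now
        return serial

    def accept(self, serial: str, honeypot: str, now: Optional[float] = None) -> bool:
        """Accept a submission once: known serial, age inside the window, decoy left empty."""
        issued_at = self._issued.pop(serial, None)  # pop makes every serial single-use
        if issued_at is None:
            return False  # invalid, already used, or never issued
        if honeypot:
            return False  # hidden decoy field: a human never sees it, so never fills it in
        age = (time.time() if now is None else now) - issued_at
        return MIN_DELAY <= age <= EXPIRY
```

Popping the serial on every attempt means a bot that submits too early, or fills in the decoy, has burned that page and must load a fresh one.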

  • http://www.sitepoint.com AlexW

    Jason, there are already a few systems out there doing some of the things you describe. Lately I’ve been seeing quite a few cute ‘Prove you are real by telling me how many kittens are in this picture’ tests. They work OK.

    Most of the time these questions need to be very simple, which means there is often a limited number of potential answers. If you wrote a system to blindly try 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, red, green, blue, yellow, purple and white, I think you’d knock over half these types of CAPTCHAs in the first minute. Expanding that list to 100 potential answers wouldn’t be hard, and recording which answers worked with which canned question means the system gets more efficient as it works.

    The big problem with most systems is that they all rely on at least some level of sight, and sometimes even perfect vision. Your overlaid letters method might work fine under normal circumstances, but does it work the same if someone with reduced vision has scaled up their font size? Asking a blind person ‘what color is this puppy?’ or ‘how many kittens are there?’ is obviously ridiculous.
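
AlexW's point about small answer spaces can be put in numbers. With his 16 candidate answers (1 to 10 plus six colors), a blind guess against a fresh question succeeds with probability 1/16, so over n attempts the chance of at least one success is 1 - (15/16)^n. The Python sketch below computes that figure, and also simulates the ‘learning’ bot he describes, which remembers per canned question which guesses failed and which one worked. The question bank used in any run is hypothetical.

```python
import random

# The 16 answers AlexW suggests a bot would try first.
CANDIDATES = [str(n) for n in range(1, 11)] + ["red", "green", "blue", "yellow", "purple", "white"]


def success_probability(pool_size: int, attempts: int) -> float:
    """Chance that at least one blind guess succeeds, assuming each attempt faces a
    fresh question whose answer is equally likely to be any of `pool_size` options."""
    return 1 - (1 - 1 / pool_size) ** attempts


def solve_with_memory(questions: dict[str, str], candidates: list[str],
                      rng: random.Random, rounds: int) -> int:
    """Simulate a bot that records, per canned question, which guesses failed and
    which one worked. Returns how many of `rounds` submissions succeeded."""
    known = {}      # question -> answer that worked
    ruled_out = {}  # question -> set of guesses that failed
    wins = 0
    prompts = list(questions)
    for _ in range(rounds):
        q = rng.choice(prompts)
        if q in known:
            guess = known[q]
        else:
            tried = ruled_out.setdefault(q, set())
            remaining = [c for c in candidates if c not in tried]
            if not remaining:  # answer lies outside the bot's list; skip this one
                continue
            guess = remaining[0]
        if guess == questions[q]:
            known[q] = guess
            wins += 1
        else:
            ruled_out[q].add(guess)
    return wins
```

Once each canned question has been seen a few times, nearly every later submission succeeds, which is why a fixed question bank with a small answer set offers little protection on its own.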

  • Jesse

    I was thinking about this a few days ago, and I think its inaccessibility to disabled users is a major obstacle to it staying around much longer. For example, any federal site is required by law to make sure blind people can access it just as easily as anyone else. This is usually done by adding alt text to images, which screen-reading programs can pick up on and read out to the user. However, anything placed there for a screen reader to pick up on could just as easily be found by a CAPTCHA bypass bot.

    I just don’t see how any site using Captcha can be considered W3C compliant.

  • Terrell

    ^^^Jesse, how would a blind person be able to access the internet in the first place?

  • http://www.sitepoint.com AlexW

    ^^^Jesse, how would a blind person be able to access the internet in the first place?

    Terrell, there are browsing devices available called ‘screen readers’ that read the content of a page out aloud — effectively turning the web into a giant podcast.

    Users with vision issues quickly become very proficient with these devices and navigate through pages at a similar rate to you or me.

    In fact, the average blind user probably spends more time online each day than sighted users. Compared to printed brochures, TV guides and hard copy recipe books, the web provides the easiest and most direct access to the same information, provided the web page author hasn’t done something silly like turning text into an image that can’t be read by the screen reader application.

  • Alex

    I thought I was the only one who had trouble reading CAPTCHAs – have you seen Rapidshare’s new one where you have to look for a cat symbol in each letter? – IT’S IMPOSSIBLE!!! – CAPTCHAs need to go – and fast!!!

  • mario

    I personally have never liked CAPTCHA, but it is another tool. Treat spammers like thieves. They are going to get what they want one way or another, so just make it as annoying as possible. CAPTCHA is not perfect, and neither will any other solution be, since they will all be software based. All software can be cracked by another piece of software. Nothing is foolproof, especially after it has been online for more than a week: someone has it and will be determined to get around it.

    If you are looking for the end all be all method, then you are searching for something that doesn’t exist. Use captcha, for a start, then research other methods, and attempt to stay ahead of the game.

    Good luck.