Google’s New CAPTCHA: The Only Way Is Up!

CAPTCHA has always been a controversial subject. Apart from the well-documented accessibility problems, there’s the simple irritation we all feel when we’re asked to perform a circus trick to prove we are a person — something we usually take for granted. The unfortunate fact is spammers aren’t going away any time soon, and many have the time, resources and inclination to exploit any angle they can find. We need good, original thinking in this area.

A CAPTCHA Based On Image Orientation

Last week Rich Gossweiler, Maryam Kamvar and Shumeet Baluja from Google Research published their latest ideas on the subject, entitled ‘Socially Adjusted CAPTCHAs’.

A white paper explains the idea in detail but the concept is simple enough. Users are shown a circular-cut picture that is rotated to a random, non-standard angle. They are then asked to rotate the image back to its correct orientation.
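To make the mechanics concrete, here is a minimal sketch of the server-side bookkeeping such a test might need. It is my own illustration, not code from the white paper: the function names, the `min_offset` parameter and the ±5 degree default tolerance are all assumptions.

```python
import random

def angular_error(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = (a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)

def make_challenge(rng: random.Random, min_offset: float = 30.0) -> float:
    """Pick how far (in degrees) the upright image is rotated before display.

    min_offset is a hypothetical safeguard so the image never starts
    out almost upright.
    """
    return rng.uniform(min_offset, 360.0 - min_offset)

def is_solved(shown_rotation: float, user_rotation: float,
              tolerance: float = 5.0) -> bool:
    """True when the user's rotation brings the image back to upright,
    give or take the tolerance. Angles wrap around at 360 degrees."""
    return angular_error(shown_rotation + user_rotation, 0.0) <= tolerance

# An image shown rotated by 90 degrees is solved by turning it back 90,
# or equivalently forward by roughly 270.
print(is_solved(90.0, -90.0))   # True
print(is_solved(90.0, 273.0))   # True (3 degrees off)
print(is_solved(90.0, -80.0))   # False (10 degrees off)
```

The only subtle part is the wrap-around: 358 degrees and 2 degrees are 4 degrees apart, not 356, which is why the comparison goes through `angular_error` rather than a plain subtraction.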

As humans who have evolved to quickly process visual information about the real world, we’re all born with very good software for determining which way is up. Computers, however, are currently nowhere near as skilled at making sense of a potentially wildly varying array of images. You only have to look at the comparatively plodding movements of Honda’s Asimo robot or robot soccer to understand just how taxing a task this can be for a machine.

Obviously the method shares some characteristics with other image-based CAPTCHA methods (such as the ‘How many kittens do you see?’ method) but has one major advantage. Where other methods require humans to write new tests (e.g. ‘How many .. um.. goldfish?..’), fresh Socially Adjusted CAPTCHA tests can be easily and automatically generated by a machine, but not as easily solved by one.

Even without many perpendicular lines, most humans have little trouble discerning which way is up in images such as these.

If you consider classic alphanumeric CAPTCHA methodology, a bot only has to try to match around 40 characters to any given glyph, albeit a distorted one. Image orientation is powered by an almost limitless pool of feeder images, taken from wildly varying subject matter, aspects and angles. Sure, writing a bot that searches for horizons and perpendicular lines would be a reasonable start, but it will only get you so far (as the examples show).
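For the curious, here is a rough sketch of what the "search for horizons and perpendicular lines" kind of bot might look like. It is an illustration only: it assumes the image has already been reduced to a grayscale grid of numbers, and the function name and the structure-tensor approach are my own choices rather than anything from the white paper.

```python
import math

def dominant_edge_angle(img):
    """Estimate the dominant gradient orientation of a grayscale image.

    img is a list of equal-length rows of numbers. The result is in
    degrees in the range [0, 180), measured in image coordinates
    (x to the right, y down the rows).
    """
    jxx = jyy = jxy = 0.0
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            # Central differences approximate the brightness gradient.
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            jxx += gx * gx
            jyy += gy * gy
            jxy += gx * gy
    # Principal axis of the accumulated structure tensor.
    angle = 0.5 * math.degrees(math.atan2(2.0 * jxy, jxx - jyy))
    return angle % 180.0

# Synthetic test image: brightness ramps along a direction 30 degrees
# from the x axis, like a soft horizon tilted by 30 degrees.
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
ramp = [[x * c + y * s for x in range(16)] for y in range(16)]
flipped = [[-v for v in row] for row in ramp]  # same picture upside down

print(round(dominant_edge_angle(ramp), 3))
print(round(dominant_edge_angle(flipped), 3))
```

Both calls report the same angle, which is the catch: a line detector recovers the tilt of a strong edge but cannot tell upright from upside down, and on pictures with no clean horizon it has even less to work with.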

Now, this is certainly no home run. It’s no improvement for the vision-impaired. Similarly, motor-impaired users may well know which way they’d like to orientate the image, but may struggle with the physical process of re-orientating the image. Perhaps great interface design can negate this problem.

There has also been some criticism that the method offers no protection against spammers who employ cheap human labor to crack CAPTCHAs.

However, as I see it, this is an unfair call as it falls outside of CAPTCHA’s working brief — to sort the humans from the bots. Sorting good users from mischievous users is an entirely different class of problem.

So, what do you think? Would you be tempted to replace your alphanumeric CAPTCHAs with something like this?

  • Bashar

    Probably. It seems much stronger than regular CAPTCHAs. Pligg CAPTCHA for one has failed miserably for me.

  • dougoftheabaci

    There is a fatal flaw that makes this no better than the normal method.

    You make a bot that is designed for this kind of CAPTCHA. It wouldn’t need to look for horizon lines, it would simply need to log the image (as each site would have a finite number) and then rotate to any given percentage. If it failed, it would log the failed percentage and move on as many times as necessary.

    Eventually it would reach a point where every image was logged. Also, since users can’t be expected to guess the exact right percentage, there would need to be an acceptable range for success, say 20 degrees. That gives the computer a 1/18 chance of getting it right on the first attempt. The next time that image comes up it’s a 1/9 chance.

    See? it would actually take less time to crack. The only way to lengthen it would be 100+ possible images. But given a computer could, in theory, run the test a few hundred times a minute it wouldn’t take long before it found a winner.

    Wow, did I just out-smart Google? Scary…

  • clussman

    The pool of photos is unlimited since Google can serve them in real-time from their own Google Images cache using the same filtering they already use to filter ‘safe’ images. On the surface it would appear to be subject to brute force attacks though. A bot could just rotate each image randomly until it succeeds.

  • k3liutzu

    dougoftheabaci, you could use an external image source, such as flickr, and have access to an almost unlimited number of images

  • Timoftheobvious

    You missed a few things…
    1. The word “limitless” (referring to the amount of “feeder imagery”)
    2. Magnification. Even a minúte amount of magnification could cause significant problems for spammers.

    I will grant you that a decent “margin of error” would seem to make guessing a viable hack.

  • James

    Say the images are sourced from Flickr…what happens if someone decides to take photos that are angled for artistic purposes? A human would then “fail” to orient the image.

  • Ian

    Can anyone tell me which way is up on the first of the 3 images as I don’t see it?

  • http://xslt2processor.sourceforge.net boen_robot

    The way I see it, having an external image source isn’t much better. In fact, it may be worse.

    If you knew sites A, B and C use Flickr, you can combine the method dougoftheabaci described along with making the bot actually go to Flickr and log THOSE images’ right angles and their appearance on it. Or in other words, you’d take those images, and generate the “up” CAPTCHA image from them. Comparing the presented CAPTCHA image to the logged “up” CAPTCHA one should then be simple enough.

    Current alphanumeric CAPTCHAs are hard to crack because they don’t have “stored” (and therefore loggable) possible values. They are becoming easier to crack because they are generated by a distortion tool (thereby giving a limited set of outcomes for a single value), and because there are limited values to use.

    These CAPTCHAs have it the other way around – they are hard to crack because they use an unlimited set of possible values. However, they will quickly become easy to crack as they use “stored” values, even though the “stored” values are again “distorted” by means of rotation.

    A possible solution to this would probably be if the images had a random figure (square, diamond, circle, etc.) at a random point within the image. This would much further increase the range of possible values, and would make them much harder (if not impossible) to log, as a single image will have a range of representations (similar to nowadays’ CAPTCHAs which have a limited set of text values with different representations).

  • ann

    I think Photoshop has some sort of auto-rotation function that takes images and figures out which is the right way up. I’m not really convinced that a machine can’t solve the great majority of these puzzles.

  • Amtiskaw

    Google have access to a pretty massive amount of images, as they archive pretty much every one on the internet via image search, so I think they’ll be able to have a wide enough library of images as to make recognising them all impossible, particularly when the degree of rotation is randomised each time.

    However, I do think this is an arms race they can’t ever win. The combination of money-driven spammers and curiosity-driven AI researchers is bound to overcome any technique sooner or later. I wonder if this is really just a way of crowdsourcing the solution to an interesting AI challenge. All we need to do is create a CAPTCHA that is a foolproof test for sentience and we’ll have thinking machines in a month!

  • http://www.sitepoint.com AlexW

    Assuming they used CC licensed imagery to power it, there could be practically limitless sources of imagery. Flickr + smugmug, + Google Images + Picasa etc.. I’m not sure that anyone without the mass processing power of google would be able to produce a large log of potential answers.

    I do have another query though. I’m no image processing expert, but I wonder if there’s a latent grid or matrix of some sort imposed on an image when it is created? And that a computer might be able to discern that grid even after it’s rotated and cropped?

    Any JPG format gurus out there?

  • James Sarjeant

    In response to the arguments above, you could store the degrees already tried, but a large or unlimited source of images such as Flickr, rotated by a random number of degrees, would make life hard for a bot. If the bot originally ruled out 43 degrees CCW, on the next page load it would not try 43 degrees CCW, but this could easily be the right amount, so Google may well have hit the jackpot.

  • Best Desi Blog

    Well if google provided this solution as an easy API, this would translate into plugins for various platforms. imo ReCaptcha is doing a good job, reading books as well as making lives for bots more difficult (referring to the 4chan and time.com debacle).

    But as you mentioned spammers use cheap labor to get these captchas read, and they in turn can make algorithms which can guess..in short I really don’t think this will be a permanent solution there are ways to get around this. We need a permanent solution.

  • dougoftheabaci

    You could get around that by having it generate the images lopsided to start with and then when you correct it its grid, which I guess you could detect via the pixels, would show up.

    Also, it wouldn’t take much processing power, just time. Then again, considering spamming is a billion+ dollar industry every year I can see someone investing the resources.

    Of course, there are currently plugins available for GIMP and Photoshop that will auto-correct an image that’s crooked. Why someone couldn’t figure out a server-based method I don’t know. The way I see it if someone can figure out how to create Content-Aware Scaling for Photoshop they can figure this out.

  • http://www.sitepoint.com AlexW

    “Can anyone tell me which way is up on the first of the 3 images as I don’t see it?”

    It is the hardest of the three, Ian. It’s a mountain stream rotated about 30 degrees counterclockwise.

    On the brute force attacks, I expect you would only be allowed one try at any given image with maybe a +-5 degree margin for error. If you miss, you get a new pic. I guess a bot would have a 1:36 chance of fluking this.

    +-5 degrees seems about right to me looking at the quick example below.

    http://i2.sitepoint.com/images/blogs/captcha4.png

  • http://www.sitepoint.com AlexW

    Hey, guys I actually got an email from Rich Gossweiler clarifying a couple of the points raised, which you might be interested in.

    If your readers care, we show several images that the user needs to rotate upright to increase the odds against guessing (and of course you can’t keep re-guessing with the same image). We also show one image as a “candidate” image and log how well people do, but don’t use it to let you in. If people’s answers have high variance then we don’t use the image for real. If they all say about the same thing, even at an angle, we use it. That’s how we make it easy for humans and handle pictures taken at an angle.
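A rough sketch of how the "candidate image" screening described in that email could work, assuming each logged answer is stored as the final angle the user left the image at. The function names, the 10 degree spread limit and the minimum of 20 answers are made-up values for illustration; the email does not give any numbers.

```python
import math

def circular_spread(angles_deg):
    """Circular standard deviation, in degrees, of a list of angles.

    Uses the mean resultant length so that 359 and 1 count as close.
    """
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    r = math.hypot(c, s)
    r = min(max(r, 1e-12), 1.0)  # guard against log(0) and rounding above 1
    return math.degrees(math.sqrt(-2.0 * math.log(r)))

def accept_candidate(angles_deg, max_spread_deg=10.0, min_answers=20):
    """Promote a candidate image only if enough people answered and
    their answers agree closely (thresholds are hypothetical)."""
    if len(angles_deg) < min_answers:
        return False
    return circular_spread(angles_deg) <= max_spread_deg

agreeing = [358, 359, 0, 1, 2] * 5     # everyone says roughly "upright"
tilted = [28, 30, 32, 29, 31] * 5      # everyone agrees, but at an angle
scattered = [0, 90, 180, 270] * 6      # nobody agrees on which way is up

print(accept_candidate(agreeing))   # True
print(accept_candidate(tilted))     # True
print(accept_candidate(scattered))  # False
```

The `tilted` case is the interesting one: agreement is what matters, not the absolute angle, which matches the point about pictures taken at an angle still being usable.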

  • nachenko

    It’s an amazingly simple idea. And the dude inside the washing machine is fun, so, BTW, you can convert a CAPTCHA into a funny experience.

  • http://www.clearwind.nl peach

    if google would use a pool of images like flickr they’re going to have a substantial miss-rate from people who can’t take pictures straight. I’m pretty sure my mobile phone pictures that I upload are rarely straight.

  • melissapbr

    what would be the accessible alternative for screen reader users?

  • http://www.sitepoint.com AlexW

    If you read the detail of the white paper, the user would be presented with more than one image to re-orient (three most likely). I guess you could think of this like a combination barrel lock. A machine might well fluke one at the right angle, but acing all three would be tough.

    This also provides a test bed for images. Some images are going to be less obvious than others — for instance a close-up of table setting versus a person standing in a doorway.

    As data accrues the table setting shot might allow for a +-10 degree margin for error. The door shot may only allow 2 degrees. The system would learn.

    This means, like spam email filtering, the system will automatically evolve and improve as people use it.

    @melissapbr I doubt you would try to produce an audio equivalent of this method. What is the audio equivalent of up? I would think this would be teamed with more traditional accessible fallback options.

  • smarsh

    There is soooooo much on this site about accessibility – even a forum. It’s a joke to consider captcha a good thing while touting the need for accessible web sites.

    One or the other people – make up your minds!

  • http://www.ThePatio.net michael – ohio

    Smarsh is correct: A CAPTCHA system which does not provide an audio option is not viable for any serious website as it would be inaccessible to a significant number of people. Assuming that the details of the visual interaction can be worked out – what would the audio option be for the sight impaired?

  • http://www.sitepoint.com AlexW

    @michael @smarsh Guys, as this is a research paper, it has no current real world implementation, so you can’t assume it won’t have an accessible audio fallback option.

    Now, it won’t be an audio equivalent of this system — I can’t imagine how that would work. But there’s no reason why it should be, is there?

    So, I can’t see why we’d be any worse off, and in many ways we should be better off.

  • http://www.fastliondesign.com FastLionDesign

    I think a website should present the beginning of a joke. Then the websurfer would have to provide the punchline (instead of a captcha).

    Spammers are a dour, humorless bunch. They wouldn’t be able to figure it out.

    George
    http://www.fastliondesign.com

  • http://www.calcResult.co.uk omnicity

    “You only have to look at the comparatively plodding movements of Honda’s Asimo robot or robot soccer to understand just how taxing a task this can be for a machine.”

    I’m sorry, but a physical machine trying to imitate natural motion, as an actual method of locomotion has nothing in common with image processing.
    This task would probably not need any understanding of the image: if it is possible to detect horizontal or vertical lines in the image, then that gives a completely dumb algorithm a one in four chance of getting a single image right, and therefore 1 in 64 chance with three images, which is easily enough for a brute force approach.

  • Mark

    If the margin of error is 1 in 36 (as guessed at above) then the odds of fluking all three are 1/36 * 1/36 * 1/36 = 1 in 46656. If you reduce this margin of error then the odds against a fluke increase significantly – say 2/360 * 2/360 * 2/360 = 8 in 46656000 = 1 in 5832000.

    A lot of people seem to be assuming that the same picture will always be rotated by the same amount, and therefore when you get it correct you can store that value somewhere. I would assume the picture is rotated a random amount each time it is presented, meaning you couldn’t assume the same values, and also if you gave the picture a different file name how would you know which picture was being shown? You could also run filters over the images to change them maybe?

    I guess image recognition might advance to the point where a machine can tell what it is (like facial recognition), but couldn’t you just put the same squiggly lines over it that any current CAPTCHA has?
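For anyone who wants to play with the guessing odds discussed in this thread, here is a small sketch. It assumes a bot that picks a uniformly random angle for each image and gets a single attempt per image; `guess_odds` is just an illustrative name.

```python
from fractions import Fraction

def guess_odds(tolerance_deg, images=1):
    """Chance that random guesses land within +-tolerance_deg of upright
    on every one of `images` independent pictures.

    tolerance_deg should be a whole number of degrees (or a Fraction).
    """
    single = Fraction(2 * tolerance_deg, 360)  # accepted window / full circle
    return single ** images

one = guess_odds(5)          # a +-5 degree margin on one image
three = guess_odds(5, 3)     # the same margin on three images in a row

print(one)            # 1/36
print(three)          # 1/46656
print(1 / three)      # 46656 attempts expected per success
```

Exact fractions keep the arithmetic honest: the accepted window is twice the margin, and the per-image odds multiply when several images must all be right.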

  • http://www.sitepoint.com AlexW

    @mark Exactly. If I understand correctly, each image would come to acquire a margin for error *specific* to it, based on the normal distribution of results submitted for it. For instance, a profile shot of a doorway might only allow +-2 degrees of error. Other images might build in larger margins.

    But, yeah, an average of 1:36 seems reasonable to me, and once you start using more than one at a time the number of possible combinations becomes huge.