A Better CAPTCHA: Are We There Yet?

Photo: Anna Fischer

CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) are a nightmare for any usability expert, but if we want to reduce spam, we need them or something else in their place.

While there are numerous CAPTCHA alternatives (or lesser-known CAPTCHA varieties), you could argue that none of them is good enough for mass use on the Web.

Some of the CAPTCHA alternatives that exist can be used as a CAPTCHA replacement under some circumstances but they won’t solve the problem in general. And the problem is a pretty real one – CAPTCHAs hurt user experience and conversions.

Key Takeaways

CAPTCHAs are necessary to reduce spam but can be a usability nightmare. Unfortunately, no current anti-spam technology outperforms CAPTCHAs, despite their negative impact on user experience and conversions.
A good anti-spam solution should be accessible, non-disruptive, transparent to the end user, automated or require very little moderation, and not be a 3rd-party service. It should also not put a huge strain on the server/browser and must have a low percentage of false positives and negatives.
There are several CAPTCHA varieties, including reCAPTCHA, pure audio CAPTCHAs, image CAPTCHAs, video CAPTCHAs, simple math/question CAPTCHAs, and 3D CAPTCHAs. Each has its own pros and cons and none are perfect solutions.
CAPTCHA alternatives include checkboxes, honeypots, rule-based filtering, IP whitelists and blacklists, phone/email confirmation, other services, and privileges based on user rating. Again, each of these has its own advantages and disadvantages.
While we can’t do much better than CAPTCHA currently, we can design better CAPTCHAs. This includes saving user data input in the form and fine-tuning the complexity of the CAPTCHA to achieve the best spam to usability ratio.

The Perfect Anti-Spam Solution

We’ve all seen ads of various anti-spam products that claim they are the ultimate anti-spam solution.

However, you don’t need to be a genius to read past the marketing lingo to figure out that a particular solution is either easy to circumvent (hence unreliable), or requires too much effort (i.e. is not user-friendly), or generates too many false positives/negatives.

A user asked in an old thread at StackExchange about CAPTCHA alternatives and listed the following requirements any such alternatives should meet:

It must be accessible.
It must be non-disruptive and transparent to the end user.
It cannot detract or distract from the primary purpose of the page.
It must be automated or require very little moderation on a large scale.
It cannot be a 3rd-party service.

While I think most of us could live without the last requirement (i.e. we could accept a third-party service, provided it’s reliable and user-friendly), the requirements listed by the user are basically the features of a good enough anti-spam solution.

In addition to them I would add just 2 more:

It shouldn’t put a huge strain on the server/browser.
It must have a low percentage of false positives and false negatives.

While these 7 requirements don’t sound like too much to ask, they are still an impossible dream, as no present anti-spam technology outperforms CAPTCHAs.

So let’s discuss the main CAPTCHA varieties and alternatives, so you can see for yourself that unfortunately, all things being equal, (text) CAPTCHAs are the lesser evil.

CAPTCHA Varieties and Other Anti-Spam Alternatives

reCAPTCHA

When we say CAPTCHA, we basically think of this:

reCAPTCHA

This type of CAPTCHA is called reCAPTCHA and it is now owned by Google. reCAPTCHA displays a ‘letter code’, which usually won’t form a word (to prevent dictionary attacks), and asks the user to type it in.

This letter code is visually distorted to deter OCR bots. Of course, the harder it is for the bots to read, the harder it is for users, too.

reCAPTCHA Pros:

reCAPTCHA comes with two options that improve usability: the option to request a new set of letters if the present set is illegible, and the option to play audio in which the letters are spelled out.

reCAPTCHA Cons:

Often, no matter how many times you refresh, all you get is a new set of equally illegible letters. This is the irritation of reCAPTCHA we all know. “Is that a squished ‘L’ or a squished ‘1’?”

A second recurring issue is that the audio fallback option might not be of much help to many users.

This is the case, for instance, if the user’s hearing is impaired, or if he or she simply doesn’t understand the spoken names of the letters, as might happen with international users. Perhaps they can read your English content (possibly with the help of Google Translate), but the sounds ‘el.. too..five…jay…bee….em’ mean nothing to them.

Even the mere requirement of a working sound card can be a problem for users with older equipment, or on library PCs and in internet cafes.

reCAPTCHA is the most common type but it’s not the only one. In many cases, some of the lesser used varieties could be a better choice.

Here are some CAPTCHA varieties and their pros and cons.

Pure Audio CAPTCHAs

As the name itself implies, audio CAPTCHAs use sound rather than text to filter bots. reCAPTCHA itself has an audio option, so if you want to try the concept, you don’t have to look further.

As for the cons, we’ve already discussed them in the previous section – a barrier to hearing-impaired and/or international users.

All in all, audio CAPTCHAs are generally no better than the garden variety reCAPTCHA.

Image CAPTCHAs

Image CAPTCHAs are more of an alternative to standard text CAPTCHAs than audio CAPTCHAs. With image CAPTCHAs, you show an image instead of text, and ask the user a question about what he or she sees in the picture.

Click the flower

Pros of image CAPTCHAs:

It makes a game out of the CAPTCHA process.

Cons of image CAPTCHAs:

There are several issues with this approach.

Firstly, you need a large pool of constantly changing images to display. If you have, say, only 100 images, it’s not difficult for a human to review them all and teach a simple bot what to enter.

Secondly, these images must be unambiguous and easy to understand. They must be simple objects (an apple, a cat, a car, etc.) that are obvious to everybody. If you use something fancy, you never know how your users will interpret the image and how many times they will have to resubmit the CAPTCHA.

Thirdly, the language barrier is a problem again. To a native speaker, the name of a simple object might be easy to write, but there are international users who don’t necessarily know even basic English words.

Video CAPTCHAs

Video CAPTCHAs are one more CAPTCHA variety. They are the least popular because it is hard to provide a reasonable number of videos, the videos require storage, and, again, not everybody can watch and understand them.

Simple Math/Question CAPTCHAs

Math question

I would suspect that, after text reCAPTCHAs, this is the second most popular type of CAPTCHA.

The principle is this: you display a simple math problem or a question, like “2+2=?” or “Which is the shortest month of the year?” and the user has to enter the answer.

Since math is universal, there is no language barrier for international users, but the same can’t be said of question CAPTCHAs. This is why, if you want a question-based option, you’d better go with math questions only.
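To make the idea concrete, here is a minimal sketch of a server-side math CAPTCHA in Python. The function names and `SECRET_KEY` are my own assumptions, not part of any particular library. The expected answer is signed with an HMAC so that it never appears in the form in plain text. A real deployment would probably also tie the token to a session and an expiry time, otherwise a bot could reuse a token it has already solved.

```python
import hashlib
import hmac
import random
from typing import Optional, Tuple

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; keep it on the server


def _sign(answer: int) -> str:
    """Return an HMAC of the expected answer."""
    return hmac.new(SECRET_KEY, str(answer).encode(), hashlib.sha256).hexdigest()


def make_math_challenge(rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Return (question text, signed token) for a simple addition problem."""
    rng = rng or random.Random()
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"{a} + {b} = ?", _sign(a + b)


def check_math_answer(user_input: str, token: str) -> bool:
    """True if the submitted answer matches the token sent with the form."""
    try:
        answer = int(user_input.strip())
    except ValueError:
        return False  # not a number at all
    return hmac.compare_digest(_sign(answer).encode(), token.encode())
```

The question goes into the visible form label and the token into a hidden field; on submit, both the typed answer and the token are passed to `check_math_answer`.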

3D CAPTCHAs

Basically, 3D CAPTCHAs are even more irritating than reCAPTCHA itself, but if you want to test whether your users like them more, you can. 3D CAPTCHAs don’t look as plain, but they are even harder to read. You can try them for a change, but my feeling is your users won’t find them any more appealing.

CAPTCHA Alternatives

In addition to CAPTCHA varieties, there are numerous non-CAPTCHA alternatives. I don’t want to sound biased, but most of these alternatives don’t even come close to the efficiency of CAPTCHA. Read about them and judge for yourself whether we’ll be stuck with CAPTCHA for years or not.

Checkboxes

Checkbox: Tick this if you are not a bot.

Checkboxes next to a field, such as “Check this if you are human/not a robot”, are one of the best CAPTCHA alternatives. The checkbox is generated using client-side JavaScript, so in theory it is invisible to a bot. Checkboxes are not as irritating as CAPTCHAs, but they are easily missed by users.

Checkboxes are not 100% bot-proof and not all users have JavaScript enabled, but if you absolutely hate CAPTCHAs, you might give checkboxes a try.

On top of that, if this approach ever gains serious market share, it will be trivial to write a bot to exploit it.

Honeypots

Honeypots use the opposite approach, essentially asking the bots to identify themselves.

A form will contain a field that is not visible to humans, only to bots. These bots are programmed to be fast and simple. When they see a field to be filled, they do it, thus exposing themselves.

Of all CAPTCHA alternatives, honeypots look the most promising. They are usually implemented via CSS, so you aren’t gambling on the user having JavaScript enabled.

However, they do have disadvantages.

Firstly, you need to add a warning for users with screen readers NOT to fill in the field. Secondly, hidden text can be looked upon with suspicion by search engines, making it potentially bad for SEO.
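The server-side half of a honeypot is tiny. The sketch below assumes a hypothetical hidden field called `website`; the name and the markup are illustrations only. The label doubles as the warning for screen reader users mentioned above.

```python
# Hypothetical field name and markup; the field is hidden from sighted users
# with CSS (for example a rule that moves the .hp wrapper off-screen).
HONEYPOT_FIELD = "website"
HONEYPOT_HTML = (
    '<div class="hp"><label>Leave this field empty '
    '<input type="text" name="website" tabindex="-1" autocomplete="off">'
    "</label></div>"
)


def is_probable_bot(form_data: dict) -> bool:
    """Flag the submission if the hidden honeypot field contains anything."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())
```

A human never sees the field and submits it empty, so `is_probable_bot` returns `False`; a bot that fills in every field it finds gives itself away.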

Rule-based Filtering

Most likely you are already using rule-based filtering without even knowing it. A good example is Akismet. You set the rules that make a comment/post spam, and when a comment/post meets the criteria, it’s marked.

Rule-based filtering could give good results, especially if you combine it with manual administration. Most often the rules are created to search for given keywords. If they are present in a post, the post is marked as spam.
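As a rough illustration, a keyword rule can be a few lines of Python. The keyword list and the link limit below are made-up placeholders, and the link-count rule is my own addition to the keyword rule described above. This is not how Akismet works internally, just the general idea.

```python
import re

# Hypothetical rules; a real list would be tuned to the spam a site receives.
SPAM_KEYWORDS = ("viagra", "casino", "payday loan")
MAX_LINKS = 2


def is_spam(text: str) -> bool:
    """Mark a post as spam if it matches a keyword or contains too many links."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in SPAM_KEYWORDS):
        return True
    return len(re.findall(r"https?://", lowered)) > MAX_LINKS
```

Plain substring matching like this is exactly why manual review still matters: it will happily flag a legitimate comment that merely mentions a listed word.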

IP Whitelists and Blacklists

With IP whitelists and blacklists you create lists of allowed (“white”) IPs and banned (“black”) IPs. They are rarely useful because they are easily circumvented and produce a high level of false positives and negatives.

You are correct if you are thinking that a blacklisted IP could be easily bypassed with the help of a proxy, or simply by posting from a different location. In fact, a simple browser extension can bypass this defence.

What is more, often a legitimate user can get blacklisted because his or her IP has been used in the past by spammers. This alone makes white/blacklists a particularly clumsy solution.

In reality, this approach most likely creates more problems than it solves, but if you have nothing else at your disposal, as a last resort you might try it.
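If you do try it, Python’s standard `ipaddress` module does the range matching for you. The lists in this sketch are placeholders built from address ranges reserved for documentation, and the three-way result is my own design choice rather than an established convention.

```python
import ipaddress

# Hypothetical lists using documentation-only address ranges.
WHITELIST = [ipaddress.ip_network("192.0.2.0/24")]
BLACKLIST = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.7/32"),
]


def classify_ip(address: str) -> str:
    """Return 'allow', 'block', or 'unknown' for a client IP address."""
    ip = ipaddress.ip_address(address)
    if any(ip in network for network in WHITELIST):
        return "allow"
    if any(ip in network for network in BLACKLIST):
        return "block"
    return "unknown"
```

Returning `"unknown"` rather than a plain yes/no leaves room to send unlisted addresses through another check instead of trusting or banning them outright.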

Phone/Email Confirmation

It’s possible to include one more type of user verification: have the user confirm via email or over the phone that he or she is not a bot. Phone verification could be useful in ecommerce, where you can call the user, make sure this is a live person, and only then dispatch the goods. In many other cases it’s simply overkill.
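For the email variant, the usual pattern is to mail the user a link containing a token that only your server could have produced. Here is a minimal sketch using an HMAC from the standard library; `SECRET_KEY`, the 24-hour lifetime, and the function names are assumptions of mine.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; keep it on the server
TOKEN_LIFETIME = 24 * 60 * 60  # seconds a confirmation link stays valid


def _signature(email: str, issued: int) -> str:
    payload = f"{email}|{issued}".encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def make_confirmation_token(email: str, now: Optional[float] = None) -> str:
    """Build the token that goes into the emailed confirmation link."""
    issued = int(time.time() if now is None else now)
    return f"{issued}.{_signature(email, issued)}"


def verify_confirmation_token(email: str, token: str, now: Optional[float] = None) -> bool:
    """True if the token was issued for this address and has not expired."""
    try:
        issued_text, signature = token.split(".", 1)
        issued = int(issued_text)
    except ValueError:
        return False  # malformed token
    current = time.time() if now is None else now
    fresh = 0 <= current - issued <= TOKEN_LIFETIME
    genuine = hmac.compare_digest(signature.encode(), _signature(email, issued).encode())
    return fresh and genuine
```

Because the token carries its own timestamp and signature, nothing needs to be stored until the user actually clicks the link.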

Other Services

One of the requirements for a good anti-spam solution was that it doesn’t use 3rd party services but since I believe this isn’t a deal breaker, let’s include it as well. After all, reCAPTCHA itself is a third party service because the symbols entered are verified at their servers, not at yours.

You can reduce spam if you allow only users registered with services such as OpenID, Disqus, Facebook, or Google+ to post. However, a bot can also have an account with these services and pose as a human.

Privacy issues are an obvious concern with this approach.

Given the public nature of the Net, and the fact that once you post something, it stays there forever, not everybody is comfortable posting under their name at all times.

Privileges Based on User Rating

Privileges based on user rating are an alternative for forums and communities. Here are some suggestions:

Automatic Approval for Posts from Trusted Users

Posts and comments from new users are checked before they are posted live, while posts and comments from trusted users are published automatically (and randomly checked later, just in case).

Moderation/Flag Privileges for Trusted Users

User privileges could go even further. For instance, you can authorize very trusted users to moderate or at least flag posts/comments. However, this could be very biased because if a user doesn’t agree with a post, he or she can easily flag it even if it is not spam.

Number of Posts Before A User Can Post Links

The principle is simple: you can post a link if you have at least 10, 20, 50, 100, or any number of posts, and/or you’ve been a member for a month, three months, or any period of time. This stops bots but isn’t convenient for ordinary users. Of course, this approach can be abused as well (i.e. make the required number of clean posts and then start spamming), but that takes a lot of effort.
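The rating-based rules in this section boil down to a couple of threshold checks. In the sketch below, the `User` fields and both thresholds are hypothetical; pick whatever numbers suit your community.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical thresholds; the exact numbers are up to you.
MIN_POSTS = 20
MIN_MEMBERSHIP = timedelta(days=30)


@dataclass
class User:
    post_count: int
    joined: date


def is_trusted(user: User, today: date) -> bool:
    """A user is trusted once they have enough posts and enough tenure."""
    return user.post_count >= MIN_POSTS and today - user.joined >= MIN_MEMBERSHIP


def can_post_links(user: User, today: date) -> bool:
    """Only trusted users may include links in their posts."""
    return is_trusted(user, today)


def needs_premoderation(user: User, today: date) -> bool:
    """Posts from users who are not yet trusted wait for approval."""
    return not is_trusted(user, today)
```

This version requires both conditions; swapping `and` for `or` in `is_trusted` gives the looser "and/or" variant.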

Combination of Two or More Approaches

The perfect anti-spam solution would do all the work, but unfortunately such a solution is not to be seen soon — perhaps ever.

You can combine two or more approaches, though.

For instance, you can use Akismet or reCAPTCHA for the rough filtering of spam and then have a human admin moderate anything that Akismet or reCAPTCHA missed. For a large site, admin moderation is somewhat painful simply due to the sheer volume of posts and comments.

Unfortunately, no matter how advanced technology becomes, 100% automation is not a good option because it leads to a relatively high level of false positives and false negatives.

For now, there is no way to exclude humans from the anti-spam process, so even if you get a solution that seems perfect, you will always need to check its choices.
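One way to picture such a combination is a three-way split: let the automated filter decide the clear cases and send everything in between to a human. The sketch assumes a hypothetical `spam_score` between 0 and 1 from whatever filter you use; the two cut-off values are placeholders as well.

```python
def route_submission(
    spam_score: float,
    reject_above: float = 0.9,   # hypothetical cut-off for obvious spam
    approve_below: float = 0.2,  # hypothetical cut-off for obviously clean posts
) -> str:
    """Return 'reject', 'publish', or 'moderation_queue' for a scored post."""
    if spam_score >= reject_above:
        return "reject"
    if spam_score < approve_below:
        return "publish"
    return "moderation_queue"
```

Widening the gap between the two cut-offs sends more posts to the moderator and fewer to automatic decisions, which is the volume-versus-accuracy trade-off described above.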

Are we there yet?

As you see, it’s not that there are no alternatives to reCAPTCHA. There are almost a dozen CAPTCHA and non-CAPTCHA based approaches. Unfortunately, they all have serious drawbacks that make them unusable in most cases.

Currently we can’t do (much) better than CAPTCHA, but we can design better CAPTCHAs — and this does make a difference.

For starters, one huge improvement you can make is to always save the data the user has input in the form.

Is there anything more cruel than watching all the data you diligently entered in a form evaporate into the ether simply because you thought a squiggly ‘h’ was a squiggly ‘n’?
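In practice this means re-rendering the form with the submitted values filled back in whenever the CAPTCHA check fails. A bare-bones sketch follows; the field names and markup are invented for the example, and only the CAPTCHA field is reset.

```python
from html import escape


def render_comment_form(values: dict, captcha_error: bool = False) -> str:
    """Re-render the form, keeping what the user already typed."""
    # Escape user input before putting it back into the page.
    name = escape(values.get("name", ""), quote=True)
    comment = escape(values.get("comment", ""))
    error = (
        '<p class="error">The CAPTCHA answer was wrong. Please try again.</p>'
        if captcha_error
        else ""
    )
    return (
        f'{error}<form method="post">'
        f'<input name="name" value="{name}">'
        f'<textarea name="comment">{comment}</textarea>'
        f'<input name="captcha" value="">'  # always empty: a fresh challenge
        f'<button type="submit">Post</button></form>'
    )
```

Escaping matters here because the values come straight from the user and are written back into HTML.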

One more step you can take is to fine-tune the complexity of the CAPTCHA. Many CAPTCHA systems allow you to tune the level of character distortion. Try different levels of difficulty and see which one gives you the best spam-to-usability ratio.

Obviously, the easier the CAPTCHA, the higher the spam level and vice versa. You might be tempted to use the most difficult CAPTCHA, but this is the worst for usability.

We need to find the equilibrium point. You are probably never going to be 100% spam free (that is a utopia), so aim for acceptable levels: test at what level of difficulty spam becomes more or less acceptable and don’t go beyond it.
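Once you have measured the spam rate at each difficulty level, picking the equilibrium point is mechanical: take the easiest level that keeps spam within your limit. The `LevelStats` structure and the fallback rule in this sketch are my assumptions; the measurements themselves have to come from your own site.

```python
from typing import List, NamedTuple


class LevelStats(NamedTuple):
    level: int        # distortion level; higher means harder to read
    spam_rate: float  # share of accepted submissions that turned out to be spam


def pick_level(stats: List[LevelStats], max_spam_rate: float) -> int:
    """Return the easiest level whose spam rate is acceptable.

    Falls back to the level with the least spam if none meets the target.
    """
    acceptable = [s for s in stats if s.spam_rate <= max_spam_rate]
    if acceptable:
        return min(acceptable, key=lambda s: s.level).level
    return min(stats, key=lambda s: s.spam_rate).level
```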

Frequently Asked Questions about CAPTCHA Alternatives

What are the main drawbacks of traditional CAPTCHA?

Traditional CAPTCHA systems, while effective in preventing bot attacks, have several drawbacks. They can be frustrating for users, often requiring multiple attempts to solve, which can lead to user abandonment. They can also be inaccessible to visually impaired users, even with audio alternatives. Additionally, sophisticated bots can now solve CAPTCHAs, reducing their effectiveness.

How does a honeypot CAPTCHA work?

A honeypot CAPTCHA works by adding an invisible field in a form that humans cannot see but bots can. When a bot fills in this field, the system recognizes it as a bot and blocks it. This method is user-friendly as it does not require any action from the user.

What is a biometric CAPTCHA?

Biometric CAPTCHA is a security measure that uses unique human characteristics such as fingerprints, facial recognition, or voice recognition to distinguish between humans and bots. This method is highly secure but requires users to have the necessary hardware, such as a fingerprint scanner or a webcam.

How does a time-based CAPTCHA work?

Time-based CAPTCHA measures the time it takes for a user to fill out a form. Since bots can fill out forms much faster than humans, this method can effectively identify and block bots. However, it may also block fast human typists.
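A time-based check can be sketched in a few lines. The three-second threshold is an arbitrary placeholder, and the render timestamp is assumed to be kept on the server (or signed), since a bot could otherwise forge it.

```python
MIN_SECONDS = 3.0  # hypothetical threshold; tune it so fast typists still pass


def submitted_too_fast(
    rendered_at: float, submitted_at: float, min_seconds: float = MIN_SECONDS
) -> bool:
    """True if the form came back sooner than a human could plausibly fill it in."""
    return submitted_at - rendered_at < min_seconds
```

Both arguments are Unix timestamps in seconds, for example from `time.time()`.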

What is a No CAPTCHA reCAPTCHA?

No CAPTCHA reCAPTCHA is a Google service that requires users to simply click a checkbox to confirm they are not a bot. If the system suspects the user might be a bot, it will present a traditional CAPTCHA to solve. This method is user-friendly but can be bypassed by sophisticated bots.

How does a behavioral CAPTCHA work?

Behavioral CAPTCHA analyzes the user’s behavior, such as mouse movements and keystrokes, to determine if they are human or a bot. This method is user-friendly and effective against bots, but it may raise privacy concerns as it tracks user behavior.

What is a text-based CAPTCHA?

Text-based CAPTCHA presents users with a simple question that humans can easily answer but bots cannot. This method is user-friendly and accessible but can be bypassed by sophisticated bots.

How does a math-based CAPTCHA work?

Math-based CAPTCHA requires users to solve a simple math problem to prove they are human. This method is user-friendly and accessible but can be bypassed by sophisticated bots.

What is a 3D CAPTCHA?

3D CAPTCHA presents users with a 3D image or puzzle to solve. This method is highly secure but can be frustrating for users and inaccessible to visually impaired users.

How does a social media CAPTCHA work?

Social media CAPTCHA requires users to sign in with a social media account to prove they are human. This method is user-friendly and effective against bots, but it may raise privacy concerns as it requires access to the user’s social media account.