reCAPTCHA: Awesome Use of Wasted Time That Works

A little over a year ago, a research team at Carnegie Mellon University launched reCAPTCHA, a plug-in CAPTCHA service for web sites that serves the dual purpose of fighting spam bots and helping the Internet Archive and other clients make sense of digitized print content. CAPTCHAs, those hard-to-read images web sites sometimes ask you to decipher before submitting form data, can be an effective way to combat spam, but they’re also tremendous time sinks. Each day on the web people are confronted with a whopping 200 million CAPTCHA images, and deciphering them consumes 500,000 hours, or roughly nine seconds per image. The reCAPTCHA system makes brilliant use of that time by putting people to work reading scanned text that optical character recognition (OCR) software had difficulty understanding.

The service, which is now employed by 40,000 web sites, uses a simple technique to get people to help figure out unknown scanned words. Each reCAPTCHA box presents users with two words: one that the system already knows (a control word) and one that is unknown. If the user gets the control word correct, the system can assume that their answer for the other word also has a high likelihood of being correct. If enough users enter the same thing for the unknown word, that word can in turn be used as a control word.
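The control-word technique described above can be sketched in a few lines of Python. This is only an illustration, not reCAPTCHA's actual implementation: the names `UnknownWord` and `submit` are hypothetical, and the agreement threshold of three is an assumption, since the article says only that "enough" users must agree.

```python
from collections import Counter

# Assumed value: the article says only that "enough users" must agree.
AGREEMENT_THRESHOLD = 3


class UnknownWord:
    """A scanned word that OCR could not read, plus the guesses collected for it."""

    def __init__(self, image_id):
        self.image_id = image_id
        self.votes = Counter()

    def record(self, guess):
        # Normalize so that "Morning" and "morning " count as the same answer.
        self.votes[guess.strip().lower()] += 1

    def resolved(self):
        """Return the agreed transcription, or None if there is no consensus yet."""
        if not self.votes:
            return None
        text, count = self.votes.most_common(1)[0]
        return text if count >= AGREEMENT_THRESHOLD else None


def submit(control_answer, control_guess, unknown, unknown_guess):
    """Check the control word; count the unknown-word guess only if it matched."""
    passed = control_guess.strip().lower() == control_answer.strip().lower()
    if passed:
        unknown.record(unknown_guess)
    return passed
```

Once `resolved()` returns a transcription, the word could be promoted to the pool of control words, which is how the system described in the article keeps feeding itself.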

Of those 200 million daily CAPTCHAs, reCAPTCHA serves about 4 million, which is “the equivalent of 1500 people working full-time and transcribing 60 words per minute,” according to a report in this month’s Science. The service, which is free for web sites to use, has deciphered 440 million words for clients over the past year.

According to Ars Technica, reCAPTCHA is also very accurate. In a test that used a random sample of 250 New York Times articles from different time periods, OCR software managed just 84% accuracy on its own. When combined with reCAPTCHA, though, the accuracy rating shot up to 99.1%. That, says Ars, is comparable to professional transcription services that employ two transcription experts whose work is verified by a third party. It’s easy to see how reCAPTCHA’s use of the crowd is far more cost-effective. Further, Ars reports that software designed to crack CAPTCHA images fails on reCAPTCHA, likely because the letter distortions on scanned images are not the result of “clean mathematical transformation,” and thus are hard for a computer to correct.

reCAPTCHA is a simply brilliant use of essentially wasted time, and I’m pleased to hear that it’s working. When I first wrote about the program last year for ReadWriteWeb, I noted that in college one of my classes was part of a project to digitize old maritime journals. We used expensive overhead scanners and fancy OCR software, but even so most of our time was spent correcting mistakes that the software had made. The reCAPTCHA system would have been a welcome addition to our work back then.

Frequently Asked Questions about reCAPTCHA

What is reCAPTCHA and how does it work?

reCAPTCHA is a free service provided by Google that helps protect websites from spam and abuse. It uses an advanced risk analysis engine and adaptive challenges to keep automated software from engaging in harmful activities on your site. It does this while letting your valid users pass through with ease. reCAPTCHA works by presenting users with a challenge that is easy for humans to solve but difficult for bots. This could be identifying objects in images, transcribing text, or simply ticking a box that says “I’m not a robot”.

What are the benefits of using reCAPTCHA?

reCAPTCHA offers multiple benefits. It helps protect your website from spam and abuse, ensuring that only humans can access certain parts of your site. This can help prevent automated bots from posting spam comments, creating fake accounts, or performing other malicious activities. Additionally, reCAPTCHA is easy for humans to solve, ensuring that it doesn’t create an unnecessary barrier for your users.

How can I implement reCAPTCHA on my website?

To implement reCAPTCHA on your website, you’ll first need to register your site with Google’s reCAPTCHA service. Once you’ve done this, you’ll be provided with a site key and a secret key. You can then add the reCAPTCHA script to your site’s HTML, and use the site key to create the reCAPTCHA challenge. The secret key is used on your server to verify the user’s response.
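The server-side half of that flow might look roughly like the Python sketch below. It assumes the browser has already posted the token that the widget places in the `g-recaptcha-response` form field, and it sends that token with the secret key to Google's `siteverify` endpoint. The function names are hypothetical, and the request building and response parsing are split out so they can be checked without a network call.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret_key, response_token, remote_ip=None):
    """Build the POST request that asks Google to validate a user's token."""
    fields = {"secret": secret_key, "response": response_token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional: the end user's IP address
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(VERIFY_URL, data=data, method="POST")


def parse_verify_response(body):
    """Return (success, error_codes) from the JSON body of the reply."""
    payload = json.loads(body)
    return bool(payload.get("success")), payload.get("error-codes", [])


def verify(secret_key, response_token, remote_ip=None, timeout=5):
    """Perform the round trip; call this from your form handler."""
    request = build_verify_request(secret_key, response_token, remote_ip)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return parse_verify_response(reply.read().decode("utf-8"))
```

In a form handler you would call `verify(SECRET_KEY, token)` and reject the submission when the first element of the result is false. Keep the secret key on the server; only the site key belongs in the page.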

Are there different versions of reCAPTCHA?

Yes. reCAPTCHA has been released in three major versions: v1, v2, and v3. Google shut down v1 in 2018, so new sites choose between v2 and v3. Each version offers different features and levels of security. For example, reCAPTCHA v2 asks users to tick a checkbox and may follow up with a challenge, while reCAPTCHA v3 runs in the background and returns a score indicating how likely it is that an interaction came from a human rather than a bot.
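With v3, the decision about what to do with the score is left to your site. The sketch below assumes the verification reply is a JSON body with `success`, `score` (0.0 to 1.0, higher meaning more likely human), and `action` fields. The function name is hypothetical, and the 0.5 cutoff is an assumed starting point that you would tune for your own traffic.

```python
import json

# Assumed starting point; tune per site and per action.
DEFAULT_THRESHOLD = 0.5


def is_probably_human(verify_body, expected_action, threshold=DEFAULT_THRESHOLD):
    """Decide whether to accept a v3 verification reply."""
    payload = json.loads(verify_body)
    if not payload.get("success"):
        return False  # token was invalid, expired, or already used
    if payload.get("action") != expected_action:
        return False  # token was issued for a different page action
    return float(payload.get("score", 0.0)) >= threshold
```

A site might accept high scores outright and route low scores to a fallback such as email confirmation rather than blocking them.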

Can reCAPTCHA be bypassed by bots?

While reCAPTCHA is designed to be difficult for bots to bypass, it’s not impossible. Sophisticated bots may be able to solve the challenges presented by reCAPTCHA. However, Google is constantly updating and improving reCAPTCHA to make it more secure and effective at preventing bot activity.

Does reCAPTCHA affect user experience?

reCAPTCHA is designed to be as unobtrusive as possible to ensure a good user experience. However, some users may find the challenges annoying or difficult to solve. It’s important to balance the need for security with the need for a smooth user experience.

Is reCAPTCHA accessible to all users?

reCAPTCHA is designed to be accessible to as many users as possible. It includes features for users with visual impairments, such as audio challenges. However, some users may still struggle with the challenges, particularly if they have cognitive impairments.

Can I customize the look of reCAPTCHA on my site?

Yes, reCAPTCHA allows for some customization. You can choose between a light and a dark theme, and select the size of the reCAPTCHA box. However, the actual challenge presented to the user cannot be customized.
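For the v2 checkbox widget, theme and size are set with `data-theme` and `data-size` attributes on the widget's `div`. The helper below is a minimal sketch that renders that markup from Python. The function name `widget_html` is hypothetical, and the site key passed in is whatever key you received at registration.

```python
from html import escape

ALLOWED_THEMES = {"light", "dark"}
ALLOWED_SIZES = {"normal", "compact"}


def widget_html(site_key, theme="light", size="normal"):
    """Render the script tag and widget div for a v2 checkbox."""
    if theme not in ALLOWED_THEMES:
        raise ValueError(f"unsupported theme: {theme!r}")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size!r}")
    return (
        '<script src="https://www.google.com/recaptcha/api.js" async defer></script>\n'
        f'<div class="g-recaptcha" data-sitekey="{escape(site_key, quote=True)}" '
        f'data-theme="{theme}" data-size="{size}"></div>'
    )
```

Calling `widget_html("your-site-key", theme="dark", size="compact")` returns markup you can drop inside a form.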

What happens if a user fails the reCAPTCHA challenge?

If a user fails the reCAPTCHA challenge, they will be presented with a new challenge to solve. If they continue to fail the challenges, they may be temporarily blocked from accessing the part of your site protected by reCAPTCHA.

Is reCAPTCHA available in multiple languages?

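Yes, reCAPTCHA supports multiple languages. By default, the language used for the reCAPTCHA challenge is determined by the user’s browser settings, though a site can force a specific language by adding the hl parameter to the reCAPTCHA script URL.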