CAPTCHAs, those pesky challenge-response tests that many web sites use to determine whether you are human or a spambot, are an annoyance to many users. According to a report in Science (subscription required), users now solve about 100 million CAPTCHAs a day. reCAPTCHA, a project based at Carnegie Mellon University, has found an ingenious way to harness all this work: according to the findings published in Science this week, CAPTCHAs could be used to transcribe printed texts at the rate of 160 books a day.
The current implementation of reCAPTCHA is being used by over 40,000 web sites. The basic idea behind reCAPTCHA is that optical character recognition (OCR), even though it is constantly improving, is still unable to cope with texts where the print has faded or a page is slightly damaged. While humans can transcribe a text with about 99% accuracy, OCR software often doesn’t get beyond 80% when dealing with a slightly damaged text.
reCAPTCHA combines traditional OCR with an approach similar to Amazon’s Mechanical Turk. Every text is analyzed by two different OCR programs, and whenever those two programs disagree on a word, it is marked as ‘suspicious.’ Those suspicious words are then fed into reCAPTCHA, which creates a CAPTCHA with both the suspicious word and a known control word. Once a certain number of users have solved the suspicious word with the same result, it becomes a control word itself.
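The workflow described above can be illustrated with a rough sketch. This is not reCAPTCHA's actual code: the names `find_suspicious` and `SuspiciousWord`, the `votes_needed` threshold, and the assumption that both OCR programs produce word lists of equal length are all hypothetical and chosen only to make the idea concrete.

```python
from collections import Counter


def find_suspicious(ocr_a, ocr_b):
    """Return the positions where two OCR transcriptions disagree.

    Assumes (for illustration only) that both programs produced
    word lists of the same length for the same scanned text.
    """
    return [i for i, (a, b) in enumerate(zip(ocr_a, ocr_b)) if a != b]


class SuspiciousWord:
    """Collects human answers for one word the OCR programs disagreed on."""

    def __init__(self, votes_needed=3):
        # Hypothetical threshold; the article only says "a certain number".
        self.votes_needed = votes_needed
        self.votes = Counter()
        self.resolved = None  # set once enough users agree

    def submit(self, control_answer, control_truth, guess):
        """Record a guess only if the user solved the known control word.

        Returns True if the user passed the CAPTCHA, False otherwise.
        """
        if control_answer.strip().lower() != control_truth.lower():
            return False  # failed the control word, so ignore the guess
        if self.resolved is None:
            key = guess.strip().lower()
            self.votes[key] += 1
            if self.votes[key] >= self.votes_needed:
                # Enough matching answers: the word can now serve as a control word.
                self.resolved = key
        return True


if __name__ == "__main__":
    ocr_a = "the quick brovvn fox".split()
    ocr_b = "the quick brown fox".split()
    print(find_suspicious(ocr_a, ocr_b))

    word = SuspiciousWord(votes_needed=2)
    word.submit("river", "river", "brown")
    word.submit("river", "river", "brown")
    print(word.resolved)
```

Checking the control word first is what lets the system trust an answer to a word whose correct reading nobody knows yet: a user who gets the known word right is presumed to be a human reading carefully.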
Overall, reCAPTCHA achieves an accuracy of 99.1%, which is on par with the accuracy achieved by having two humans type the text and then verify the results.
While it is mostly a proof of concept right now, reCAPTCHA’s developers calculate that the system can be used to transcribe the equivalent of 160 books a day.
The most fascinating aspect of this idea is that it turns mental energy, which would otherwise be wasted, into something useful. Other projects like fold.it, which turns protein folding into a game, or Google’s Image Labeler take a similar approach, but the user has to actively decide to play a game. reCAPTCHA, on the other hand, turns a chore into a useful project.