
Many, if not all, of you by now should be familiar with the term Captcha. If not, here’s a short Wikipedia entry describing what a Captcha is:
A CAPTCHA (IPA: /ˈkæptʃə/) is a type of challenge-response test used in computing to determine whether the user is human. “CAPTCHA” is a contrived acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart”, trademarked by Carnegie Mellon University. A CAPTCHA involves one computer (a server) which asks a user to complete a test. While the computer is able to generate and grade the test, it is not able to solve the test on its own. Because computers are unable to solve the CAPTCHA, any user entering a correct solution is presumed to be human.
To me, the idea of a Captcha was brilliant. That was, until the other day, when I saw a BBC news report about reCaptcha, Luis von Ahn’s new twist on the original Captcha format. Instead of asking visitors to type random strings, reCaptcha shows them words taken from real scanned printed documents (basically turning every Web site visitor into a human OCR engine). I am so blown away by this technology that I am glad to add it to my Web site, and will be rolling it out to Justine’s Fresh and Tasty Design blog, the OSNAP.net blog, and of course the Magstand Web site.
You’re welcome to read the reCaptcha Web site or the press release from Carnegie Mellon, but in basic terms the reCaptcha project turns anti-spam forms into digitizing machines. Word by word, Web participants will help digitize thousands of pages of printed volumes, a task that automated software has so far been unable to accomplish.
Just to illustrate, compare these two graphics:

This is a “CAPTCHA” image, generated by a computer to test the “human-ness” of a user. The string of text is randomly generated, and once the generating server is satisfied with a user’s response, the image is destroyed (along with the answer). Compare that with a reCaptcha:

This is a reCaptcha sequence. The two words in the white area are real scanned words from a printed source. If the user types a correct response in the “type the two words” box, the user is granted whatever permission the form controls, and the image and response are saved in a repository, helping to digitize the printed source.
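How can a server grade a word that no computer can read? The scheme reCaptcha’s creators have publicly described pairs each unreadable word with a control word whose answer is already known. The Python sketch below assumes that scheme. The word ids, the `KNOWN_ANSWERS` table, and the three-vote agreement threshold are hypothetical, and this is not reCaptcha’s actual code.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Hypothetical data: a control word whose answer is already known.
KNOWN_ANSWERS = {"ctrl-1": "morning"}

# Repository of human readings: unknown word id -> tally of submitted spellings.
readings: defaultdict[str, Counter] = defaultdict(Counter)


def grade_pair(control_id: str, unknown_id: str,
               control_answer: str, unknown_answer: str) -> bool:
    """Accept the user if the control word is right, then record the other reading."""
    expected = KNOWN_ANSWERS.get(control_id)
    if expected is None or control_answer.strip().lower() != expected:
        return False  # failed the known word: reject and record nothing
    readings[unknown_id][unknown_answer.strip().lower()] += 1
    return True


def best_reading(unknown_id: str, min_votes: int = 3) -> str | None:
    """Return the most common reading once enough users agree on it."""
    tally = readings.get(unknown_id)
    if not tally:
        return None
    word, votes = tally.most_common(1)[0]
    return word if votes >= min_votes else None
```

The control word does the anti-spam job. The unknown word only collects votes, so a single careless or malicious answer cannot decide how the scanned word gets digitized.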
I am dwarfed by such brilliance.