SECURITY and CRYPTOGRAPHY 15-827                  29 NOV 01
             Lecture #19                                 M.B.
                                                      4615 Wean


Matthew points out that the hard AI problem underlying the following CAPTCHA
may already be (pretty much) solved by the people who integrate multiple
camera views:
Q: Find the correspondence between the labelled points in this picture and
the labelled points in this other picture.
A: A-1. B-5. C-4. D-2. E-3.
Given a picture and a slightly distorted variant of the picture, find the
points in the distorted picture that correspond to given points in the
original picture. Or given two distinct views of a picture, find the point in one
that corresponds to a given point in the other. The pictures could be the
two images of a 3-D picture, or they could be two slightly different views
of a face.


CONJECTURE: An English-text-only CAPTCHA is (as of 2001) impossible.

The conjecture can and will be proved (below) under certain strong
assumptions. What is needed is to weaken the assumptions sufficiently to
convince ourselves that the conjecture is true, or else find a way to
construct an English-text CAPTCHA.

THE MODEL:
A CAPTCHA is a randomizing algorithm with access to GOOGLE that carries on
a conversation with an opponent -- human or a bot that has access to
GOOGLE. The conversation proceeds in stages, starting with stage 1. In each
stage, the CAPTCHA presents a CHALLENGE and the opponent gives a RESPONSE.
At that point, the CAPTCHA either ACCEPTS (as human), REJECTS (as bot), or
continues on to the next stage by giving a new challenge. The CAPTCHA is
required to decide within a small number of stages (e.g. at most 10).

MORE DETAILS ON THE MODEL:
The CAPTCHA initially sets stage k:=1.
In STAGE k:
*CAPTCHA generates a random # and then uses it
    [if k>1, it also uses its history(k-1) of conversation and
    its state(k-1) -- EXCLUSIVE of previously generated random
    #s --  if any]
 to generate public challenge(k) and private state(k).
*It awaits/gets public response(k).
*Then it uses its current history(k) = challenge(1) response(1)
                                                 :           :
                                       challenge(k) response(k)
 and current state(k) to evaluate: ACCEPT, REJECT, or continue.
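
The stage loop above can be sketched in Python. This is only a toy under
stated assumptions: the class ToyCaptcha, the arithmetic challenge, the
"accept after 3 correct answers" rule, and the choice to REJECT when the
stage limit is reached are all hypothetical and are not part of the model.
The arithmetic challenge is trivially solvable by a bot; it is there only to
make the flow of challenge(k), response(k), history(k), and state(k) concrete.

```python
import random
from dataclasses import dataclass, field

ACCEPT, REJECT, CONTINUE = "ACCEPT", "REJECT", "CONTINUE"
MAX_STAGES = 10  # the model asks for a decision within a small number of stages


@dataclass
class ToyCaptcha:
    """Hypothetical toy CAPTCHA; the challenge is an addition problem."""
    rng: random.Random
    history: list = field(default_factory=list)  # [(challenge, response), ...]
    state: object = None                         # private state(k)

    def make_challenge(self):
        # Draw a fresh random # and use it, together with the stage number
        # (derived from the history so far), to produce the public
        # challenge(k) and the private state(k).
        r = self.rng.randrange(100)
        k = len(self.history) + 1
        self.state = str(r + k)  # the expected answer stays private
        return f"stage {k}: what is {r} + {k}?"

    def evaluate(self, challenge, response):
        # Extend history(k), then decide ACCEPT, REJECT, or continue.
        self.history.append((challenge, response))
        if response != self.state:
            return REJECT
        if len(self.history) >= 3:  # assumed rule: 3 correct answers suffice
            return ACCEPT
        return CONTINUE


def converse(captcha, opponent):
    """Run the staged conversation; opponent maps a challenge to a response."""
    for _ in range(MAX_STAGES):
        challenge = captcha.make_challenge()
        response = opponent(challenge)
        verdict = captcha.evaluate(challenge, response)
        if verdict != CONTINUE:
            return verdict
    return REJECT  # assumed tie-break when the stage limit is reached


def honest(challenge):
    """An opponent that actually answers the toy challenge."""
    a, b = challenge.split("what is ")[1].rstrip("?").split(" + ")
    return str(int(a) + int(b))
```

For example, converse(ToyCaptcha(random.Random(0)), honest) runs three
stages and accepts, while an opponent that always answers "x" is rejected in
stage 1.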

We prove impossibility of an English-text-only CAPTCHA based on the following:

ASSUMPTIONS:
   1. Random numbers are used, if at all, only to create a public challenge
(a continuation of the conversation) and some small amount of private state
information.  The CAPTCHA never uses its random numbers, if any, to decide
between ACCEPT, REJECT, or continue (the conversation).

   2. At the end of stage k, the decision to ACCEPT, REJECT, or continue is
completely determined by history(k). The only purpose of state(k) is to
help the CAPTCHA decide whether to ACCEPT, REJECT, or continue *efficiently*.

   3. If, at the end of a stage, the CAPTCHA neither ACCEPTS nor REJECTS
(it believes that the opponent could still be human), then the CAPTCHA is
guaranteed to present a challenge (continuation of the conversation)
(possibly dependent on random numbers) that has a non-rejectable (i.e.
conceivably human) response.

   4. If there is any response that the CAPTCHA does not reject, then a
random response has a nontrivial probability (greater than 1%, say) of not
being rejected.

   5. Given history(k-1) and challenge(k), a bot can efficiently find a
random # and a state(k-1) that cause its own private virtual copy of the
CAPTCHA to generate the same challenge(k), and can therefore evaluate
response(k).
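
To see what assumptions 4 and 5 permit when taken together, here is a
minimal Python sketch. It is an illustration only and is not meant as a
reconstruction of the proof. The function names and the decision rule
verdict() are hypothetical: the rule depends on history(k) alone (as in
assumption 2) and, by construction, fails to reject about 5% of responses.
Because this toy rule needs no private state, the bot's "virtual copy" is
just the function itself; in general assumption 5 is what would supply the
copy.

```python
import hashlib
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def verdict(history):
    """Hypothetical decision rule that depends on history(k) only.

    A response survives when the first hash byte is below 13,
    i.e. for 13/256 (about 5%) of all histories.
    """
    digest = hashlib.sha256(repr(history).encode()).digest()
    return "CONTINUE" if digest[0] < 13 else "REJECT"


def bot_response(history_prev, challenge, rng, max_tries=10_000):
    """Try random responses on the bot's private copy of the rule
    until one is not rejected. Returns (response, number of tries)."""
    for tries in range(1, max_tries + 1):
        candidate = "".join(rng.choice(ALPHABET) for _ in range(8))
        if verdict(history_prev + [(challenge, candidate)]) != "REJECT":
            return candidate, tries
    return None, max_tries
```

If a random response survives with probability p > 1%, the number of tries
is geometrically distributed with mean 1/p < 100, so even over 10 stages the
expected total is under 1000 private evaluations.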

PROOF: ........

QUESTION: How does the OCR-based CAPTCHA circumvent this Theorem? The
CAPTCHA chooses one of a very large set of random numbers to select the
challenge (a distorted image) and the state information (the actual word).
The opponent cannot guess the random number (efficiently).  It therefore
cannot simulate the generation of (a copy of) the actual challenge; that is,
assumption 5 fails.
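
The role of the size of the random-number space can be sketched as follows.
Everything here is a hypothetical stand-in: WORDS is a made-up word list, and
a SHA-256 hash plays the part of the distorted image. The sketch models only
the point made above (whether the opponent can find the random number), not
the difficulty of reading a distorted image.

```python
import hashlib

WORDS = ["apple", "river", "stone", "cloud"]  # hypothetical word list


def generate(seed):
    """Toy stand-in for the OCR CAPTCHA.

    Maps a random number (seed) to a public challenge and a private word.
    """
    word = WORDS[seed % len(WORDS)]
    challenge = hashlib.sha256(f"{seed}:{word}".encode()).hexdigest()
    return challenge, word


def recover_word(challenge, seed_bits):
    """Enumerate every seed of the given bit length on a private copy of
    generate(); return the private word if some seed reproduces the
    challenge, else None."""
    for seed in range(2 ** seed_bits):
        candidate, word = generate(seed)
        if candidate == challenge:
            return word
    return None
```

With challenge, word = generate(40_000), the call recover_word(challenge, 16)
finds the word after at most 65536 trials, so a 16-bit random number gives no
protection in this toy. The same loop over a 64-bit or larger space is far
beyond what a bot can enumerate, which is the situation the OCR-based CAPTCHA
relies on.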