From mblum@cs.cmu.edu Thu Sep 20 12:13:58 2001
Date: Thu, 20 Sep 2001 12:05:53 -0400
From: Manuel Blum
To: hopper@cs.cmu.edu
Subject: crypto lecture 3 (with partial answers to some hw problems)

SECURITY and CRYPTOGRAPHY 15-827    19 SEP 01    Lecture #3    M.B.

M.B. 4615 Wean

HANDOUT: First 36 pages of "The MEMORY BOOK" by Harry Lorayne and Jerry Lucas, Ballantine Books, ISBN: 0-345-41002-5 (1974; softcover edition 1996).

As you have noticed, this is not a standard class on Security and Cryptography. Let me remind you again:

The goal of this class is to write a joint research paper in the area of Human Interactive Protocols (HIP). This will include Human Oriented IDentification (HumanOID) and CAPTCHA.

Right now, we are still trying to get moving on HumanOID.

Our first goal for HumanOID is to generate a set of instructions enabling "almost any" human that can read and write English, age 6 to 60, to "easily" construct for herself an "unbounded" source of "personal" CHALLENGE-RESPONSE pairs.

ALMOST ANY HUMAN means at least 1/100 of all people who can read and write English. The actual fraction can be determined statistically.

EASILY CONSTRUCT means that after an hour of practice it takes only a few minutes to construct a new challenge-response pair.

UNBOUNDED means she can construct at least one new pair a day.

PERSONAL means that the pairs should be unique to each human: no two humans have the same pairs except with some extremely small probability.

CHALLENGE-RESPONSE means...
    a CHALLENGE consisting of a word, a few words, or a short sentence.
    a RESPONSE consisting of a sentence or a "random-looking" string of at least 6 characters.

RANDOM-LOOKING means that the string of at least 6 characters is no more probable than any one of 10^6 other possible strings.
(Note: This means that the chance of an eavesdropper randomly choosing the right string is less than 1/10^6. I am not asking for 1/26^6.)

The pairs should have the property that an EAVESDROPPER who overhears any subset of these pairs cannot COMPUTE the correct response to even one new challenge, except with some small probability, like 10^-6.

EAVESDROPPER = a person who overhears or otherwise has access to a subset of challenge-response pairs.

COMPUTE means that the eavesdropper has a powerful computer at his disposal:
    a CRAY for a day.
    1000 workstations for a month.
    full access to the web.

I am grateful to Scott CROSBY for spending an hour with me after the last class to point out two things:

1. I had better get the problem a lot better defined. This is important in order to have a chance to decide whether a solution IS POSSIBLE or NOT.

2. Access to the web invalidates a great many of my own personally proposed challenge-response pairs.

I am grateful to Rachel RUE for pointing out that The MEMORY BOOK (handout), and indeed memory builders in general, are an important resource for this project. The Memory Book gives methods for remembering a great many things, including people's names. It can be turned to our purposes.

Challenge: My Polish student who studied at ETH Switzerland.
Hidden: Bartosz Przydatek -> Bar-tek, Prince of data ->
Response: BARTEKPOD

Challenge: Chinese student I invited to Guang Zhou and CMU.
Hidden: Ke YANG -> YANG Ke -> A Connecticut Yankee in King Arthur's court.
Response: ACYIKAC

Challenge: My PhD student Minnesota born and raised.
Hidden: Nick Hopper -> Nickles hopping on a table.
Response: NHOAT

Challenge: My Guatemalan PhD student
Hidden: Luis von Ahn -> Don't lose your fountain Pen!
Response: DLYFP

Challenge: The Reverend Charles Dodgson
Hidden: Lewis Carroll -> My son's middle name. My wife's middle name.
-> RESPONSE: MSMNMWMN

The idea: Give a CHALLENGE that evokes HIDDEN, possibly web-searchable, info, then turn that into a PRECISE RESPONSE of related memorable information.

PRECISE means that it can be written down in ASCII and a computer can check for a simple exact correspondence.

EXAMPLES: I asked my wife to give me 3 responses to the following 3 challenges:

Challenge: Your father
Response: Coney Island

Challenge: Your mother
Response: Orchid

Challenge: Your sister
Response: Kosovo

It's not clear to me just how long she'll be able to remember them, but I'll find out. I've made notes in my calendar to ask for her responses in a day, a week, a month, a year and 10 years from now. The purpose of this is twofold: to check how well she does and to solidify her memory.

I asked my 84-year-old mother to give the nicknames of her mother's (my grandmother's) 10 siblings, and a tidbit for each. No problem. For example,

CHALLENGE: Your mother's sister Gusta.
HIDDEN: I was the flower girl at her wedding.
RESPONSE: flower girl

CHALLENGE: Your mother's sister Klara.
HIDDEN: Her husband was the doctor in his town. He never charged relatives.
RESPONSE: No charge.

CHALLENGE: Your mother's brother Moishe.
HIDDEN: He was the best (most generous) of them. If I needed anything, I would go to him.
RESPONSE: The best.

We are still far from a "virtually infinite" number of challenge-response pairs.

HOMEWORK SET #2 (To be turned in to Nick Hopper hopper@cs before class next Tuesday.)

HW 2.1 Give 10 challenge-response pairs. Each response should have at least 6 random-looking characters. (The definition of random-looking has been given above.) You should store these pairs and check your memory in a day, a week, and a month from now. At any moment in this course, you may be asked to reply to 1 or more of your random challenges.

HW 2.2 Suggest a virtually infinite source of personal challenge-response pairs.
For both problems below, give an "excellent approximation" that works well in the limit and also works well for small numbers.

HW 2.3 (BIRTHDAY PARADOX). In a world with d days per year, what is the probability pr that no two people in a class of p people have the same birthday?
    * Give an exact formula.
    * Substitute p = sqrt(d) and show that in the limit as d -> infinity, the correct answer is quite pretty:
    * Give an approximation that is easy to compute on a calculator for very large d yet also works "well" for small numbers, like d=10.

HW 2.4 (COUPON COLLECTOR'S PROBLEM). Give an exact or very good approximate solution to this problem (see its statement below) that you can use on a simple calculator. Your solution CC(n) should be correct in the limit as n -> infinity, in the sense that the ratio of your approximation to the actual value of CC(n) should go to 1. In addition, your approximation should give good results for small n, like n=10.

Recall definitions from last lecture:

BIRTHDAY PARADOX
QUESTION: In a world with n days per year, how many people should one invite to a party so that there is a roughly 50% chance that at least 2 people have the same birthday?
ANSWER: (1.2)*sqrt(n)

COUPON COLLECTOR'S PROBLEM
QUESTION: A cereal box contains one of n coupons, each coupon chosen uniformly at random (i.e. each coupon is equally likely to appear in a box). How many cereal boxes should one expect to buy in order to get all n coupons?
ANSWER: approximately n*(lg n).
QUESTION: What base?

In a class of 35 students, the probability that all 35 have different birthdays is

    (365/365) * (364/365) * ... * ((365-35+1)/365) = .185

In a class of 40, it is approximately .108.
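
The two class-size figures above can be checked numerically. What follows is only a minimal sketch in Python, assuming birthdays are independent and uniform over the d days; the function name prob_all_distinct is made up for illustration and is not part of the lecture.

```python
from math import sqrt


def prob_all_distinct(p: int, d: int) -> float:
    """Probability that p people all have different birthdays,
    assuming each birthday is uniform over d days (an assumption)."""
    if p > d:
        return 0.0  # more people than days: some birthday must repeat
    prob = 1.0
    for i in range(p):
        # person i+1 must avoid the i birthdays already taken
        prob *= (d - i) / d
    return prob


# Class sizes used in the notes, with d = 365
for p in (35, 40):
    print(p, round(prob_all_distinct(p, 365), 4))

# Rule of thumb from the notes: about 1.2*sqrt(n) guests for a ~50% chance
print(1.2 * sqrt(365))
print(1.0 - prob_all_distinct(23, 365))
```

For p = 35 and p = 40 the product comes out near .186 and .109, which agrees with the .185 and .108 quoted above if those were truncated rather than rounded. The last two lines compare the (1.2)*sqrt(n) rule of thumb for n = 365 with the exact chance of a shared birthday among 23 guests.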