15-451 Algorithms	11/16/00
* Number-theoretic algs
   - GCD
   - fast exponentiation
   - Euler's theorem
   - RSA
==============================================================================
Today and next time will talk about number-theoretic algorithms. 
A collection of uses.  One of most important is in cryptography, RSA.

Hard and Easy problems:
======================
FACTORING: Given number n, find a number d (1 < d < n) that divides n if
one exists.  No known fast algorithm (for non-quantum computers anyway).

Note: goal is polynomial time in log(n).  I.e., in the number
of bits it takes to write down the input.  Easy to factor in time
linear in n.  Even sqrt(n) by trial division.  Quadratic sieve alg is
roughly 2^{sqrt(log n loglog n)}.  Number-field sieve is 2^{O((log
n)^{1/3}(log log n)^{2/3})}.  Largest RSA challenge number factored so
far is 512 bits (155 digits).  

    Number Factored  Size in Bits  Date Factored  Algorithm Used
    ===============  ============  =============  ==============
      RSA-100            330        April 1991         QS
      RSA-110            364        April 1992         QS
      RSA-120            397        June 1993          QS
      RSA-129            425        April 1994         QS
      RSA-130            430        April 1996         NFS
      RSA-140            463        February 1999      NFS
      RSA-155            512        August 1999        NFS
(source: certicom.com and rsa.com)

PRIMALITY TESTING: Given number n, is it prime or composite?  This
*can* be done quickly by a randomized algorithm: if n is prime then
for sure will say prime, but if n is composite, it is only guaranteed
to say composite with prob 1 - 1/2^{100}. 

More easy problems:
 - inverse mod n:  given a,n compute a^{-1} mod n.
 - GCD: given a,b compute GCD(a,b)
 - exponentiation: given a,b,n compute a^b mod n.

Obvious fact: COMPOSITENESS (is n composite) is in NP. Proof is a
factor between 1 and n.

Nonobvious fact: PRIMALITY (is n prime) is also in NP.  I.e., there
exist short proofs that a number is prime.  We'll get to this later.

This is why factoring is believed to *not* be NP-complete. Decision
problem for factoring is: "given n and k, does n have a divisor
between 1 and k".  This is clearly in NP: proof is the divisor.  But,
it is also in NP: proof is the prime factorization of n, along with a
proof that each of these factors is prime.  Nonetheless, still no fast
algorithm known.

Cryptosystems like RSA are based on fact that primality testing and
other easy problems above are easy, but factoring seems hard.  Will
get to RSA later once we've built up some of foundations.  Start with
some of things we *can* do:

GCD
===
GCD(a,b) is the largest d s.t. d divides a, and d divides b.
I.e., both a and b are multiples of d.  gcd(a,0)=a.

E.g., what's GCD of 51 and 9?

Can we compute GCD(a,b) quickly - in time poly(log(a) + log(b))?  Yes.
Classic algorithm over 2000 years old called Euclid's alg.  Based on
observation that if a>b, then GCD(a,b) = GCD(b, a mod b).

	Note: I'm using "a mod b" to mean smallest nonnegative of form
	a-kb for integer k.

Proof: if d|a (so a = a'*d) and d|b (so b = b'*d) then a-kb = d(a' - kb'),
so d|(a mod b) too.  Similarly, if d|b and d|a-kb then d|a.

So, this is the algorithm
GCD(a,b)
{
 if (b==0) return a;
 else return GCD (b, a % b);
}

E.g., GCD(51, 9) = GCD(9,6) = GCD(6,3) = GCD(3,0) = 3.

Can anyone see quick argument that running time is O(log a)?  E.g.,
looking 2 steps at a time?  [If b < a/2 then after just 1 step the
largest is cut in half. If b > a/2 then a mod b < a/2 so in 2 steps
the largest is cut in half]

Extended GCD: also computes integers x and y such that d = ax + by.

Idea: can compute with same algorithm.  Recursively, running on b and
a-kb, we compute x', y',d such that d = bx' + (a-kb)y'.  This means that 
d = ay' + b(x'-ky').  This seems like a curiosity but we'll see how this
will be really useful.

Now, turn to modular arithmetic.

Definitions:

Z_n = {0,1,2,...,n-1}
Z_n^* = {a in Z_n : gcd(a,n) = 1}.  If n is prime, then Z_n^* = {1,2,...,n-1}
Z_n^* is a group under multiplication: if you multiply two numbers
relatively prime to n, you get another number relatively prime to n.
Also, each a in Z_n^* has an inverse: a number b such that ab = 1 mod
n. Question: why is this true and how can we compute inverse quickly?
E.g., what is 5^{-1} mod 17?

Here's how: compute extended GCD of n,a.  Get x,y such that nx+ay=1.
So, ay = 1 mod n: y is the inverse of a.

E.g., EGCD(17,5) calls EGCD(5,2) where 2 = 17-3*5.  This returns x'= 1, y'
= -2. So, our answer is x = -2, y = 1 - 3*(-2) = 7.  So, 5^{-1} = 7 mod 17. 

Fast Exponentiation
===================

Given a, d, n, can we compute a^d mod n in time poly(log(n), log(d))?

Answer: yes, by repeated doubling. 
Nice way:  Call the routine below with x=1
power(a,d,x) \\ invariant is result = (a^d)*x mod n
{
 if (d==0) return x;
 if d is odd, return power(a,d-1,x*a % n);
 if d is even, return power(a^2 % n, d/2, x);
}

Every two steps, d is cut by factor of 2, so # iterations is O(log d).

The Euler Phi Function
======================

Defn: phi(n) = |Z_n^*|.

E.g., if p is prime, then phi(p) = p-1.

If n=p*q, what is phi(n)?  phi(n) = (p-1)(q-1).  Can see this by
listing the numbers {1,...,n} and then removing the q multiples of p,
then removing the p multiples of q, and then realizing that we removed
n twice.

	As an aside, in general, if n= p1^e1 * p2^e2 *...* pk^ek, then:
	phi(n) = (p1 - 1)*p1^{e1 - 1}*...*(pk - 1)*pk^{ek - 1}


Euler's Theorem: for any a in Z_n^*, a^{phi(n)} = 1 (mod n).

Easier fact: "Fermat's little theorem": a^{p-1} = 1 mod p, when p is
prime, a in Z_p^*.  Proof of Fermat: we'll prove a^p = a (mod p) and do it
by induction on a.  OK for a=1. 
(a+1)^p = a^p + p*a^{p-1} + (p choose 2)a^{p-1} + ... + (p choose p-1)a + 1
=  a^p + 1 mod p.  By induction, a^p = a mod p, so (a+1)^p = a+1 mod p.

Proof of Euler's theorem
------------------------
Definition:  order(a) = smallest t such that a^t = 1 (mod n). 
Note that {1, a, a^2, ..., a^{t-1}} is a subgroup of Z_n^*: it's
closed under multiplication and inverses.

Now we can use a basic fact from group theory that THE SIZE OF ANY
SUBGROUP DIVIDES THE SIZE OF THE GROUP.  
	Proof: Say H is a subgroup of G and y is not in H. Then the coset
	yH = {yh : h in H} is a set of size |H| (if y*h1 = y*h2 then
	h1 = h2) and is disjoint from H (if y*h1 = h2 then y =
	h2*h2^{-1}, which is in H by H's group closure properties).
	Furthermore, all cosets are disjoint (if z*h1 = y*h2 then z =
	y*h3 for some h3 in H).

So, if t is the order of a, then phi(n) = b*t for some b, and
a^{phi(n)} = (a^t)^b = 1 (mod n).