15-750 Graduate Algorithms * Admin * Quickly though some basic alg topics: - Karatsuba, Strassen, recurrences - RVs, linearity of expectation, - randomized quicksort 1 handout: 15451 lects 1-5 ============================================================================== Admin: - Web page: see link on my homepage, or use: /afs/cs/academic/class/15750-s01/www/index.html. Schedule of topics & dates isn't up there yet but will be soon. - Using Kozen book. - Expect to have hwks every 2 weeks, a take-home midterm, and an in-class final. 50%/20%/30% (see web page). - Hwks will be be combination of problems to do by yourself andproblems you can work in groups on. - Hwks will be graded by teams of students. Each student is expected to do this once. We are assuming everyone has had some kind of undergrad theory/algs exposure. Divide & conquer, dynamic programming, asymptotic notation, NP-completeness. Will do a quick refresher today. Handing out 1st 5 lectures of 15-451 which has some of basics -- you're responsible for this material even though we are only going to skim it very quickly. ====================================== Let's start with a problem we are all familiar with: multiplying two numbers. Say we want to multiply two n-bit numbers. 41 x 42 (101001 x 101010). Let's call them X and Y. Really slow method: add X to itself Y times. How many additions does this take as a function of the *size* of the input? O(2^n). Not good. Point to make here: one might naively say this is linear time because it is linear in Y, but what we care about is running time as a fn of the *size* of the input, and it only takes log(Y) bits to write Y down. What about the standard method -- how long does that take? 1010010 101001 101001 ----------- 11010111010 = 1722 Ans: O(n^2). Can we do better? Let's try to do some kind of divide & conquer. Break X and Y up into their 1st and 2nd halves: --------------- X = A*2^{n/2} + B | A | B | +-------------+ Y = C*2^{n/2} + D | C | D | --------------- We can write X*Y as AC*2^n + BC*2^{n/2} + AD*2^{n/2} + BD If we use this directly as a recursive program to multiply two numbers, we get the recurrence: T(n) = 4*T(n/2) + const*n This solves to O(n^2). So, no gain. But, we can rewrite the computation like this: AC(2^n + 2^{n/2}) - 2^{n/2}(A - B)*(C - D) + (2^{n/2} + 1)BD | \ / / | \ / | | \ / / | \ / | AC*2^n cancel BC*2^n/2 AD*2^n/2 cancel BD So we have 3 multiplies of size n/2, 2 adds of size n/2, 3 adds of size n, and a couple of shifts. This gives us the recurrence: T(n) = 3*T(n/2) + const*n. which solves to O(n^{log_2(3)}) = O(n^1.585) (first discovered by Anatoli Karatsuba, in Russia, 1962.) Is this asymptotically best possible? Turns out can use FFT to do faster. Karp: O(n log^2 n), then improved to O(n log n loglog n) by Schonhage and Strassen. This is best known. How many have seen Strassen's alg for matrix multiplication? Strassen is natural generalization to the problem of multiplying two matrices. [Will now assume all entries are "small" so adding or multiplying any given pair of entries takes constant time.] To multiply two nxn matrices in the usual way takes time O(n^3). If we think of each nxn matrix as consisting of 4 n/2 by n/2 submatrices, then the standard method can be thought of as performing 8 n/2 by n/2 multiplications and 4 additions. Strassen cleverly rearranges to use only 7 multiplications (and 14 additions). This gives a recurrence of T(n) = 7T(n/2) + cn^2 which solves to O(n^{2.81}). Important because do this a lot in scientific computation. Strassen's has more overhead than standard method, but on Cray, switchover is at 64x64 I think. Asymptotically, best matrix multiply alg known is by Coppersmith and Winograd and has time O(n^{2.376}), but not practical. Nobody knows if can do better --- FFT approach doesn't seem to carry over. =========== Basic probability. Say I have a deck of 52 cards in order, and shuffle randomly. How many cards do I expect to end in the same place as they started? Ans 1. A Random Variable is a function from outcomes (of some probabilistic experiment) to reals. E.g., "number of cards that end in correct position" is an RV. We'll only care about discrete RVs. A couple of ways of looking at *expected value* of an RV: E[X] = sum_a a*Pr(X=a). [The a's are different values X can take on] or E[X] = sum_e Pr(e)*X(e). [The e's are the outcomes (elementary events)] A REALLY USEFUL FACT is linearity of expectation. If X and Y are two RVs, then even if they are highly correlated, E[X+Y] = E[X]+E[Y]. Proof: direct from definition #2. Ifwe define Z(e) = X(e)+Y(e), then it's immediate from the definition that E[Z] = E[X] + E[Y]. We can use this in the cards example by defining RVs like this: X_i = 1 if the ith card ends in the same place that *it* started, and X_i = 0 otherwise. X = # cards that end in same place as they start. E[X_i] = Pr(ith card ends in same place as it starts) = 1/52. E[X] = E[X_1 + ... + X_52] = E[X_1] +...+E[X_52] = 1. Now, let's use this to analyze randomized quicksort. Quicksort: 1) pick an element as pivot^* 2) split array into LESS, EQUAL, and GREATER by comparing each element to the pivot. 3) recursively sort LESS and GREATER *version 1: use leftmost element as pivot. What is worst-case time? Omega(n^2) if array is already sorted. That's because everything goes into GREATER and we get a recurrence like T(n) = n + T(n-1). What if we make the algorithm itself be probabilistic? RANDOMIZED-QUICKSORT: pick a *random* element as pivot. We'll prove: for any initial ordering ofthe input, expected time to sort is O(n log n). This is sometimes called a Worst-case Expected-Time bound. This implies that the deterministic algorithm gets O(n log n) *on average* (meaning, on a randomly ordered input). But, having a good running time "on average" is really not as satisfying since it depends on what your inputs end up looking like. It's a little funny: making the algorithm probabilistic gives us *more* control. A lot like why you use randomization in a 2-player matrix game like rock-paper-scissors. There's a neat way to analyze randomized quicksort that is a lot like what we did with the card shuffling. THEOREM: E[# comparisons] <= 2n*ln(n) PROOF: First of all, let's assume all elements in array are different since it's the worst case and will make our notation simpler. What we'll analyze is the expected number of comparisons made by the algorithm, since that determines the running time. Just like above, the trick is writing that as a sum of simple Random Variables, and using Linearity of Expectation. Define X_ij = 1 if we *did* directly compare the ith smallest and jth smallest elements in the sorting process = 0 if we *didn't*. n-1 n n-1 n * So, how can we write X? X = SUM SUM X_ij. So, E[X] = SUM SUM E[X_ij] i=1 j=i+1 i=1 j=i+1 - denote ith smallest by e_i, jth smallest by e_j, and conceptually, imagine lining the elements up in sorted order. If pivot we choose is between e_i and e_j then these two end up in different buckets and we *dont* compare them. If pick e_i or e_j as pivot then we *do* compare them. If pick pivot < e_i or > e_j then both end up in same bucket and we repeat. So, can think of as a dart game: we throw a dart at random, and if outside the range we repeat, until we hit e_i or e_j or something in-between at which point we stop. Pr(we end up hitting e_i or e_j) = 2/(j-i+1). This is E[X_ij]. - So, have n-1 n n-1 n-i+1 n-1 n SUM SUM 2/(j-i+1) = 2 SUM SUM 1/d < 2*SUM SUM 1/d. i=1 j=i+1 i=1 d=2 i=1 d=2 - Now, 1 + 1/2 + 1/3 + ... + 1/n is called the "nth harmonic number". Denoted H_n. H_n is in the range [ln(n), ln(n)+1], which you can see by thinking of it as discrete version of the integral of 1/x. n-1 - So, our total is < 2*SUM (H_n - 1) = 2*(n-1)*(H_n - 1) < 2n*ln(n). i=1