Markov Chains, Lecture 2.  D. Sleator

Reminder of last time: Markov chains, variation distance, convergence rate.

The general goal of this work is to efficiently estimate the number of
solutions to combinatorial problems.  The relevant complexity theory is
that of #P-completeness.  Some #P-complete problems:

  * Computing the permanent
  * Counting the number of colorings of a graph

Let G be a graph with n vertices and m edges.  Let Delta be the maximum
degree of G, and let k be the number of colors we want to color G with.

A Randomized Approximation Scheme for k-colorings is a randomized
algorithm that takes as input a graph G and an error bound e, and
produces as output a number Y such that:

    Pr((1-e)C(G) <= Y <= (1+e)C(G)) >= 3/4

Here C(G) is the number of k-colorings of the graph G.  The scheme is
Fully Polynomial if it runs in time polynomial in the input length and
e^(-1).  Call such a scheme an "FPRAS".

Our goal is to develop an FPRAS for k-coloring a graph.  We'll be able
to do it as long as k >= 2Delta+1.  Our approach is this:

  (1) Give an efficient algorithm to generate a uniformly random coloring.
  (2) Use this random coloring algorithm to compute the estimate Y of
      the number of colorings.

In this lecture I'll talk about (1) in detail, but just gloss over part
(2).  Let's do that first.

Let G be the graph in question, and let m be the number of edges of G.
Consider a sequence of graphs G = G(m), G(m-1), G(m-2), ..., G(0), where
each one is obtained from the previous one by removing some edge.  Let
C(G) be the number of k-colorings of a graph G.  Our goal is to estimate
C(G).  We can write C(G) as follows:

           C(G(m))     C(G(m-1))           C(G(1))
    C(G) = --------- x --------- x ... x  ------- x C(G(0))
           C(G(m-1))   C(G(m-2))          C(G(0))

Our technique will be to estimate each of these ratios.  Clearly
C(G(0)) = k^n.  To estimate

     C(G(i))
    ---------
    C(G(i-1))

note that the set of colorings of G(i) is a subset of the colorings of
G(i-1).  So the ratio we want to estimate is simply the probability that
a random coloring of G(i-1) is also a coloring of G(i).
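The telescoping-product estimator can be sketched as follows.  This is my
own illustration, not code from the lecture: the function names are mine,
and for simplicity the "uniform sampler" here is brute-force enumeration,
which only works on tiny graphs.  The whole point of the lecture is to
replace that sampler with the rapidly mixing coloring Markov chain.

```python
import itertools
import random

def proper_colorings(edges, n, k):
    """Brute-force list of all proper k-colorings of a graph on vertices
    0..n-1 (illustration only; exponential in n)."""
    return [c for c in itertools.product(range(k), repeat=n)
            if all(c[u] != c[v] for u, v in edges)]

def estimate_colorings(edges, n, k, samples=2000, rng=random):
    """Estimate C(G) as k^n times the product of the sampled ratios
    C(G(i)) / C(G(i-1)), where G(i) consists of the first i edges."""
    est = float(k) ** n                      # C(G(0)) = k^n
    for i in range(1, len(edges) + 1):
        prev = proper_colorings(edges[:i-1], n, k)   # colorings of G(i-1)
        u, v = edges[i-1]                            # the edge added in G(i)
        # fraction of random colorings of G(i-1) that are also proper for G(i)
        hits = sum(1 for _ in range(samples)
                   if (c := rng.choice(prev))[u] != c[v])
        est *= hits / samples
    return est
```

On a triangle with k = 3 the true count is 6, and the estimate lands close
to it; the accuracy analysis (how many samples per ratio) is the part done
in Jerrum's paper.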
It remains to work out the convergence rates carefully, so that we know
how many times we have to sample in order to get an estimate Y of C(G)
that's sufficiently close.  This is all done in Jerrum's paper.

The key thing that we need is to show that the coloring algorithm
presented last time converges quickly.

Lemma 1: The convergence time of the coloring Markov chain satisfies

                   k
    Tau(e) <= ---------- n ln(n/e)
              k - 2Delta

We're going to prove this by using a technique called coupling.  Let me
explain that first.

Let Xt and Yt be the states at time t of two identical Markov chains.
Each chain, when viewed separately from the other, is perfectly random.
However, the chains are correlated in such a way that they tend to
converge to the same state.  Once this event happens (they've coupled)
they remain coupled.

Let me give an example.  Consider the random walk on a circle of 2n
points.  At each step we either go clockwise, go counterclockwise, or
stay in the same state, each with probability 1/3.  At each step, chain
X randomly makes one of the three moves (c, cc, s).  Chain Y decides
what to do based on the current relationship between X and Y, and on
what X decides to do.  Specifically:

    If Xt = Yt, then Y does what X does.
    If Xt and Yt have the same parity, then we map X's move to Y's move
    as follows: (c, cc, s) ==> (cc, c, s).
    If Xt and Yt have different parity, then we use this map:
    (c, cc, s) ==> (c, s, cc).

Viewed separately from X, Y is clearly a perfectly good implementation
of a random walk on a circle.  It's also clear that the two will
converge fairly quickly.  Why?

  * If they have opposite parities, then in an expected 3/2 steps they
    will have the same parity.
  * If they have the same parity, then the time it takes for coupling to
    occur is the time it takes a random walk on a line of n points to
    reach the end.

It turns out that there's a very direct connection between the variation
distance and coupling.  Specifically:

Lemma: Let (X,Y) be two coupled Markov chains.
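The coupled walk on the circle is easy to simulate.  Here is a minimal
sketch (function names are mine); moves are encoded as +1 for clockwise,
-1 for counterclockwise, and 0 for stay, and the two move maps above
become small dictionaries.

```python
import random

def coupled_step(x, y, n2, rng):
    """One step of the coupled pair of random walks on a circle of n2
    points.  Y's move is a function of X's move and the parity relation,
    so Y's marginal is still uniform over (c, cc, s)."""
    mx = rng.choice([1, -1, 0])              # X's move: c, cc, or s
    if x == y:
        my = mx                              # coupled: move together forever
    elif (x - y) % 2 == 0:
        my = {1: -1, -1: 1, 0: 0}[mx]        # same parity: (c,cc,s) -> (cc,c,s)
    else:
        my = {1: 1, -1: 0, 0: -1}[mx]        # diff. parity: (c,cc,s) -> (c,s,cc)
    return (x + mx) % n2, (y + my) % n2

def coupling_time(x0, y0, n2, rng):
    """Number of steps until the two chains first meet."""
    x, y, t = x0, y0, 0
    while x != y:
        x, y = coupled_step(x, y, n2, rng)
        t += 1
    return t
```

Note that each of the two move maps is a bijection on {c, cc, s}, which is
exactly what makes Y, viewed alone, a faithful copy of the walk.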
Let Y's initial state be chosen from the stationary distribution.  Then:

    (variation distance of X at time t)
        <= Pr(X and Y have not coupled by time t)

Proof: Recall that the variation distance is defined as

    D(Xt) =     MAX      ( Pr(Xt in S) - Pi(S) )
            S subset of
               Omega

Alternatively, it is the sum, over all states s where the probability of
Xt being in state s is larger than the stationary probability Pi(s), of
the difference between them.  Let A be the set achieving the maximum.
Since Yt has the stationary distribution,

    D(Xt) = Pr(Xt in A) - Pr(Yt in A)

Now, there are four disjoint possibilities.  Let

    p1 = Pr(Xt in A     and Yt in A)
    p2 = Pr(Xt in A     and Yt not in A)
    p3 = Pr(Xt not in A and Yt in A)
    p4 = Pr(Xt not in A and Yt not in A)

Rewriting the above:

    D(Xt) = (p1+p2) - (p1+p3) = p2 - p3 <= p2 <= Pr(Xt != Yt)

This is just the probability that X and Y have not coupled.  QED.

Now, returning to the random walk on the circle, we need to bound the
probability that X and Y have not coupled.  (Ignoring parity) this is
just the probability that a random walk on a line of n points has not
yet reached the endpoint.  This can be bounded using expectations; I
won't do it here.

Proof of Lemma 1.  We construct a coupling (X,Y), where X is the
original coloring Markov chain.  Here's how it works:

  (1) Select a vertex v in V uniformly at random.
  (2) Compute a permutation g = g(G, Xt, Yt, v) of C (the set of colors)
      by a procedure to be explained below.
  (3) Choose a color c in C uniformly at random.
  (4) Recolor v with color c in Xt to give X'.
      Recolor v with color g(c) in Yt to give Y'.
  (5) If the coloring X' is proper then X(t+1) = X', otherwise
      X(t+1) = Xt.  (Same for Y.)

Clearly, both X and Y are faithful copies of the coloring Markov chain.
We need to explain how we choose g().  Let

    A = the set of vertices on which Xt and Yt agree
    D = the set of vertices on which Xt and Yt disagree

Consider E', the set of edges of G with one end in A and the other end
in D.  Let d'(v) be the degree of v in this edge set E', and let
m' = |E'|.  Here's how we compute g():
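The left-hand side of the lemma is easy to compute exactly for the circle
walk, by evolving the full distribution of Xt.  This is my own numerical
illustration (names are mine): the variation distance decays toward 0, and
by the lemma it is bounded above by the non-coupling probability at every t.

```python
def step_distribution(p):
    """One step of the circle random walk (c/cc/s each with prob. 1/3)
    applied to a probability distribution p over the points."""
    n = len(p)
    return [(p[(i - 1) % n] + p[i] + p[(i + 1) % n]) / 3.0
            for i in range(n)]

def variation_distance(n_points, t):
    """D(Xt) for the walk started at point 0, against the uniform
    stationary distribution Pi."""
    p = [0.0] * n_points
    p[0] = 1.0
    for _ in range(t):
        p = step_distribution(p)
    pi = 1.0 / n_points
    # sum of positive parts equals max over S of Pr(Xt in S) - Pi(S)
    return sum(q - pi for q in p if q > pi)
```

At t = 0 the distance is 1 - 1/n_points, and it drops geometrically with t,
as the coupling argument predicts.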
  (a) If v in D, then g is the identity.
  (b) If v in A:  Let N be the set of neighbors of v.  Define Cx to be
      the set of colors used on N in Xt but not on N in Yt, and Cy to be
      the set of colors used on N in Yt but not on N in Xt.  Cx and Cy
      are disjoint, and |Cx| <= d'(v) and |Cy| <= d'(v).  Why?  Look at
      the subset of N that is in A: the colors used there are in neither
      Cx nor Cy.  The rest of N has size d'(v).

      WLOG assume |Cx| <= |Cy|.  Choose any subset C'y of Cy of the same
      size as Cx.  The permutation g simply maps colors Cx <----> C'y,
      and is the identity elsewhere.

Let |D(t)| be the size of the disagreeing set at time t.  When this
reaches 0, the chains have coupled.  We need to bound the probability
that this has not happened after t steps.

In one step, |D(t)| changes by -1, 0, or +1.  Let's look at the
probability that it goes up by one.  This is bad, and we want the
probability to be small.  The only way this can happen is if the
recolored vertex agrees in the two chains and is recolored to disagree.
This might occur in case (b) above.  It turns out that the only way this
can happen is if the chosen color c is in Cy.  [Why?  You can see this
just by running through all the possibilities.  Actually, you need to
split out the WLOG part to make it rigorous.]  So:

                         1         d'(v)    m'
    Pr(D increases) <=  ---  Sum   -----  = ---
                         n  v in A   k      kn

What about D decreasing?  This will happen if v has different colors in
Xt and Yt, and it ends up with the same color in both.  This will
certainly happen when the chosen color differs from all the colors
appearing on N in both Xt and Yt.  How many colors is that?  It's
clearly at least k - 2d(v) + d'(v) >= k - 2Delta + d'(v).  Thus:

                         1         k - 2Delta + d'(v)
    Pr(D decreases) >=  ---  Sum   ------------------
                         n  v in D         k

                        k - 2Delta          m'
                     =  ---------- |D(t)| + ---
                            kn              kn

Let's look at the expected size of D(t+1).
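Steps (1)-(5) together with the construction of g can be sketched in code.
This is my own rendering of the coupled coloring step (names are mine), for
a graph given as an adjacency dict and colorings X, Y given as dicts from
vertex to color.

```python
import random

def coupled_coloring_step(adj, X, Y, k, rng):
    """One step of the coupled pair of coloring chains.  Each chain, seen
    alone, picks a uniform vertex and a uniform color (g is a permutation,
    so g(c) is also uniform) and recolors if the result stays proper."""
    v = rng.choice(list(adj))                # (1) uniform random vertex
    g = {c: c for c in range(k)}             # (2) permutation g, default identity
    if X[v] == Y[v]:                         # case (b): v in A
        Nx = {X[u] for u in adj[v]}          # colors on N in Xt
        Ny = {Y[u] for u in adj[v]}          # colors on N in Yt
        Cx, Cy = sorted(Nx - Ny), sorted(Ny - Nx)
        if len(Cx) > len(Cy):                # WLOG |Cx| <= |Cy|
            Cx, Cy = Cy, Cx
        for a, b in zip(Cx, Cy):             # swap Cx <----> C'y, identity elsewhere
            g[a], g[b] = b, a
    c = rng.randrange(k)                     # (3) uniform random color
    if all(X[u] != c for u in adj[v]):       # (4)+(5) keep the move if proper
        X = {**X, v: c}
    if all(Y[u] != g[c] for u in adj[v]):
        Y = {**Y, v: g[c]}
    return X, Y
```

Note that zip truncates Cy to the size of Cx, which is exactly the choice
of the subset C'y; since Cx and Cy are disjoint, g really is a permutation.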
    E[|D(t+1)|] = Pr(D increases)(|D(t)|+1)
                + Pr(D decreases)(|D(t)|-1)
                + Pr(D stays the same)|D(t)|

                = Pr(D increases)(|D(t)|+1)
                + Pr(D decreases)(|D(t)|-1)
                + (1 - Pr(D increases) - Pr(D decreases))|D(t)|

                = |D(t)| + Pr(D increases) - Pr(D decreases)

Define

        k - 2Delta        m'
    a = ----------    b = ---
            kn            kn

Using this notation:

    Pr(D increases) <= b
    Pr(D decreases) >= a|D(t)| + b

so

    E[|D(t+1)|] <= |D(t)| + b - (a|D(t)| + b) = (1-a)|D(t)|

...See last paragraph of proof...
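The omitted last paragraph presumably iterates this bound in the standard
way; a sketch of how it yields Lemma 1 (my reconstruction, using the same
a, |D(t)|, and Tau(e) as above, with e written as \epsilon):

```latex
% Iterating E[|D(t+1)|] \le (1-a)|D(t)| from |D(0)| \le n:
E\bigl[\,|D(t)|\,\bigr] \;\le\; (1-a)^t\, n \;\le\; n\,e^{-at}.

% |D(t)| is a nonnegative integer, so by Markov's inequality:
\Pr[\text{not coupled by time } t]
  \;=\; \Pr\bigl[\,|D(t)| \ge 1\,\bigr]
  \;\le\; E\bigl[\,|D(t)|\,\bigr]
  \;\le\; n\,e^{-at} \;\le\; \epsilon

% whenever
t \;\ge\; \frac{1}{a}\ln\frac{n}{\epsilon}
  \;=\; \frac{k}{k-2\Delta}\, n \ln\frac{n}{\epsilon},

% which, combined with the coupling lemma, is exactly the bound on
% \mathrm{Tau}(\epsilon) claimed in Lemma 1.
```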