15-750 Graduate Algorithms                                          4/10/01
* Random walks on graphs I
============================================================================

Turn to a new topic: random walks on graphs. Used in lots of places:
 - estimation problems: estimating the permanent, estimating volumes
 - random selection: pick a random consistent linear separator
 - derandomizing randomized algorithms

Plus, it's a natural topic by itself. Say you're lost in a maze and don't
want to make a map. How long will it take to get out if you just move
around randomly?

Random walk on a graph: start at some initial vertex. Pick a neighbor at
random and go there, and so on. Equivalently, think of yourself as
currently being on some edge, heading in some direction; then move to a
random neighboring edge in the forward direction.

Interesting quantities to look at:

  H_uv = "hitting time" from u to v:
         E[# steps to first reach v | start at u].

  C_uv = "commute time": E[# steps to go u...v...u].
         C_uv = H_uv + H_vu by linearity of expectation
         (X = # steps to first hit v, Y = # steps to hit u after first
         hitting v).

  C_u = "cover time from u": the expected time to visit the entire graph
        when we start at u.

  C_G = "cover time of G" = max_u C_u.

  Stationary distribution: what is the stationary distribution of the
  random walk?

  Mixing rate: how fast do we get close to the stationary distribution?

Today we will look at all of these except the mixing rate. Next time we
will look at what kinds of graphs have "rapid mixing", in particular when
the graph has an exponential number of nodes - this is what's used for
estimation problems.

Today's MAIN THEOREM: If G is connected, with n vertices and m edges,
then C_G <= 2m(n-1). We will prove this in two ways.

Implication: s-t connectivity in "randomized log space". With only
O(log n) bits of read/write memory, and required to halt in polynomial
time, we can determine connectivity with 1-sided error.
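As a concrete illustration (my own sketch, not from the lecture), here is a
small Monte Carlo experiment estimating the cover time of a 4-cycle and
comparing it against the theorem's 2m(n-1) bound. The graph, trial count,
and seed are arbitrary choices; for a cycle on n vertices the true cover
time happens to be n(n-1)/2 = 6.

```python
import random

def cover_time_trial(adj, start, rng):
    """Run one random walk from `start`; return # steps until all vertices visited."""
    visited = {start}
    u, steps = start, 0
    while len(visited) < len(adj):
        u = rng.choice(adj[u])   # move to a uniformly random neighbor
        visited.add(u)
        steps += 1
    return steps

def estimate_cover_time(adj, start, trials=2000, seed=0):
    """Average cover time over independent walk trials (a crude estimate of C_u)."""
    rng = random.Random(seed)
    return sum(cover_time_trial(adj, start, rng) for _ in range(trials)) / trials

# 4-cycle: n = 4, m = 4, so the theorem gives C_G <= 2*4*(4-1) = 24.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
est = estimate_cover_time(cycle, 0)   # should come out near 6, well under 24
```

The estimate comfortably beats the bound here; the 2m(n-1) bound is loose
for nice graphs like cycles but tight up to constants in the worst case.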
------------------------------------------------------------------------------
Method 1:

First, for convenience, let's think of ourselves as at each time step
being on some edge heading in some direction (as opposed to being at a
node). In fact, suppose we start by picking an edge and a direction at
random (i.e., our distribution puts probability 1/2m on each directed
edge). What is our distribution after one step? Answer: the same.

  Pr(we're now on edge v-->w)
    = Pr(we were on some edge u-->v) * Pr(took v-->w | we were on u-->v)
    = [d(v)/2m] * [1/d(v)]
    = 1/2m

So this is a fixed point, also called the "stationary distribution" of
the random walk. (It's not hard to show it's the only one.)

[Aside: in fact, no matter how we start, so long as the graph is
connected and has an odd cycle, we'll approach this distribution. If
there are no odd cycles, then the distribution alternates between one
limit at odd times t and another at even times t.]

LEMMA: the average time between successive visits to a directed edge
u-->v is 2m.

Proof: Say we start in the stationary distribution. We know that over a
sequence of t steps, the expected number of times we traverse edge u-->v
is t/2m. What we want to do is somehow invert this to say that the
expected time between two consecutive traversals of u-->v is 2m.

  .........u->v..............u->v.......u->v....................u->v

We want to say: "in t steps the expected # of u->v's is t/2m, so the
average gap must be 2m". This is like the coin-flip problem where we
wanted to go from "coin of bias p" to "expect 1/p flips until we see a
heads", except our coin tosses (are we currently on u-->v?) aren't
independent. In fact, we need to be a little careful: e.g., if the graph
were two disconnected equal-size pieces, then the expected time between
consecutive traversals of u->v would be m (although the expected time to
the *first* visit would be infinite).

The way we'll argue this formally is to look at the gaps. X_1 = length
of the 1st gap (say we start on u-->v), X_2 = length of the 2nd gap,
etc. These *are* independent random variables.
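The lemma lends itself to a quick empirical check before the formal
argument. This sketch (mine, with an arbitrarily chosen small graph: a
triangle plus a pendant vertex, so m = 4) runs one long walk and measures
the average spacing between traversals of a fixed directed edge, which
should come out near 2m = 8.

```python
import random

def mean_gap(adj, edge, steps=200_000, seed=1):
    """Average # of steps between traversals of one fixed directed edge."""
    rng = random.Random(seed)
    u = edge[0]          # starting point barely matters over a long walk
    hits = 0
    for _ in range(steps):
        v = rng.choice(adj[u])
        if (u, v) == edge:   # did this step traverse the chosen directed edge?
            hits += 1
        u = v
    return steps / hits

# Triangle 0-1-2 plus pendant vertex 3: connected, has an odd cycle, m = 4.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
gap = mean_gap(adj, (0, 1))   # lemma predicts roughly 2m = 8
```

Note the odd cycle matters: it guarantees the walk approaches the
stationary distribution from any start, so a single long run suffices.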
They also have finite variance. [This is not hard to argue - just to be
really weak, we can say there's at least a 1/2^n chance that we visit
u->v in the next n steps.] This means that by the law of large numbers,
as s -> infinity,

   (1/s)(X_1 + X_2 + ... + X_s)  -->  E[X_1].

We now use the fact that the graph is connected to say that as our number
of steps t goes to infinity, the number of gaps s goes to infinity too
(with probability 1). Also, notice that the LHS of the above is equal to
t/s. So t/s goes to E[gap length]. I.e., as t gets large, whp t/s is
very close to E[gap length], which means that whp s/t is near
1/E[gap length]. But we know that E[s/t] = 1/2m. So we have a random
variable s/t that is bounded (in the range [0,1]), has expectation 1/2m,
and whp goes to 1/E[gap length]. This means that 1/E[gap length] is
actually the expected value of s/t, and so E[gap length] = 2m.

...there ought to be a simpler way to do the above...

Where are we now: we have that the average time between successive
traversals of some edge u-->v is 2m.
---------------------------------------------------------------------
To finish the proof about the cover time of the graph: consider some
spanning tree of the graph; it has n-1 edges. Consider some fixed tour
on the spanning tree --> it traverses each edge twice. So,

   E[time to visit the nodes in tour order]
      = SUM over directed edges (u,v) in the tour of H_uv.

For each pair {(u,v), (v,u)} we get H_uv + H_vu = C_uv. Now, our lemma
above implies that if u and v are neighbors, then C_vu <= 2m: the
expected time to go from v to u and back to v is at most the expected
time until we actually traverse the u-->v edge again. Since the tree has
n-1 edges, we get: C_G <= 2m(n-1).
===========================================================================
Proof 2. Something different: resistors.

A resistive network is a graph with resistors on the edges. E.g.,

     *-----2 Ohms------*
     |                 |
   1 Ohm            2 Ohms
     |                 |
     *-----3 Ohms------*

Say we hook up a battery.
Then each node has a voltage ("potential") and each edge has a current.

Kirchhoff's law: current is like water flow. For any node, the sum
entering equals the sum leaving.

Ohm's law: V = IR (V = voltage difference across the resistor). Voltage
is a property of a node, sometimes called "potential".

Two resistors in series add resistance:

    *-------R1--------*-------R2-------*
    V0                V1               V2

Say I = current flowing from left to right. Then V1 = V0 - I*R1 and
V2 = V0 - I*R1 - I*R2, so delta(V) = I(R1 + R2).

Two resistors in parallel:

         _____R1______
        /             \
     V0*               *V1
        \_____R2______/

  1/R = 1/R1 + 1/R2.

See this via: I1 = (V0 - V1)/R1, I2 = (V0 - V1)/R2, and I = I1 + I2.

THEOREM: Take a graph G and think of each edge as a unit resistor. Let
R_uv = effective resistance between u and v. Then:

  Commute time C_uv = 2m R_uv.

I.e., put 2m units of current in at u and pull them out at v. Then C_uv
equals the voltage drop.

Lemma 1: Say we inject d(x) units of current into each node x != v, and
remove 2m - d(v) units from v. [Note: is this legal? Batteries set a
specific voltage, not a specific current. But we can imagine slowly
increasing the voltage at each x until we get the desired currents. This
is OK since there is negative feedback: increasing the voltage at x only
decreases the current coming from any other node y.] Then the hitting
time satisfies H_xv = V_x - V_v.

Lemma 2: Say we inject 2m - d(u) amps into u and remove d(x) from each
x != u. Then the hitting time satisfies H_xu = V_u - V_x.

Lemma 2 follows from Lemma 1 by negating all the currents and voltages.

The THEOREM follows from Lemma 1 and Lemma 2: if we add the two
experiments together (the two sets of voltages), then the currents add
too, and we get the experiment of putting 2m units of current into u and
pulling them out at v. The voltage drop is (V_u - V_v) in experiment 1
plus (V_u - V_v) in experiment 2, which is H_uv + H_vu = C_uv.

Proof of Lemma 1: Clearly true for x = v. Now let's consider x != v, and
define the voltage at v to be 0. Look at the neighbors of x. The current
out of x is d(x). Also, for each neighbor w, the current on edge x->w
equals (V_x - V_w)/1.
So:

   d(x) =   SUM    [V_x - V_w]          (N(x) = neighbors of x)
          w in N(x)

We can rewrite this as:

   V_x = 1 +  AVG   V_w
            w in N(x)

Now, let's look at H_xv. We can write H_xv in terms of what happens
after one step, like this:

   H_xv = 1 +  AVG   H_wv
             w in N(x)

So, {H_xv} and {V_x} are solutions to the same set of equations. Now,
this set of equations has a single solution once we fix V_v = 0 (or
H_vv = 0). We can see this by noticing that if you increase the value at
one node, then either some neighbor must increase by even more, or else
all neighbors must increase by exactly this amount; following these
increases through the (connected) graph eventually forces the value at v
to change too, contradicting the fact that it is fixed at 0.
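Since {H_xv} and {V_x} satisfy the same linear equations, both can be
computed by solving small linear systems, and the theorem C_uv = 2m R_uv
can be checked directly. The following is my own sketch (the path graph,
the plain Gauss-Jordan solver, and all names are my choices, not from the
notes). On the path 0-1-2 we have m = 2; by hand, R_02 = 2 (two unit
resistors in series) and H_02 = H_20 = 4, so C_02 = 8 = 2m * R_02.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def effective_resistance(adj, u, v):
    """R_uv with a unit resistor on each edge: ground v, inject 1 amp at u.
    Kirchhoff + Ohm give, for each x != v:  d(x)*V_x - sum_{w in N(x)} V_w = I_x."""
    nodes = [x for x in adj if x != v]
    idx = {x: i for i, x in enumerate(nodes)}
    A = [[0.0] * len(nodes) for _ in nodes]
    b = [0.0] * len(nodes)
    for x in nodes:
        A[idx[x]][idx[x]] = float(len(adj[x]))
        for w in adj[x]:
            if w != v:
                A[idx[x]][idx[w]] -= 1.0
    b[idx[u]] = 1.0                 # one amp in at u, out at (grounded) v
    return solve(A, b)[idx[u]]      # V_v = 0, so R_uv = V_u / (1 amp)

def hitting_times(adj, v):
    """Solve H_xv = 1 + avg_{w in N(x)} H_wv exactly, with H_vv = 0."""
    nodes = [x for x in adj if x != v]
    idx = {x: i for i, x in enumerate(nodes)}
    A = [[0.0] * len(nodes) for _ in nodes]
    b = [1.0] * len(nodes)
    for x in nodes:
        A[idx[x]][idx[x]] = 1.0
        for w in adj[x]:
            if w != v:
                A[idx[x]][idx[w]] -= 1.0 / len(adj[x])
    H = solve(A, b)
    return {x: H[idx[x]] for x in nodes}

# Path 0-1-2: n = 3, m = 2.
path = {0: [1], 1: [0, 2], 2: [1]}
m = 2
commute = hitting_times(path, 2)[0] + hitting_times(path, 0)[2]  # H_02 + H_20
resist = effective_resistance(path, 0, 2)
# THEOREM check: commute == 2m * resist  (8 == 2*2*2)
```

The same two solvers work for any connected graph; only the right-hand
sides differ (unit current injection for voltages vs. the all-ones vector
for hitting times), which mirrors how close the two systems of equations
above really are.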