15-750 Graduate Algorithms                                         04/05/01

Approx algs III
 * Semidefinite Programming
 * Max-cut
 * Max-2SAT (handwave)
=============================================================================
IMPORTANT HIGH LEVEL POINT: When we do a relaxation (whether it's an LP
relaxation or an SDP relaxation), our solution will be "better than
optimal".  That's because we are giving ourselves more freedom, and in the
process we are in a sense "smoothing out" the problem, which allows us to
solve it optimally.  But our solution isn't legal.  So we then somehow
"round" the solution we have to a legal one, incurring a cost.  The final
statement we make is that our rounded solution is within some factor c of
the optimal solution to the relaxation, which is at least as good as the
optimal solution to the original problem.  Therefore we are within a
factor of c for the original problem.  All of our arguments have looked
like this, but I just wanted to make this explicit.

Today: a new technique called semidefinite programming.  Much like how LP
relaxed {0,1} values to [0,1] values, this will relax numbers to vectors
(or points) in an n-dimensional space.  Our optimization problems will end
up looking like various kinds of clustering problems, so it helps if you
can think in n-dimensional space...

We'll talk about this in the context of the MAX-CUT problem.

MAX-CUT: Given a graph G, partition the vertices into two sets S, T, to
maximize the number of edges between S and T.

E.g., if the graph were 2-colorable, then a proper 2-coloring would give a
perfect cut from this point of view.  If not, then we're asking for the
2-coloring that gets the most edges correct.

Can think of this as a version of MAX-2SAT, but where each clause is an
XOR of variables, rather than an OR of literals.  In fact, the techniques
we will discuss apply to MAX-2SAT too, but MAX-CUT is a little cleaner.

Here's a natural greedy (local search) algorithm:
 - Start with an arbitrary cut.
 - If some node has more neighbors on its own side than on the other
   side, move it to the other side.  Repeat.

Can you prove this won't just run forever?  Each move increases the number
of edges crossing the cut, and that number is at most m.  At the end, at
least half of the edges cross the cut.  (Why?  Each node ends with at
least half of its incident edges crossing -- otherwise it would move --
and summing over nodes counts each crossing edge twice, giving at least
m/2 crossing edges.)  So, this is trivially a 1/2 approximation.

How about a really simple randomized algorithm?  Just put each node on a
random side.  Every edge has probability 1/2 of crossing the cut, so the
expected number crossing is m/2.

Basically, this was the best known for a long time.  Then, [Goemans &
Williamson] showed how one could use semidefinite programming to do a lot
better.
=============================================================================
What is semidefinite programming?  Let's start with an operational
definition (what you can do) and then look at what's under the hood.

Operational definition: Semidefinite programming is like linear
programming, but your variables are vectors, and what you're allowed to
write down as constraints are linear inequalities on DOT-PRODUCTS of
these vectors.  (You can also maximize or minimize an objective function
in this form.)

E.g., vectors a, b, c.  Constraints:

    a.a = 1,   b.b = 1,   c.c = 1,
    a.b <= 0,  b.c <= 0,  a.c <= 0.

What if we wanted to maximize a.b + b.c + c.a?  What if we wanted to
minimize it?

Notice: we're not allowed to specify that these vectors must live in a
2-dimensional space.  So, in general, their span could have dimension as
high as the number of vectors.

Let's try to use this for MAX-CUT.  We'll have one variable (vector) for
each node in the graph.  Let's require them to be unit vectors by saying
v_i.v_i = 1 for all i.  Now, we want to put them into two clusters to
maximize the number of edges between the clusters.  Here's one way we can
try to do that:

    maximize   SUM     0.5*(1 - u.v)
            (u,v) in E

E.g., if u = v then the edge contributes 0 to the sum.
If u = -v then it contributes 0.5*2 = 1 to the sum, and if u is
perpendicular to v then it contributes 1/2 to the sum.

In particular, notice that if we could magically add the constraint "all
vectors must lie in a 1-dimensional space" (since they have length 1,
this is equivalent to saying each is at +1 or at -1), then our objective
function would be EXACTLY EQUAL to the number of edges crossing the cut.
Unfortunately, we can't.  So, much like an LP relaxes {0,1} to fractional
values, we are relaxing by allowing an n-dimensional space.

So, the SDP might return a "better than optimal" solution according to
its objective, by using this extra freedom.  E.g., if the graph is a
triangle, then the max cut has value 2.  What would the SDP return?
(Three unit vectors at mutual 120-degree angles give 3 * 0.5*(1 - cos
120) = 9/4.)

The difficulty with SDPs is that we have to somehow "round" these vectors
back to boolean values.  For the MAX-CUT problem, here's what we'll do:
 - Pick a random hyperplane through the origin.
 - Let S = set of points on one side, and let T = set of points on the
   other.

Claim: this gives a 0.878-approximation.
==========================================================================
MAX-CUT Algorithm:
 - Set up and solve the SDP.
 - Split into S and T with a random hyperplane through the origin.

Two things to do now:
 (1) how does this SDP box really work?
 (2) prove the claim that this gives a 0.878 approximation.

Let's do (2).  If there's time left, we'll get back to (1) at the end.

Proof of claim: First of all, given two vectors u and v separated by an
angle alpha, what is the probability that they get split by a random
hyperplane?  Answer: alpha/pi.  Why?  Important point: the intersection
of a random hyperplane with the 2-d plane defined by u and v looks like a
uniformly random line through the origin (with probability 1), and such a
line separates u from v exactly when it falls inside the angle between
them.

So, we can calculate the expected value of our solution as a function of
all the pairwise angles.
    E[size of cut] =   SUM     angle(u,v)/pi
                    (u,v) in E

Compare this to:

      SUM     0.5*(1 - u.v)  =  OPT_SDP  >=  OPT
   (u,v) in E

So all we need to do is compare term by term.  If the angle is alpha,
then u.v = cos(alpha).  Draw the graph of alpha/pi for alpha in [0, pi]
and compare it to 0.5*(1 - cos(alpha)).  What you get is: for any angle
alpha in [0, pi],

    alpha/pi  >=  0.878 * (1 - cos(alpha))/2.

So, the expected size of our cut is at least 0.878 * OPT.
===========================================================================
--> Talk about MAX-2SAT.

Rough idea: First of all, we want to distinguish between TRUE and FALSE,
so let's have one more unit vector y, which will stand for TRUE.  Then
the main thing to notice is: say we have some clause like (x1 v x2).  How
does the quantity x1.y + x2.y - x1.x2 behave, if the x's were restricted
to the 1-dimensional line along y?

    If x1 and x2 are both true, we get    1 + 1 - 1 =  1.
    If x1 is true but x2 is false, we get 1 - 1 + 1 =  1.
    If x1 is false but x2 is true, we get -1 + 1 + 1 = 1.
    If both are false, we get            -1 - 1 - 1 = -3.

So we can use this (after scaling) to create an appropriate SDP objective
function.

--> What's going on "under the hood"?

Under the hood: we have a matrix M of variables M_{ij}, where we think of
M_{ij} as v_i . v_j.  The constraints are linear in these variables.
Then we require M to be positive semidefinite.  This means we can
decompose M into V V^T using Cholesky factorization, which gives us the
vectors (the rows of V).

The requirement that M be positive semidefinite turns out to be convex
and to have a separation oracle.  In particular, given a proposed
solution M, we can try doing the Cholesky factorization: either we
succeed, or else we can produce a linear constraint (in the M_{ij}) that
M violates.  For some LP algorithms, like the Ellipsoid algorithm, this
is enough.