15-451 Algorithms 11/07/00 * 3-SAT & CIRCUIT-SAT * NP-completeness: informal * NP-completeness: formal * More reductions Quiz Thurs: MST, Dijkstra, Network flow, Linear Programming ============================================================================ In the last few classes, we've looked at increasingly more expressive problems: network flow, min cost max flow, linear programming. Informally, "expressive" means you can code up a lot of different problems in their "language". So, by solving these well, we end up having some important hammers we can use to solve other problems. Today, start by introducing an even more expressive problem, 3-SAT. The main difference is that (A) we don't know any fast way of solving 3-SAT, and (B) while 3-SAT looks simple, it is *so* expressive that we view this as evidence that in fact it is intrinsically hard. It can express anything in the class NP. Then we'll see a huge number of other problems have exactly this same property. [How many have seen NP-completeness before? Even if you have, this is one of those topics where it's really important to develop your intuitions, so you should still pay attention] =============================================================== First a few definitions: * We'll say that an algorithm runs in Polynomial Time if for some constant c, its running time is O(n^c), where n is the size of the input. "Size of input" means "number of bits it takes to write the input down" * Problem A is poly-time REDUCIBLE to problem B (A <=_p B) if given a poly-time alg for B, we can use it to produce a poly-time alg for A. Problem A is poly-time EQUIVALENT to problem B (A =_p B) if A <=_p B and B <=_p A. Now, on to the problems: SAT: Given: a CNF formula (AND of ORs) over n variables x1,...,xn: (x1 OR x2 OR not(x3)) AND (not(x2) OR x3) AND (x1 OR x3) AND ... Goal: find an assignment to the variables that satisfies the formula if one exists. 3-SAT: Special case of SAT where each "clause" (OR) has at most 3 literals in it. ("literal" = a variable or its negation). This is called "3-CNF". DECISION vs SEARCH: when we get more formal, it will be easier to talk about the decision (yes/no) version of these problems: "Does there exist a satisfying assignment?" I claim these are poly-time equivalent. Decision <= search is easy. What about the other way? Set variables 1 at a time and use decision box to tell if we ever made a mistake. Fact 1: given alg for 3-SAT, can use to solve more impressive CIRCUIT-SAT. (i.e., CIRCUIT-SAT <=_p 3-SAT) CIRCUIT-SAT: Given a circuit of OR(x,y) and NOT(x) gates with a single output and no loops (some of the inputs may be hardwired). Question: is there a setting of the inputs that causes the circuit to output 1? E.g., [draw some circuit] Reducing CIRCUIT-SAT to 3-SAT: We will assign a variable to each of the non-hardwired inputs, and one to each gate. We will then use clauses to say that all internal gates have to produce the correct outputs given their inputs, and that the output of the circuit is 1. Rules: y = OR(x1,x2) is converted to (x1 OR x2 OR not(y)) AND (not(x1) OR y) AND (not(x2) OR y) y = not(x1) is converted to: (x1 or y) AND (not(x1) or not(y)) Also, if y is the output of the circuit, put in (y). We can see that the 3-CNF formula produced is satisfiable if and only if the circuit has a setting of inputs that causes it to output 1. The size of the formula is polynomial (actually, linear) in the size of the circuit. The construction can be done in polynomial (actually, linear) time. So, if we had a polynomial-time algorithm to solve 3-SAT, then we could solve circuit-SAT in poly time too. What can we do with CIRCUIT-SAT? Well, let's take some hard problem like factoring: Given a big number N written in binary, goal is to find a factor x, 1 < x < N, if one exists. We don't know any fast algorithm for this, but we can easily check a solution if someone handed it to us. Alg verify_factor(x,N) if x <= 1 or x >= N return NO. if ((N/x)*x == N) return YES. // integer division else return NO This runs in time polynomial in # of bits in x and N. Now, the key fact is that given an algorithm that runs in time t, we can create a circuit with polynomial in t many gates that implements it. Basically, we just unroll what the hardware is doing: each level in the circuit represents the contents of memory (can only affect t cells in t steps) and the gates between one level and the next implement one time step of the algorithm. We can do all this in time polynomial in t, which is polynomial in log(N). Now what we do is we hardwire the inputs corresponding to the bits of N. Then the problem "find a setting of the remaining inputs that causes this circuit to output 1" is the same as "find an input x such that verify_factor(x,N) outputs YES?" Size of circuit is polynomial in log(N). So, if we had a polynomial-time algorithm for circuit-SAT, we would have a polynomial-time algorithm for factoring. Notice, there was nothing particular to factoring here. Any problem that has an algorithm to *verify* a solution in time t(n) can be converted into a circuit of size polynomial(t). The problem of finding a solution that makes the verifier happy is the same as the problem of finding an input that makes the circuit output 1. So, if we have a polynomial time algorithm to solve circuit-SAT, we have a polynomial-time algorithm to solve any problem that has a polynomial-time verifier. This is what it means to say that circuit-SAT and 3-SAT are NP-complete. ==================================================================== Now, let's do it formally: First of all, we'll be technically looking at decision problems. E.g., consider the "Traveling Salesman Problem": given a graph G, find the shortest tour that visits all nodes at least once. We would convert this into a decision question "Given a graph G and a distance d, does there exist a tour that visits all nodes and has length <= d?" Claim: these are polynomial-time equivalent. The easy case is that decision <= search. How to go in the other direction? First use binary search to find optimal value. Then - delete edges so long as doesn't change the answer. COMPLEXITY CLASSES: P and NP. P: problems solvable in polynomial time. E.g., network flow, LP, etc. NP: problems that have polynomial-time verifiers. Specificially, problem Q is in NP if there is a polynomial-time algorithm V(w,I) such that: If "I" is a YES-instance, then there exists w such that V(w,I) = 1. If "I" is a NO-instance, then for all w, V(w,I) = 0. Furthermore, w should have length polynomial in size of I (since we are really only giving V time polynomial in the size of the instance). "w" is often called a "witness". E.g., consider 3-coloring. The witness that an answer is YES is the coloring. The verifier just checks that all edges are correct and that at most 3 colors are used. So, 3-coloring is in NP. What about for TSP? NP is like: "I might not know how to find it, but I'll know it when I see it" (BTW, Co-NP is the class of problems that work the other way around). It's pretty clear that P is contained in NP. Huge open question in complexity theory is P=NP? It would be really weird if they are equal so most people believe P != NP. But, it's very hard to prove that a fast algorithm for something does NOT exist. So, it's still an open problem. --> What would a problem that's not even in NP look like? Here's a problem that's believed to not be in NP: For the n by n version of chess, does the first player have a winning strategy? (The answer is not known for the usual 8x8 chess). Might be true --- but even if you knew it was true, it's not clear how you could give a short proof. NP-Completeness =============== Problem Q is NP-complete if: (1) Q is in NP, and (2) Any other problem Q' in NP is polynomial-time reducible to Q. So if you could solve Q in polynomial time, you could solve *any* problem in NP in polynomial time. If Q just satisfies (2) then it's called NP-hard. Theorem: 3-SAT and Circuit-SAT are NP-complete. IMPORTANT FACT: now that we have 3-SAT, in order to prove some other NP problem Q is NP-complete, we just need to reduce 3-SAT to it. I.e., show 3-SAT <= Q: if we could solve Q then we could solve 3-SAT. MAKE SURE YOU UNDERSTAND THIS - A LOT OF PEOPLE MAKE THE MISTAKE OF DOING IT THE OTHER WAY AROUND. Theorem: Max-Clique (Does G have a clique of size > k?) is NP-hard. Proof: reduce 3-SAT to MAX-CLIQUE. Given a 3-CNF formula F of m clauses over n variables, we construct a graph as follows. For each clause c of F we create one node for every assignment to variables in c that satisfies c. E.g., say F = (x1 OR x2 OR not(x4)) AND (not(x3) OR x4) AND (not(x2) OR not(x3)) AND ... Then we create up to 7m nodes: for each clause of F of size 3 there are 7 assignments to variables in it that satisfy the clause. (If the clause has size 2 then there are 3 such assignments.) E.g., in this case the nodes are: (x1=0,x2=0,x4=0) (x3=0,x4=0) (x2=0,x3=0) ... (x1=0,x2=1,x4=0) (x3=0,x4=1) (x2=0,x3=1) (x1=0,x2=1,x4=1) (x3=1,x4=1) (x2=1,x3=0) ... Then we put an edge between two nodes if the partial assignments are consistent. Note: max possible clique size is m. And, if the 3-SaT problem does have a satisfying assignment, then in fact there IS an m-clique. Claim is this is true in the other direction too. If the graph has an m-clique, then there is a satisfying assignment: namely, just read off the assignment given in the nodes of the clique. So, Max-Clique is NP-complete too. Independent Set =============== An independent set in a graph is a set of nodes no two of which have an edge. E.g., in a 7-cycle, the largest independent set has size 3. (E.g, in the graph coloring problem, the set of nodes colored red is an independent set). Theorem: Independent set (is there an Indep Set in G of size > k?) is NP-complete. Proof: Reduce from clique. Given graph G for clique problem, just take complement of the graph . Ie. create graph H such that H has edge (u,v) iff G does NOT have edge (u,v). Then H has an indep set of size k iff G has a k-clique. ------------------------------------------------------------------- Another example: Vertex-Cover ============ A vertex cover in a graph is a set of nodes such that every edge is incident to at least one. (e.g., look at cut-diamond graph). For instance, can think of as what rooms to put security guards in a museum. What we want is the smallest vertex cover. Decision problem: does there exist a vertex cover of size < k? Claim: if C is a vertex cover in a graph, then V-C is an independent set. Also if S is an independent set, then V-S is a vertex cover. So, to solve "is there an independent set of size > k?" just solve "is there a vertex cover of size < n-k?". So, Vertex cover is NP-Complete too.