15-451 Algorithms: Lecture 4/6/00

Announcements:
- Recitation session tonight on this week's lectures
- Quiz Tues 4/11, second half of class (40 mins)
- Quiz review Monday 4/10, 6:30 - 8:30 pm, Room TBA (see course Bboard)
- Quiz has 5 questions (mostly short answer), a total of only 7 parts
- Today's lecture not on Quiz, but needed for Homework 6

Homework 6 is available on the course web page.

**********

Topic: Hardness of Approximation

 I.   Approximation lower bound for coloring
 II.  Approximation schemes
 III. MAX SNP-hard problems and L-reductions
 IV.  Example L-reductions

**********

Recap: Approximation algorithms

* An algorithm achieves an approximation ratio r if for ALL instances
  of the problem, it finds a solution with ratio <= r, where
  - for minimization problems,
      approximation ratio = (size of alg solution)/(optimal size)
  - for maximization problems,
      approximation ratio = (optimal size)/(size of alg solution)
* Can also consider the relative error:
      |(size of alg solution) - (optimal size)| / (optimal size)
* Interested in poly-time approximation algorithms (PTAAs)
* Typically, seek a PTAA that actually FINDS a solution that is within
  the approximation ratio
* For Vertex Cover, presented two PTAAs with approximation ratio 2
* For Set Cover, presented two PTAAs with approximation ratio O(lg n)
  (for one of them, the ratio held w.h.p.)

**********

I. Approximation lower bound for coloring

Can we devise a PTAA for coloring with an approximation ratio < 4/3,
i.e., (colors used by alg)/(min colors required) < 4/3?

What would this mean for 3-colorable graphs???
[The algorithm would output a 3-coloring iff the graph is 3-colorable,
so it would decide 3-colorability, which is NP-hard.]

This argument can be generalized to the following theorem:

Thrm (*): Let Q be a minimization problem with positive integer
solutions, and suppose that for some fixed integer K > 0 the decision
problem for Q, "Is the optimal size <= K?", is NP-hard.  Then, if P is
not equal to NP, no PTAA for Q can have approximation ratio < 1 + 1/K.

An analogous result holds for maximization problems.

For the case of coloring, the following much stronger bound is known:

Thrm: If P is not equal to NP, then there exists a constant
epsilon > 0 such that no PTAA for coloring can have approximation
ratio < n^epsilon.  The same holds for maximum independent set.

**********

II. Approximation schemes

Approximation scheme: an approximation algorithm that takes as input
  * a problem instance
  * a bound epsilon > 0
and outputs a solution with relative error at most epsilon.

* Polynomial-time approximation scheme (PTAS): for any fixed
  epsilon > 0, the scheme runs in poly-time in the size of its input
  instance.

Thus we can get arbitrarily close to an optimal answer, but the
running time is nondecreasing as epsilon decreases.

Corollary of Thrm (*): If P is not equal to NP, there is no PTAS for
any problem Q as defined in Thrm (*).

In class, we gave a 2-approximation for Vertex Cover.  The following
is also known:

Thrm: If P is not equal to NP, then there is no PTAS for Vertex Cover,
even for bounded-degree graphs.
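The recap above mentions two factor-2 algorithms for Vertex Cover from
class.  One standard algorithm of this kind (it may or may not be one
of the versions shown in class; the function name below is mine)
repeatedly picks an uncovered edge and adds BOTH of its endpoints.  A
minimal Python sketch:

def vertex_cover_2approx(edges):
    """Maximal-matching 2-approximation for Vertex Cover.

    edges: iterable of (u, v) pairs.  Returns a set of vertices that
    covers every edge and has size at most twice the minimum cover:
    the chosen edges are vertex-disjoint (a maximal matching), and any
    cover must contain at least one endpoint of each of them.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge not yet covered
            cover.add(u)                        # take BOTH endpoints
            cover.add(v)
    return cover

# Example: the path 1-2-3-4.  With this edge order the algorithm
# returns {1, 2, 3, 4}; the optimum {2, 3} has size 2, so the ratio is 2.
print(vertex_cover_2approx([(1, 2), (2, 3), (3, 4)]))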
**********

III. MAX SNP-hard problems and L-reductions

We would like to be able to show that one problem is hard to
approximate by a reduction from a problem known to be hard to
approximate.  This is similar to the reductions used to show that a
problem is NP-hard, but the reduction must also preserve the relative
approximation error (to within a constant factor).

Then we could claim that all the problems that can be reduced to one
another are equally hard to approximate, so that if we found a PTAS
for one, we would have a PTAS for all.  Combined with a result showing
that, if P is not equal to NP, no PTASs exist for these problems, this
gives a "proof" of hardness of approximation.

**********

One class that fits the above goals is the class of "MAX SNP-hard
problems".  The precise definition of this class is beyond the scope
of this course, but fortunately we need not know the exact definition:

* if we know some problems in this class,
* and we know how to do reductions that preserve the relative
  approximation error to within a constant factor,

then we can show that other problems are also MAX SNP-hard.  Combined
with the known result that

Thrm: If P is not equal to NP, then there is no PTAS for any
MAX SNP-hard problem.

we thus have a way to prove hardness of approximation.

**********

Examples of MAX SNP-hard problems:

MAX 3SAT: Given a set of clauses with <= 3 literals each, find a truth
  assignment that satisfies the MOST clauses.

MAX 2SAT: Same as MAX 3SAT, but with <= 2 literals per clause.
  E.g., (x1 or x2) and (not(x1) or x3) and not(x2) and not(x3):
  can satisfy at most 3 of the 4 clauses.

MAX 3SAT-B: Same as MAX 3SAT, but with <= B occurrences of each
  variable, for a constant B.

INDEPENDENT SET-B: Find the largest independent set in a graph with
  vertex degree bounded by B, for a constant B.

VERTEX COVER-B: Find the minimum vertex cover in a graph with vertex
  degree bounded by B, for a constant B.

MAX CUT: Find the cut with the maximum number of edges.

**********

To show that an optimization problem Q is MAX SNP-hard:

1. Select a known MAX SNP-hard problem Q'.

2. Select a constant c1 > 0 and describe a poly-time algorithm that
   computes a function f mapping every instance x' of Q' to an
   instance x = f(x') of Q such that

     (optimal size for x) <= c1 * (optimal size for x')

3. Select a constant c2 > 0 and describe a poly-time algorithm that
   computes a function g mapping every solution s of x with size c to
   a solution s' = g(s) of x' with size c' such that

     |c' - (optimal size for x')| <= c2 * |c - (optimal size for x)|

   (Typically, c2 = 1.)

This is called an L-reduction from Q' to Q.  The "L" stands for
"linear", because there is a linear blow-up in the relative
approximation error:

Thrm: If Q' L-reduces to Q, and there is a PTAA for Q with relative
approximation error epsilon, then there is a PTAA for Q' with relative
approximation error c1 * c2 * epsilon.

Claim: If there is a PTAS for one MAX SNP-hard problem, then there is
a PTAS for all MAX SNP-hard problems.  Why???
[If Q' L-reduces to Q and there is a PTAS for Q, then to obtain a PTAA
with relative approximation error epsilon for Q', apply the PTAA for Q
with relative approximation error epsilon/(c1*c2).]

**********

IV. Example L-reductions

MAX 3SAT-B to INDEPENDENT SET-B

Given that MAX 3SAT-B is MAX SNP-hard, show that INDEPENDENT SET-B is
MAX SNP-hard.  (Note that the two B's need not be the same, just both
constants.)

Proof: Q = INDEPENDENT SET-B, Q' = MAX 3SAT-B.

(i) Take c1 = 1, and the following mapping:

Construct a graph with one node for every occurrence of every literal.
There is an edge connecting any two occurrences of complementary
literals, and an edge connecting literal occurrences from the same
clause (thus there is a triangle for every clause with 3 literals, and
a single edge for a clause with 2 literals).

E.g., (x1 or not(x2) or x3) and (x2 or x3 or x4) and (not(x3) or x4) and ...

      x1 -- x2' -- x2 -- x4
       \    /       \    /
        \  /         \  /
         x3 -- x3' -- x3
                |
                x4

Claim: If every variable occurs <= k times in the clauses, then the
degree of the graph is <= k+1.  Why???
[A node has <= 2 edges inside its own clause, and its variable occurs
<= k-1 times in other clauses, so it has <= k-1 edges to complementary
occurrences; 2 + (k-1) = k+1.]

So this is an instance of INDEPENDENT SET-B with B = k+1.
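As a concrete illustration (not from the lecture notes), here is a
small Python sketch of the construction f just described.  The clause
representation (a literal is a signed integer, with -k standing for
not(x_k)) and the function name are my own choices:

def sat_to_graph(clauses):
    """Map a MAX 3SAT-B instance to an INDEPENDENT SET-B instance.

    clauses: list of clauses, each a list of nonzero integers, where
    literal k stands for x_k and -k for not(x_k).  Returns (nodes,
    edges), where each node is a pair (clause_index, position)
    identifying one literal occurrence.
    """
    nodes = [(i, j) for i, clause in enumerate(clauses)
                    for j in range(len(clause))]
    lit = {(i, j): clauses[i][j] for (i, j) in nodes}
    edges = set()
    # An edge between every two occurrences in the same clause
    # (a triangle for a 3-literal clause, one edge for a 2-literal one).
    for i, clause in enumerate(clauses):
        for j in range(len(clause)):
            for k in range(j + 1, len(clause)):
                edges.add(((i, j), (i, k)))
    # An edge between every pair of complementary occurrences.
    for a in nodes:
        for b in nodes:
            if a < b and lit[a] == -lit[b]:
                edges.add((a, b))
    return nodes, sorted(edges)

# The example above: (x1 or not(x2) or x3), (x2 or x3 or x4), (not(x3) or x4).
nodes, edges = sat_to_graph([[1, -2, 3], [2, 3, 4], [-3, 4]])
print(len(nodes), len(edges))   # 8 nodes; 7 clause edges + 3 complementary edges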
Claim: An independent set of size c corresponds to a truth assignment
that satisfies at least c clauses.  Why???
[An independent set cannot contain both a literal and its complement,
and can contain at most one literal occurrence from each clause, so a
consistent truth assignment can be obtained by setting the variables
according to which nodes are in the independent set.  For each node in
the independent set, its clause is satisfied (there may be other
satisfied clauses as well).]

E.g., in the graph above, the truth assignment for the independent set
{x1, x2, x3'} is obtained by setting x1 and x2 to TRUE and x3 to
FALSE.  It satisfies >= 3 clauses.

Claim: The size of the maximum independent set in the graph is equal
to the maximum number of clauses that can be satisfied.  Why???
[Conversely, given a truth assignment, for each satisfied clause pick
a node corresponding to one of its true literals; these nodes form an
independent set.  So the maximum independent set is at least as large
as the maximum number of satisfiable clauses, and by the previous
claim it is no larger.]

Thus (optimal size for x) = c1 * (optimal size for x').

(ii) Take c2 = 1, and the following mapping:

The truth assignment is obtained by setting the variables according to
which nodes are in the independent set.  As argued above, the size of
this solution (i.e., the number of satisfied clauses) is >= the size
of the independent set.  Thus

  |(alg size for x') - (optimal size for x')|
    =  (optimal size for x') - (alg size for x')
    =  (optimal size for x)  - (alg size for x')
    <= (optimal size for x)  - (alg size for x)
    <= c2 * |(alg size for x) - (optimal size for x)|

Therefore, INDEPENDENT SET-B is MAX SNP-hard.

**********

INDEPENDENT SET-B to MAX 2SAT

Proof: Q = MAX 2SAT, Q' = INDEPENDENT SET-B.

(i) Take c1 = 1 + B(B+1)/2, and the following mapping:

We have a variable for each node.  For every node x, we have the
clause (x).  For every edge (x,y), we have the clause
(not(x) or not(y)).

E.g.,
       a
      / \        (a) and (b) and (c) and (d) and
     b---c       (not(a) or not(b)) and (not(a) or not(c)) and
      \ /        (not(b) or not(c)) and (not(b) or not(d)) and
       d         (not(c) or not(d))

[Stopped here in lecture.]

Claim: There exists an optimal assignment that satisfies all the
clauses corresponding to the edges.  Why???
[Any optimal solution that fails to satisfy a clause
(not(x) or not(y)) can be converted to one that does by setting
x = FALSE: this satisfies that clause and unsatisfies only the
clause (x).]

Claim: The nodes x whose variable is TRUE form an independent set.
Why???
[If x and y are both set to TRUE and there is an edge (x,y), then the
clause (not(x) or not(y)) would be FALSE, contradicting the above
claim.]

Thus, the maximum number of clauses that can be satisfied equals the
size of the maximum independent set plus the number of edges.

Claim: For graphs on n nodes with degree bounded by B, the size of the
maximum independent set is >= n/(B+1).  Why???
[The following algorithm selects such an independent set: repeatedly
select a node and discard all of its neighbors, until the graph is
empty.  Each selection removes at most B+1 nodes, so at least n/(B+1)
nodes are selected.]

Since the graph has degree bounded by B, the number of edges is
<= B*n/2.  Thus

  (optimal size for MAX 2SAT instance)
    =  (optimal size for IS instance) + (number of edges)
    <= (optimal size for IS instance) + B*n/2
    <= (optimal size for IS instance)
         + (B(B+1)/2) * (optimal size for IS instance)
    =  c1 * (optimal size for IS instance)

(ii) Take c2 = 1, and the following mapping:

First convert the MAX 2SAT solution to a solution of no smaller size
that satisfies all the edge clauses, by setting the first variable in
each unsatisfied edge clause to FALSE.  (Recall the argument above as
to why this does not decrease the size.)  The independent set is then
the set of nodes x whose variable is TRUE.
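Here is a minimal Python sketch of this mapping g (not from the notes;
the function name, the graph-as-edge-list representation, and the
assignment-as-dict representation are my own):

def assignment_to_independent_set(graph_edges, assignment):
    """Mapping g for the INDEPENDENT SET-B to MAX 2SAT L-reduction.

    graph_edges: list of (x, y) pairs, the edges of the original graph.
    assignment:  dict mapping each node's variable to True/False,
                 i.e., a solution of the MAX 2SAT instance.
    First repair the assignment so that every edge clause
    (not(x) or not(y)) is satisfied (setting x to FALSE loses only the
    clause (x), so the number of satisfied clauses cannot drop), then
    return the set of nodes whose variable is TRUE, which is an
    independent set.
    """
    assignment = dict(assignment)            # work on a copy
    for x, y in graph_edges:
        if assignment[x] and assignment[y]:  # edge clause unsatisfied
            assignment[x] = False            # repair; only clause (x) is lost
    return {v for v, value in assignment.items() if value}

# Example: the 4-node graph above, with an assignment that already
# satisfies all edge clauses; the result {a, d} is an independent set.
edges = [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
print(assignment_to_independent_set(
    edges, {'a': True, 'b': False, 'c': False, 'd': True}))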
Let I = the MAX 2SAT instance and I' = the INDEPENDENT SET-B instance.
Then

  |(alg size for I') - (optimal size for I')|
    =  (optimal size for I') - (alg size for I')
    =  ((optimal size for I') + #edges) - ((alg size for I') + #edges)
    <= (optimal size for I)  - (alg size for I)
    <= c2 * |(alg size for I) - (optimal size for I)|

(The first inequality uses (optimal size for I') + #edges =
(optimal size for I), and the fact that the converted assignment
shows (alg size for I') + #edges >= (alg size for I).)

Therefore, MAX 2SAT is MAX SNP-hard.
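As a closing sanity check (not part of the lecture), here is a
brute-force Python sketch that verifies, on the small example graph
above, the claim that the maximum number of satisfiable clauses equals
the size of the maximum independent set plus the number of edges:

from itertools import product, combinations

# The example graph from the reduction above.
nodes = ['a', 'b', 'c', 'd']
edges = [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd'), ('c', 'd')]

# MAX 2SAT instance: clause (x) for every node, (not(x) or not(y)) for
# every edge; count how many clauses a given assignment satisfies.
def satisfied(assign):
    node_clauses = sum(assign[x] for x in nodes)
    edge_clauses = sum(not (assign[x] and assign[y]) for x, y in edges)
    return node_clauses + edge_clauses

best_sat = max(satisfied(dict(zip(nodes, bits)))
               for bits in product([False, True], repeat=len(nodes)))

# Maximum independent set, by checking every subset of the nodes.
def independent(subset):
    return all(not (x in subset and y in subset) for x, y in edges)

best_is = max(len(subset)
              for r in range(len(nodes) + 1)
              for subset in map(set, combinations(nodes, r))
              if independent(subset))

print(best_sat, best_is + len(edges))   # both equal 7: max IS = 2, #edges = 5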