15-451 Algorithms                                              10/10/00

* Graph Algorithms I                          Midterm next week
   - coloring & BFS                           (1 sheet of notes)
   - topological sorting & DFS
   - Bellman-Ford for shortest paths
   - Dijkstra for shortest paths
=======================================================================

A lot of algorithmic problems can be modeled as problems on graphs.
Today we will talk about a few basic ones.  We will continue talking
about graph algorithms for a good portion of the rest of the course.

Basic terminology:

A graph is a set of NODES, or VERTICES, with edges between some of
the nodes.  E.g.,

   1     4
   |\    |
   | \   |
   2----3

V = set of vertices, E = set of edges.

If there is an edge between two vertices, we call them NEIGHBORS.
The DEGREE of a vertex is the number of neighbors it has.

Standard notation: n = |V|, m = |E|.  If we don't allow self-loops,
what is the maximum possible number of edges?  {n \choose 2}.

In a DIRECTED graph, each edge has a direction, so we can have up to
n(n-1) edges.  For each node, we can talk about its out-neighbors
(and out-degree) and its in-neighbors (and in-degree).

PROBLEM 1: GRAPH COLORING.

Given a graph, we want to give each node a color such that no two
neighbors have the same color.  The goal is to do this with the
fewest total number of different colors (it is easy with n colors).
E.g., a pentagon (5-cycle) needs 3.

Think of the problem of scheduling exams: each node is a class, and
there is an edge between two nodes if lots of students are taking
both classes.  The goal is to use the fewest number of different
time slots such that no two adjacent nodes are given the same time
slot.  In compilers, graph coloring comes up when assigning
variables to registers.

Unfortunately, the graph-coloring problem is NP-hard.  In fact, even
telling whether it is possible to color a graph with <= 3 colors is
NP-hard in general.  But the 2-coloring problem (color with 2 colors
if possible, else say "impossible") is easy.  How do we do it?

2-COLORING: given a graph G, find a 2-coloring if one exists, else
output "not 2-colorable".

One answer: use breadth-first search.  Pick some start node and
color it red.
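In code, the whole BFS procedure (including the legality check
discussed next) might look like the following sketch in Python; the
adjacency-list dictionary format is an assumption for the sketch,
not part of the notes:

```python
from collections import deque

def two_color(adj):
    """Try to 2-color a graph.  adj maps each vertex to a list of its
    neighbors (undirected).  Returns a dict vertex -> 0/1, or None if
    the graph has an odd cycle and so is not 2-colorable."""
    color = {}
    for start in adj:                        # handle each component separately
        if start in color:
            continue
        color[start] = 0                     # color the start node "red"
        queue = deque([start])
        while queue:                         # plain BFS from start
            v = queue.popleft()
            for w in adj[v]:
                if w not in color:
                    color[w] = 1 - color[v]  # neighbors get the other color
                    queue.append(w)
                elif color[w] == color[v]:   # illegal edge: endpoints at the
                    return None              # same BFS distance => odd cycle
    return color
```

On the pentagon this returns None (odd cycle); on any tree or even
cycle it returns a legal coloring.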
Color all of its neighbors blue.  Color their neighbors red, etc.
(If we didn't reach some nodes, they are in a separate component, so
we can handle them separately.)

Claim: if this coloring isn't legal, then the graph wasn't
2-colorable.  Why?
 -> If the coloring is illegal ==> there is an edge between two
    nodes at the same distance from the start ==> the graph has an
    odd-length cycle ==> the graph isn't 2-colorable.

Total time = time for BFS = O(m + n).  (We could have used
depth-first search too.)

Here's a neat problem if you like to doodle.  Draw a box (or circle,
it doesn't matter).  Then slice it with straight lines.  Now color
the regions so that no two adjacent regions (sharing an edge) have
the same color.  How many colors do you need?  Answer: 2.  Can
anyone see a proof?  (So, good for doodling in pencil.)
========================================================================
PROBLEM 2: TOPOLOGICAL SORTING.

A "DAG", or directed acyclic graph, is a directed graph without any
cycles.  E.g.,

   A----->D--->E
   |\     |    |
   | \    |    |
   V  V   V    V
   C<--B<------F

Given a DAG, the "topological sorting" problem is: find an ordering
of the vertices such that all edges go from low numbers to high
numbers.  This typically comes up when we are given a set of tasks
to do with precedence constraints (we need to do A and F before we
can do B), and we want to find a legal order in which to perform the
jobs.

How to solve it #1:
  1. Find a node v with no in-edges.
  2. Remove v and its out-edges.
  3. Go to 1.

This takes time O(n^2).  How about a linear-time way?  Here's a neat
algorithm based on depth-first search:

  DFS-main:
    for v = 1 to n:
      if v is not yet visited, do DFS(v).

  DFS(v):
    mark v as visited.
    for each out-neighbor w of v:
      if w is unvisited, then DFS(w).

How to solve it #2:
  1. Use depth-first search, and output the nodes as they are
     *exited*.  [output v after DFS(v)'s for-loop]
  2. Reverse the order.

Claim: this is a topological sorting.  Why?  If there is an edge
from u to v, then v has to finish first.  This is easy to see if our
DFS started at u before starting at v.
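Method #2 fits in a few lines of Python.  This is only a sketch: the
adjacency-list dict and the recursive helper are assumptions here,
not the notes' exact code.

```python
def topological_sort(adj):
    """adj maps each vertex of a DAG to a list of its out-neighbors.
    Returns the vertices ordered so every edge goes left to right."""
    visited = set()
    exit_order = []                    # nodes in order of DFS *exit* time

    def dfs(v):
        visited.add(v)
        for w in adj[v]:
            if w not in visited:
                dfs(w)
        exit_order.append(v)           # output v as it is exited

    for v in adj:                      # DFS-main: try every start vertex
        if v not in visited:
            dfs(v)
    return exit_order[::-1]            # reversed exit order = topological order
```

E.g., on the diamond DAG A->B, A->C, B->D, C->D, every edge ends up
pointing from an earlier to a later position in the returned list.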
What if we started at v first (like u=D, v=B above)?  In this case,
we would finish v before even starting u, since there can't be a
path from v back to u (else the graph wouldn't be acyclic).

Note: topological sorting takes linear time (unlike regular sorting).
Note: we can use the same algorithm to tell whether a graph has any
cycles.
------------
PROBLEM 3: SHORTEST PATHS IN WEIGHTED GRAPHS.

Given a (possibly directed) graph with costs/lengths on its edges,
and one node specified as a "goal" node, find the shortest path from
every node in the graph to the goal.

We will give two algorithms: one that handles edges of negative cost
but is slower and uses dynamic programming (Bellman-Ford), and one
that is faster but requires all edges to have nonnegative weight
(Dijkstra's algorithm).

 -- picture --
        30                  30
  5-------------------3-----------------4
  |                   ^                 |
  |10              -20|                 |15
  |                   |                 |
  2-------------------1-----------------0
        30                  50         Goal

BELLMAN-FORD ALGORITHM.  (Bellman is the guy who invented dynamic
programming.)  It works even if we have negative-cost edges -- but
these should be directed, since otherwise the problem doesn't make
sense: we could go back and forth racking up bonus points forever.
Similarly, we assume there are no negative-cost cycles.

Algorithm idea:
  find the shortest path that uses 1 or fewer edges (if it exists).
  use it to find the shortest path that uses 2 or fewer edges (if it exists).
  use it to find the shortest path that uses 3 or fewer edges (if it exists).
  ...
  [how far do we need to go?]
  use it to find the shortest path that uses |V|-1 or fewer edges.
  (With no negative-cost cycles, a shortest path never needs to
  repeat a vertex, so |V|-1 edges are enough.)

BELLMAN-FORD pseudocode: as usual for DP, let's just compute
*distances* first; we can then reconstruct the paths.

  initialize d[v][0] = infinity for v != goal; d[goal][0] = 0.
  for i = 1 to |V|-1:
    for each v not equal to goal:
      d[v][i] =   MIN   ( d[x][i-1] + len(v,x) )
                edges v->x

The inner MIN operation takes time proportional to the out-degree of
v.  So, the inner for-loop takes time O(sum of out-degrees of all
nodes) = O(|E|).  --> total time is O(|E|*|V|).

How do we reconstruct the paths?  If you're at vertex v at distance
d[v], move to a neighbor x such that d[v] = d[x] + len(v,x).
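The pseudocode above can be turned into a short Python sketch.  One
standard simplification (an assumption of this sketch, not something
the notes do): since row i of the table only depends on row i-1, the
2-D table d[v][i] can be collapsed into a single array.

```python
def bellman_ford(edges, vertices, goal):
    """Shortest distance from every vertex TO the goal.
    edges: list of directed (v, x, length) triples, meaning v -> x.
    Assumes no negative-cost cycles.  Unreachable vertices keep inf."""
    INF = float('inf')
    dist = {v: INF for v in vertices}
    dist[goal] = 0
    for _ in range(len(vertices) - 1):      # |V|-1 rounds suffice
        for v, x, length in edges:
            if dist[x] + length < dist[v]:  # shorter route: v -> x -> goal
                dist[v] = dist[x] + length
    return dist
```

Each round scans every edge once, so the running time is O(|E|*|V|)
as above, and negative edge costs (without negative cycles) are fine.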
===============================================================================
Faster algorithm: DIJKSTRA'S ALGORITHM.  The idea is to build up the
shortest-path tree in a greedy way.  It is only guaranteed to work
if there are no negative-weight edges.  E.g.,

      5     10
   A-----B-----C
   |     |     |
  1|    1|     |5      goal = A
   |  2  |  4  |
   D-----E-----F

DIJKSTRA'S ALG
--------------
initialize:
 - tree = {goal}.  No edges.  goal.dist = 0.

invariant: nodes in the tree have their correct distance to the goal.

repeat:
 - for each node x that is a neighbor of the tree (x is a FRINGE
   node), compute:

      x.dist =     MIN      [ v.dist + length(e) ]
               edges e=(x,v)
                 v in tree

 - put the x of minimum x.dist into the tree, connecting it via the
   argmin (the edge that caused x.dist to be what it is).

[do the above example]

Claim: even if some of the dists are wrong, the *minimum* one is
correct.

Proof: say x is the fringe node of smallest x.dist.  What does the
*true* shortest path to x look like?  The point is that the first
node this path hits once it exits the tree has to be x itself: if it
were some other node y, then the path's length would have to be at
*least* y.dist, which is greater than x.dist.  Notice that this
argument would break down if there were a negative-cost way of
getting from y to x.  That's why Dijkstra's algorithm doesn't allow
negative-cost edges.
------
To implement this efficiently, we want to store the fringe in some
kind of data structure.  We need to be able to insert pairs
(node, dist), to modify the dist field (actually, only to decrease
it, which simplifies things), and to remove the minimum.

We can use a heap, in which case each operation takes worst-case
O(log V) time.  There are O(E) operations (each node x on the fringe
gets updated at most degree(x) times), so the total is O(E log V).

Actually, there is a more complicated data structure called a
Fibonacci heap in which insert and remove-min still take O(log V),
but decrease-value takes only O(1) *amortized* time.  This is nice
since there are only O(V) inserts and deletes, but potentially O(E)
decrease-values.
So, the total time is now O(E + V log V), which is better when the
graph has lots of edges.
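Here is a Python sketch of the heap-based version.  Python's heapq
has no decrease-key operation, so this sketch uses the common "lazy
deletion" workaround -- push a new (dist, node) pair and skip stale
ones -- instead of modifying dist fields in place; the adjacency-list
format of (neighbor, length) pairs is likewise an assumption of the
sketch.

```python
import heapq

def dijkstra(adj, goal):
    """adj maps each vertex to a list of (neighbor, length) pairs,
    with all lengths nonnegative.  Returns a dict giving the shortest
    distance from every reachable vertex to the goal."""
    dist = {}                               # vertices already in the tree
    fringe = [(0, goal)]                    # heap of (dist estimate, node)
    while fringe:
        d, v = heapq.heappop(fringe)
        if v in dist:                       # stale entry: v already in tree
            continue
        dist[v] = d                         # minimum fringe dist is correct
        for w, length in adj[v]:
            if w not in dist:               # push an updated fringe estimate
                heapq.heappush(fringe, (d + length, w))
    return dist
```

On the six-node example above with goal = A, this yields distances
A:0, D:1, E:3, B:4, F:7, C:12 (note B is reached via E, not directly).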