Algorithms for Minimum Spanning Trees and Shortest Path Trees * review Lec. 5 notes - mention Wikipedia! - MIT lecture on simplex * new homework assignment has been posted! Due Dec. 6. Today is Armistice Day, or Veterans' Day in the United States. A poem I studied as a kid (my grandmother loved poetry): In Flanders Fields Flanders Poppy on the First World War battlefields. by John McCrae, May 1915 In Flanders fields the poppies blow Between the crosses, row on row, That mark our place; and in the sky The larks, still bravely singing, fly Scarce heard amid the guns below. We are the Dead. Short days ago We lived, felt dawn, saw sunset glow, Loved and were loved, and now we lie In Flanders fields. Take up our quarrel with the foe: To you from failing hands we throw The torch; be yours to hold it high. If ye break faith with us who die We shall not sleep, though poppies grow In Flanders fields. (Thanks to my friend Eric Kawamoto for reminding me of the poem.) ---------- Another one, more relevant to our topic: Algorhyme, by Radia Perlman I think that I shall never see A graph more lovely than a tree. A tree whose crucial property Is loop-free connectivity. A tree that must be sure to span So packets can reach every LAN. First, the root must be selected. By ID, it is elected. Least-cost paths from root are traced. In the tree, these paths are placed. A mesh is made by folks like me, Then bridges find a spanning tree. ---------- Basic ideas of graphs: * graph G vertices V edges E * edges are undirected, have weights * weights are arbitrary, don't have obey triangle inequality (for most of these algorithms) - may not correspond to anything physical - may be chosen according to human factors (e.g. contracts & money) * a few properties of graphs: - node degree - diameter - average distance - bisection bandwidth - total bandwidth - planarity * for these algorithms, no requirement that graph be planar Uses: * "virtual" graph might be e.g. a map, on top of which we are planning to build something (power grid, etc.) Intermediate concepts: * on a rich graph, we are looking for a subgraph that is a tree * spanning tree - https://en.wikipedia.org/wiki/Spanning_tree - reaches all nodes in the graph - no loops, only a single path to each node - with $|V| = n$ vertices, spanning tree has $n-1$ edges - Adding any edge to a spanning tree results in a cycle * minimum spanning tree has minimum possible total edge weight - https://en.wikipedia.org/wiki/Minimum_spanning_tree - if edge weights are distinct, then there is only one unique MST - MST algorithms shouldn't depend on where you are; MST is a characteristic of the graph as a whole * shortest path tree is different from MST - guarantees (a) shortest path between all pairs of nodes - can be the same as MST, often is not - trees are different for different selected root * flow - "amount" of something you can send from one point to another, using all available resources * frontier - as we are working, we will maintain a "frontier" of the set of nodes that are currently being explored. (This term comes from Russell and Norvig.) Problems: 1. Find a minimum spanning tree on a larger graph 2. Shortest paths: a. Find a shortest path for a given source-destination pair b. Build a shortest path tree for all destinations from a specified source c. Find shortest path trees for all source-destination pairs 3. Calculate the possible flow on a graph 4. (Determine other things such as vulnerability to partition, etc.) 5. (Other fun math things like traveling salesman, Hamiltonian cycles and Euler's bridges, etc.) Algorithms: (n.b.: these notes don't directly describe the algorithms) With the exception of Perlman's STP, all of these are covered in Cormen, Leiserson, Rivest and Stein. I believe only Dijkstra is covered in Guerin, Koenemann, and Tuncel. Perlman is covered in her own book, but actually not in the way I prefer to see the messages laid out in the two phases (root election then tree building). The best A* explanation, in my opinion, is in Russell and Norvig. (Russell and Norvig actually has a *LOT* of material on optimization; it's a great place to go next in the study of optimization, with material on heuristics and stochastic methods for problem solving. http://aima.cs.berkeley.edu/) Solving shortest path problems: * Dijkstra's Shortest Path First (1959) - https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm - slightly different from Prim's algorithm (1930), which produces an MST - works on directed or undirected graph - starts from a node, selects an edge from the "frontier" - selects one with the lowest *total* cost back to the root - if that edge adds a new node, we're done with that add - if the new node is known, and edge increases cost, remove - if the new node is known, and edge decreases cost, adopt new cost (no need to recurse here, should only happen for nodes on the "frontier") - interesting characteristic: if you calculate in distributed fashion, everything "just works"! - leads to Open Shortest Path First (OSPF) https://en.wikipedia.org/wiki/Open_Shortest_Path_First (not the best Wikipedia page) * Radia Perlman's Spanning Tree Protocol (1985) - https://en.wikipedia.org/wiki/Radia_Perlman - https://en.wikipedia.org/wiki/Spanning_Tree_Protocol - (she invented the algorithm in 1985, standardized protocol was broader team) - Wikipedia page has good links to the whole family, Cisco, RFCs, IEEE standards, etc. https://en.wikipedia.org/wiki/Spanning_Tree_Protocol - select a root based on priority (requires sending IDs to everybody) usually configured by hand by admin - generates *a* shortest path tree (not necessarily unique) - usually run with link costs all the same, so uniqueness of tree not guaranteed, so choice of root affects the tree that develops - bridge protocol data units (BPDUs) are sent out, incremented as they pass along the links - each bridge (node) selects the port that gives it minimum cost to root - can operate to connect broadcast domains (LAN segments), which in graph theory terms can mean treating each domain as a "hyperedge" in a "hypergraph" - our first *distributed* algorithm * A* algorithm (1968) - https://en.wikipedia.org/wiki/A*_search_algorithm - Peter Hart, Nils Nilsson and Bertram Raphael, SRI - "variant" of Dijkstra - uses an additional heuristic to estimate cost - can be more efficient in exploration of large spaces - requires the ability to estimate a *minimum* cost to the destination (for this algorithm, triangle inequality must hold, more or less) Solving minimum spanning tree problems: * Prim's algorithm (1930) - https://en.wikipedia.org/wiki/Prim%27s_algorithm - starts from a node, adds least-cost edge that adds a new node *to the one tree we are working on*; that is, a node from the frontier (there is only ever a single tree being built) - similar to Dijkstra, but with a _simpler_ means of choosing the next link to add - Animation of Prim's spanning tree algorithm: http://www.algomation.com/algorithm/prim-minimum-spanning-tree It's not easily seen how the decision for which edge to add next is made, but the code is right there alongside the animation, which is a huge help. * Boruvka's algorithm (1926) - https://en.wikipedia.org/wiki/Bor%C5%AFvka%27s_algorithm - invented to optimize construction of the electricity network in Moravia! - many components (trees) in development at the same time - for each node, add its lowest-weight edge to our proposed MST (will be a "forest" of trees until all are connected) - go back, add minimum weight edge from each larger component to another component - repeat until connected - simplistically, begin with a forest of single-node trees, and combine trees - animated: https://en.wikipedia.org/wiki/Bor%C5%AFvka%27s_algorithm#/media/File:Boruvka%27s_algorithm_(Sollin%27s_algorithm)_Anim.gif - requires that edge weights be distinct - discovered and rediscovered repeatedly, also called Sollin's algorithm * Kruskal's algorithm (1956) - https://en.wikipedia.org/wiki/Kruskal%27s_algorithm - many components (trees) in development at the same time - order the edges by weight - take lowest-cost edge, see if it combines two components - repeat until connected ====== Directed v. Unidirected and Next Hop Table: There has been some confusion over the next hop table and HW5. First, directed v. undirected graph: the basic network in all of our examples and homeworks begins as an undirected graph -- you can move in either direction across any link. However, a path from, say, A to F is inherently directed. It might go A->B->C->D->E->F, for example. After you run Dijkstra's algorithm, usually people don't bother to mention the direction you would cross a link, since it's obvious from the tree structure. Understanding the homework, though, depends somewhat on that directionality. If you build a shortest-path tree from A to F, in this case, we expect that it will eventually pass through D. How does it get from D to F? There are two basic ways: (1) Follow a path calculated entirely by the source (A), with that information communicated to the other nodes in some fashion. (2) Allow each node to *independently* decide how to get the packet from itself to the specified destination. In order for this to work, the individual nodes must make *consistent* decisions. The packet will pass through D. At D, it will use the shortest-path tree that D calculated in order to figure out to get from D to F. So, how do we guarantee that D's decision is consistent with A's? How do we guarantee that the packet won't end up back at C instead of going on to E, like it should? We can check that by checking that the shortest-path spanning tree calculated at D is consistent with that calculated at A, B and C for some set of destinations. You will want to check that it's consistent for direction of the link, as well as the next hop. Okay, speaking of the Next Hop, that's the table that each node uses to decide where to go next. Completely making up an example, it might look like: Node A's Next Hop Table: destination next hop B B C C D B E B F C That is, A is directly connected to B and C, so it's only one hop to them, but in order to get to D, it has to first send the packet to B. So the homework is to calculate that next hop table for each of the nodes in our network, and compare them to figure out if they are consistent, will correctly route packets between all pairs of sources and destinations. For homework 1a, the answer will be a list of edges from the original graph. For 1b, it will be several lists of edges, plus a qualitative answer to the questions. For 2a, it will be a set of edges that make a spanning tree. For 2b, it will be a set of edges making a spanning tree rooted at a different node. 2c is discussion. 2d will be a Next Hop table like the one above, for *each* node. 3, the answer is qualitative.