Algorithms for Minimum Spanning Trees and Shortest Path Trees Algorhyme, by Radia Perlman I think that I shall never see A graph more lovely than a tree. A tree whose crucial property Is loop-free connectivity. A tree that must be sure to span So packets can reach every LAN. First, the root must be selected. By ID, it is elected. Least-cost paths from root are traced. In the tree, these paths are placed. A mesh is made by folks like me, Then bridges find a spanning tree. ---------- Basic ideas of graphs: * graph G vertices V edges E * edges are undirected, have weights * weights are arbitrary, don't have obey triangle inequality (for most of these algorithms) - may not correspond to anything physical - may be chosen according to human factors (e.g. contracts & money) * a few properties of graphs: - node degree - diameter - average distance - bisection bandwidth - total bandwidth - planarity * for these algorithms, no requirement that graph be planar Uses: * "virtual" graph might be e.g. a map, on top of which we are planning to build something (power grid, etc.) Intermediate concepts: * on a rich graph, we are looking for a subgraph that is a tree * spanning tree - https://en.wikipedia.org/wiki/Spanning_tree - reaches all nodes in the graph - no loops, only a single path to each node - with $|V| = n$ vertices, spanning tree has $n-1$ edges - Adding any edge to a spanning tree results in a cycle * minimum spanning tree has minimum possible total edge weight - https://en.wikipedia.org/wiki/Minimum_spanning_tree - if edge weights are distinct, then there is only one unique MST - MST algorithms shouldn't depend on where you are; MST is a characteristic of the graph as a whole * shortest path tree is different from MST - guarantees (a) shortest path between all pairs of nodes - can be the same as MST, often is not - trees are different for different selected root * flow - "amount" of something you can send from one point to another, using all available resources * frontier - as we are working, we will maintain a "frontier" of the set of nodes that are currently being explored. (This term comes from Russell and Norvig.) Problems: 1. Find a minimum spanning tree on a larger graph 2. Shortest paths: a. Find a shortest path for a given source-destination pair b. Build a shortest path tree for all destinations from a specified source c. Find shortest path trees for all source-destination pairs 3. Calculate the possible flow on a graph 4. (Determine other things such as vulnerability to partition, etc.) 5. (Other fun math things like traveling salesman, Hamiltonian cycles and Euler's bridges, etc.) Algorithms: (n.b.: these notes don't directly describe the algorithms) With the exception of Perlman's STP, all of these are covered in Cormen, Leiserson, Rivest and Stein. I believe only Dijkstra is covered in Guerin, Koenemann, and Tuncel. Perlman is covered in her own book, but actually not in the way I prefer to see the messages laid out in the two phases (root election then tree building). The best A* explanation, in my opinion, is in Russell and Norvig. (Russell and Norvig actually has a *LOT* of material on optimization; it's a great place to go next in the study of optimization, with material on heuristics and stochastic methods for problem solving. http://aima.cs.berkeley.edu/) Solving minimum spanning tree problems: * Prim's algorithm (1930) - https://en.wikipedia.org/wiki/Prim%27s_algorithm - starts from a node, adds least-cost edge that adds a new node *to the one tree we are working on*; that is, a node from the frontier (there is only ever a single tree being built) - similar to Dijkstra, but with a _simpler_ means of choosing the next link to add - Animation of Prim's spanning tree algorithm: http://www.algomation.com/algorithm/prim-minimum-spanning-tree It's not easily seen how the decision for which edge to add next is made, but the code is right there alongside the animation, which is a huge help. * Boruvka's algorithm (1926) - https://en.wikipedia.org/wiki/Bor%C5%AFvka%27s_algorithm - invented to optimize construction of the electricity network in Moravia! - many components (trees) in development at the same time - for each node, add its lowest-weight edge to our proposed MST (will be a "forest" of trees until all are connected) - go back, add minimum weight edge from each larger component to another component - repeat until connected - simplistically, begin with a forest of single-node trees, and combine trees - animated: https://en.wikipedia.org/wiki/Bor%C5%AFvka%27s_algorithm#/media/File:Boruvka%27s_algorithm_(Sollin%27s_algorithm)_Anim.gif - requires that edge weights be distinct - discovered and rediscovered repeatedly, also called Sollin's algorithm * Kruskal's algorithm (1956) - https://en.wikipedia.org/wiki/Kruskal%27s_algorithm - many components (trees) in development at the same time - order the edges by weight - take lowest-cost edge, see if it combines two components - repeat until connected - *NOTE*: There is an implementation of Kruskal on algomation.com, but as of 2017/6/7, it is the same as Prim, which is incorrect! Solving shortest path problems: * Dijkstra's Shortest Path First (1959) - https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm - slightly different from Prim's algorithm (1930), which produces an MST - works on directed or undirected graph - starts from a node, selects an edge from the "frontier" - selects one with the lowest *total* cost back to the root - if that edge adds a new node, we're done with that add - if the new node is known, and edge increases cost, remove - if the new node is known, and edge decreases cost, adopt new cost (no need to recurse here, should only happen for nodes on the "frontier") - interesting characteristic: if you calculate in distributed fashion, everything "just works"! - leads to Open Shortest Path First (OSPF) https://en.wikipedia.org/wiki/Open_Shortest_Path_First (not the best Wikipedia page) * Radia Perlman's Spanning Tree Protocol (1985) - https://en.wikipedia.org/wiki/Radia_Perlman - https://en.wikipedia.org/wiki/Spanning_Tree_Protocol - (she invented the algorithm in 1985, standardized protocol was broader team) - Wikipedia page has good links to the whole family, Cisco, RFCs, IEEE standards, etc. https://en.wikipedia.org/wiki/Spanning_Tree_Protocol - select a root based on priority (requires sending IDs to everybody) usually configured by hand by admin - generates *a* shortest path tree (not necessarily unique) - usually run with link costs all the same, so uniqueness of tree not guaranteed, so choice of root affects the tree that develops - bridge protocol data units (BPDUs) are sent out, incremented as they pass along the links - each bridge (node) selects the port that gives it minimum cost to root - can operate to connect broadcast domains (LAN segments), which in graph theory terms can mean treating each domain as a "hyperedge" in a "hypergraph" - our first *distributed* algorithm * A* algorithm (1968) - https://en.wikipedia.org/wiki/A*_search_algorithm - Peter Hart, Nils Nilsson and Bertram Raphael, SRI - "variant" of Dijkstra - uses an additional heuristic to estimate cost - can be more efficient in exploration of large spaces - requires the ability to estimate a *minimum* cost to the destination (for this algorithm, triangle inequality must hold, more or less) ======