MST algorithm

Posted on October 1, 2023   19 minute read ∼ Filed in  : 

Introduction

This paper identifies parameters that determine the behavior of fundamental global network problems.

If a distributed network algorithm’s running time for a network with n vertices can be optimized to O(n), it’s considered a success

  • Its performance will grow linearly and in direct proportion to the size of the input data set (n).
  • One example: In the leader selection problem, its from O(nlogn) -> O(n)

This paper says: The more sensitive parameter to consider is the network’s diameter, termed “Diam”, and it is demonstrated by providing a distributed minimum-weight spanning tree algorithm.

  • it uses graph decomposition and edge-elimination-by-pipelining.

Universal Optimality:

  • if an algorithm is universally optimal, it pinpoints the exact parameters contributing to the problem’s complexity.

Question:

  • Can we identify inherent graph parameters for various network problems and then develop universally optimal algorithms based on these parameters?

The paper track the MST problem (Minimum-Weight Spanning Tree),

  • Past algorithm: O(nlogn) -> O(n)
  • While the question is where the O(n) is univerasally optimal.
  • Further, the paper present MST algorithm with O(Diam + n^ε· log^n) time complexity.

Problem Model

Goal:

The objective is to build a spanning tree across all the nodes in V that has the minimal total edge weight.

  • A spanning tree of a graph is a tree that spans (reaches out to) all the vertices of the graph and contains all the vertices with the minimum required number of edges.
  • the goal is to construct a tree that connects all the nodes in the set V such that the sum of the weights of the edges in this tree is as small as possible.

Assume:

  • Every node has s unique identifier, each edge has a distinct weight (can also be identifed by identifer)
  • Diam(F) is the diameter of the subgraph F, which is the maximum unweighted distance (number of hops) between any two nodes in F.
  • Ignore communication costs (# msgs the algorithm uses), and it’s samll cost
  • synchronous computation: run in rounds, each round take one time unit. All msg delivered in next cycle.
    • Even though the paper assumes synchronous computation, they note that they can adopt asynchronous mechanisms due to their decision to ignore communication costs.
  • Constraints:
    • message are size of O(logn), and one node can send one msg per edge in each round. In other words, suggests that the size of the message grows logarithmically with the number of nodes.

Solution-Context:

Distributed growth approach:

  • This algorithm does not utilize a central controller, or “center of activity,” but rather allows a large number of processes to proceed simultaneously and independently in the network.
  • Each node will form a fragement by merging the neibouhoods, the paper claims that it needs to prevent some fragments from growing too large while other are still very small.

Coordinated elimination approach:

  • Use a centralized coordiantor to send/receive msgs is communication intensive, thus it needs delegating the work of the center node to other nodes along the paths leading to the center.
  • It focus on eliminate edges that are not part of MST, using red rule.

New Pipelining techniques

  • This is to prevent communication congestion, It has three satges and eliminate short cycles, reduce number of edges.
  • This paper use only two satges, simplify the process, the main change is the way they handle cycle elimiation, nodes froward descriptions of candidate edges to their parents on a spanning tree build using BFS.

The algorithm:

Two stages to build MST:

  • Controlled-GHS:

    • it’s an adapted version of the original GHS algorithm, original GHS: Objective: Nodes in a network form ‘fragments’ that increase in size iteratively until all nodes belong to a single, large fragment representing the MST.
    • Detais:
      • Each node is its own fragment
      • Each fragment has a tree, where one node is the root, and all others can track back to the root.
      • steps of : Fragments try to find their minimum-weight outgoing edge,
        • Each node in the fragment reports the minum weight outgoing edge ot the above nodes, while above gather all information below and then select the light-weight weight and then reports up, all the way until root, Then root knows the minim-weight outging edge in the fragement.
        • Root then send the information to the fragement following the path of the minimum-weight outgoing edges.
        • If another fragment also chosen the same edge, both fragements merge into large fragment.
        • Finally, all nodes form a single fragement
  • Edge elimination

    • After obtaining the large fragment from the Controlled-GHS phase, the edge elimination step is crucial to prune away those unnecessary edges, thus building a tree.
    • steps:
      • All nodes forward details of edges into central nodes.
      • Some edge descriptions are elimintaed.
      • central node does the final edge elimination to give the MST.
  • summary:

    The combined approach first uses a controlled version of the GHS algorithm to create larger and larger fragments while maintaining a tree structure within each fragment. Once there’s a small number of large fragments, the algorithm moves to the edge elimination phase, where unnecessary edges are removed, leaving behind the Minimum Spanning Tree for the entire network.

Dominating set:

  • a subset of vertices such that every vertex not in the subset is adjacent to at least one vertex of the subset.
  • The problem here is to find a “small” dominating set on a given tree.

Solutions:

  • Every vertex not in M has neighbor in M
  • Size of M M is less than half the total vertices in the tree T.

L^~(v) is a function to assign a level to each node in the tree. leaf is 0, node above leaf is 1 etc

Algorithm Small-Dom-Set

finds a small dominating set on a tree T:

  • Mark the nodes: Assign level numbers to each node in the tree using level function.
  • Among the nodes that are not mard with a level, select MIS.
  • Dominating set M = Q and L^~(v)

Controlled-GHS procedure.

a modified version of the GHS algorithm, it aims to:

  • Limit the number of fragments upon termination to N.
  • Ensure that throughout its run, the diameter of every fragment F is bounded by d.

Run in phases, each phase has two stag:

  • stage 1: selection of minimum weigth outgoing edges:

    For each fragment, this stage identifies the smallest-weight edge that connects it to a different fragment. This step prepares the fragments for a potential merge

  • stage 2: fragment merging:

    • It identifies a dominating set for each tree in the fragment forest, and then, they combined to get MF_f
    • fragments outside MF_f pick a neighboring fragment within MF_F to merge with. It ensure the formation of star structures, preventing long chains.

Summary:

The Controlled-GHS algorithm builds upon the GHS by adding control to the growth of the fragments. It runs in phases, where each phase is composed of selecting edges for potential merges and then efficiently merging fragments while maintaining control over their size and shape. The algorithm ensures that fragments don’t grow too large or too stretched out, making it suitable for network applications where balance and efficiency are crucial.

Edge elimination

Objective: Given a fragment graph that represents possible portions of the Minimum Spanning Tree (MST), the algorithm aims to prune away extra interfragment edges and retain only those necessary for the MST.

  • A fragment graph:
    • vertical: a fragments forest
    • edge: connections
  • After pruning, there is only N-1 edges to connect N fragments.

Pipeline algorithm:

  • It is to prune the unnecessary edges between fragements. The algorithm aims to propagate the lightest edges in the graph upwards in the BFS tree while ensuring cycles are not introduced.

  • steps:

    • BFS Tree construction:
      • This tree will serve as the structure for the algorithm, helping to propagate edge information.
      • each node is the orignal node
    • Each node track two sets:
      • Q: all known interfagment edges
      • U: edges inforation which already sent to the parents.
    • Initiating edge propagations:
    • Edge sending machanism:
      • leaf nodes begin the process by sending information to their parents at the very start.
      • Non-leaf nodes (intermediate nodes) will start sending edges only after they have received messages from all their children. This ensures that the node has full knowledge of its sub-tree’s edges before forwarding them upward.
      • if a node can’t find any edge to send (because all are either already sent or would form cycles), it concludes its participation in this edge propagation process and informs its parent that it’s done.
    • The root r(B), after receiving edge information from its children (directly or indirectly), determines the N−1 edges that should be part of the MST of the fragment graph F.

    • After deciding the edges of the MST, the root broadcasts this information over the BFS tree B. Every node in the graph G eventually learns which interfragment edges are part of the MST.
  • In summary, This algorithm efficiently finds the MST by taking advantage of the BFS tree structure. Each node acts as a filtering agent, forwarding only the lightest edges, ensuring that by the time information reaches the root, it mostly contains the desired MST edges. The approach is distributed since each node in the BFS tree independently carries out its role, making it scalable for large networks.

Problems I asked

You’ve touched upon a central aspect of the algorithm’s design, and understanding the distinction is crucial. The goal of the GHS algorithm and its modified version, Controlled-GHS, is to construct a Minimum Spanning Tree (MST) in a distributed setting. The two major components of the process are:

  1. Fragment Construction (using Controlled-GHS) : Here, the network self-organizes into fragments using a decentralized process. Each fragment is a part of the MST. This phase ensures that as the algorithm progresses, these fragments grow in size, and the number of fragments reduces.

  2. Edge Elimination (using the BFS structure) : After having a collection of fragments, we now need to determine which edges between these fragments are part of the MST. The BFS tree structure is a communication mechanism that aids in this decision-making.

To address your question directly:

  • The Controlled-GHS phase is about organizing the network into growing fragments of the MST in a decentralized manner. Without this phase, you would not have well-defined fragments, and determining the MST edges in a distributed system would be more challenging.

  • The BFS structure in the pipeline algorithm is about communication and decision-making . It’s a mechanism to allow nodes to share and propagate information about potential interfragment edges that could be part of the MST.

The reason for not using just the BFS for the entire process is that a BFS tree by itself doesn’t guarantee a minimum spanning tree for the graph. The GHS algorithm ensures that we’re building the MST by focusing on merging fragments based on the lightest edges. The BFS structure is then a means to facilitate the communication needed to finalize the MST by selecting the necessary interfragment edges.

In summary, while both phases involve trees, their purposes are different: one is for constructing parts of the MST in a decentralized manner, and the other is for efficient communication and decision-making about MST edge inclusion.

Please summarize the results of the paper. (What are the main important new results achieved in this paper? What problem was solved? What is novel about the solution?)

The paper introduce a distributed algorithm for finding a minimum-weight spanning tree (a subset of the edges that connects all the vertices together without any cycles and with the minimum possible total edge weight). This algorithm’s time complexity is not linear in n but is linear in Diam. The specific time complexity is O(Diam + n^ϵlogn) with ϵ≈0.6131.

  • The pappaer firstly define the universally optimal: if an algorithm is universally optimal, it pinpoints the exact parameters contributing to the problem’s complexity.
  • The paper then investigate a question: Can we identify inherent graph parameters for various network problems and then develop universally optimal algorithms based on these parameters?
  • Traditioanlly, If an distributed network algorithm’s running time for a network with n vertices can be optimized to O(n), it’s considered a success. The paper then illustrate that the more sensitive parameter to consider is the network’s diameter, termed “Diam”, rather than number of nodes n. Thus, the time complexity can be O(Diam + n^ϵlogn) , and it is faster than O(n)

The paper mainly track the MST problem (Minimum-Weight Spanning Tree), and try to find the solution with univerasally optimal, which is faster than traditional solution which run in O(n), where n is number of verticals.

  • The objective is to build a spanning tree across all the nodes in V that has the minimal total edge weight.
    • A spanning tree of a graph is a tree that spans (reaches out to) all the vertices of the graph and contains all the vertices with the minimum required number of edges.
    • The goal is to construct a tree that connects all the nodes in the set V such that the sum of the weights of the edges in this tree is as small as possible.

The mainly novity of the paper combines two distinct approaches to the distributed constrcution of an MST, namely, Controlled-GHS and Edge Elimination.

  • The Controlled-GHS algorithm builds upon the GHS by adding control to the growth of the fragments. It runs in phases, where each phase is composed of selecting edges for potential merges and then efficiently merging fragments while maintaining control over their size and shape. The algorithm ensures that fragments don’t grow too large or too stretched out, making it suitable for network applications where balance and efficiency are crucial.
  • Once there’s a small number of large fragments, the algorithm moves to the edge elimination phase, where unnecessary edges are removed, leaving behind the Minimum Spanning Tree for the entire network.

Further, the paper provides the full lemma and proof, and all the way to the time complexity computations.

Please justify your overall recommendation.

The paper is very good at illustrating why needs combine the two algorithms. For me, it’s important to understand why it needs a controlled GHS, since all the nodes will be formed as a BFS anyway.

While here is the answer:

The goal of the GHS algorithm and its modified version, Controlled-GHS, is to construct a Minimum Spanning Tree (MST) in a distributed setting. The two major components of the process are:

  1. Fragment Construction (using Controlled-GHS) : The network self-organizes into fragments using a decentralized process. Each fragment is a part of the MST. This phase ensures that as the algorithm progresses, these fragments grow in size, and the number of fragments reduces. The Controlled-GHS phase is about organizing the network into growing fragments of the MST in a decentralized manner. Without this phase, it would not have well-defined fragments, and determining the MST edges in a distributed system would be more challenging.

  2. Edge Elimination (using the BFS structure) : After having a collection of fragments, we now need to determine which edges between these fragments are part of the MST. The BFS tree structure is a communication mechanism that aids in this decision-making. The BFS structure in the pipeline algorithm is about communication and decision-making. It’s a mechanism to allow nodes to share and propagate information about potential interfragment edges that could be part of the MST.

The reason for not using just the BFS for the entire process is that a BFS tree by itself doesn’t guarantee a minimum spanning tree for the graph. The GHS algorithm ensures that we’re building the MST by focusing on merging fragments based on the lightest edges. The BFS structure is then a means to facilitate the communication needed to finalize the MST by selecting the necessary interfragment edges.

In summary, while both phases involve trees, their purposes are different: one is for constructing parts of the MST in a decentralized manner, and the other is for efficient communication and decision-making about MST edge inclusion.

What are the main strengths of this paper? (What does it do very well? What are the best aspects of the paper? What are the main reasons to accept the paper to your conference?)

The algorithm has well and clear assmption, thus provided well scenario where the alrogithm can be adpoted.

  • Every node has s unique identifier, each edge has a distinct weight (can also be identifed by identifer)
  • Diam(F) is the diameter of the subgraph F, which is the maximum unweighted distance (number of hops) between any two nodes in F.
  • Ignore communication costs (# msgs the algorithm uses), and it’s samll cost
  • synchronous computation: run in rounds, each round take one time unit. All msg delivered in next cycle.
    • Even though the paper assumes synchronous computation, they note that they can adopt asynchronous mechanisms due to their decision to ignore communication costs.
  • Constraints:
    • message are size of O(logn), and one node can send one msg per edge in each round. In other words, suggests that the size of the message grows logarithmically with the number of nodes.

Further, the paper clearly and deeply introduce the context, problem, solution, analysis, lemma, and proof.

  • Controlled-GHS algorithm is a modified version of the GHS algorithm, it aims to:

    • Limit the number of fragments upon termination to N.
    • Ensure that throughout its run, the diameter of every fragment F is bounded by d.

    it Run in phases, each phase has two stag:

    • stage 1: selection of minimum weigth outgoing edges:

      For each fragment, this stage identifies the smallest-weight edge that connects it to a different fragment. This step prepares the fragments for a potential merge

    • stage 2: fragment merging:

      • It identifies a dominating set for each tree in the fragment forest, and then, they combined to get MF_f
      • fragments outside MF_f pick a neighboring fragment within MF_F to merge with. It ensure the formation of star structures, preventing long chains.
  • Edge elimination: Given a fragment graph that represents possible portions of the Minimum Spanning Tree (MST), the algorithm aims to prune away extra interfragment edges and retain only those necessary for the MST. And After pruning, there is only N-1 edges to connect N fragments.

    It uses Pipeline algorithm to achieve this:

    • It is to prune the unnecessary edges between fragements. The algorithm aims to propagate the lightest edges in the graph upwards in the BFS tree while ensuring cycles are not introduced.

    • steps:

      • BFS Tree construction:
        • This tree will serve as the structure for the algorithm, helping to propagate edge information.
        • each node is the orignal node
      • Each node track two sets:
        • Q: all known interfagment edges
        • U: edges inforation which already sent to the parents.
      • Edge sending machanism:
        • leaf nodes begin the process by sending information to their parents at the very start.
        • Non-leaf nodes (intermediate nodes) will start sending edges only after they have received messages from all their children. This ensures that the node has full knowledge of its sub-tree’s edges before forwarding them upward.
        • if a node can’t find any edge to send (because all are either already sent or would form cycles), it concludes its participation in this edge propagation process and informs its parent that it’s done.
      • The root r(B), after receiving edge information from its children (directly or indirectly), determines the N−1 edges that should be part of the MST of the fragment graph F.

      • After deciding the edges of the MST, the root broadcasts this information over the BFS tree B. Every node in the graph G eventually learns which interfragment edges are part of the MST.

The paper is solid in problem definatin, solution illustration and proofs.

What are the main weaknesses of the paper? Please give the authors advice for how to improve their paper. (What are the least good aspects of the paper? What are the main reasons to reject the paper?)

The paper didn’t provide a good application example to illustrate how much improvement in terms of latency or throughput of the algorithm. Providing a detailed use case or conduct on real datasets will help reader to understand their improvements much more deeply.

What can you learn from this paper about how to write a good research paper?

Firstly, it’s important to define good problems and also illustrate what existing work is missing in solving some problems.

Secondly, even existing work claims it’s a success to have O(n) in solving the distributed networking problems. It sill needs to think deeper to find a new direction to solve the problem

Thirdly, if we need to combine algorithms, it’s important to illustrate why missing one will not work and why both are necessary





END OF POST




Tags Cloud


Categories Cloud




It's the niceties that make the difference fate gives us the hand, and we play the cards.