Union-Find Algorithm

https://songstudio.info/tech/tech-17/

In above “Minimum Spanning Tree” post, I simply mentioned that “Union-Find” algorithm is used when we check if a selection of certain edge or vertex leads to the formation of a cycle.

So today’s main topic is Union-Find algorithm.

Union-Find is an algorithm methodology that unites or finds elements in a disjoint set representation.

Disjoint set means that given a set, all subsets of it must not have common element.

For example, if we have $U=\{ 1,2,3,4,5,6,7 \}$, then disjoint set representation can be $(\{1,2,3\}, \{4,5\}, \{6,7\})$, $(\{1,2\}, \{3,4\}, \{5,6,7\})$, or $(\{1\}, \{2\}, \{3\}, \{4\}, \{5\}, \{6\}, \{7\})$ etc.

The Union-Find algorithm is for making new disjoint set by unifying two disjoint subsets or finding a set which certain element is contained in.

So when making a MST, we can handle the cycle formation by putting vertices we chose in a same set named MST and after selecting one edge, verifying that one of two vertices included in that edge is in MST also.

Basic Union-Find algorithm

Now, to explain the details of Union-Find, I’m gonna assume that all disjoint sets are represented with tree structures.

That is, a single subset is represented with a single tree, which contains each element in the set as a node.

Here is a description.

Above structure means that there is a set containing entire elements, $U = \{1,2,3,4,5,6,7\}$ and two disjoint subsets, $\{1,2,3,4,5\}, \{6,7\}$.

The point is that each node has an edge connected with its direct parent.

For example, $4$ and $5$ know that their direct parent is $3$, and $7$ knows its direct parent, $6$.

$1$ and $6$ are top nodes, which do not have any parent, so they are just pointing themselves.

Now, there are three parts for Union-Find algorithm implementation.

First we need to set initial state and initial tree structures.

I’m gonna use a simple array.

We have elements from $1$ to $7$ and we assume that initially all elements have their own sets, which have only one element.

So each index in the array should have a value same with the index.

int tree[8];
for (int i=1; i<=7; ++i) {
	tree[i] = i;
}

We have two main functions, called $Union$ and $Find$.

The former is a function to make two sets into one and the latter is a function that given an element, finds its top node in the tree representing the set including it.

First, this is the function $Union$ that gets two elements as parameters, finds each top root and combines two sets by making one of two roots become the other’s child.

void Union(int a, int b) {
    int root_a = Find(a);
    int root_b = Find(b);

    if (root_a != root_b) {
        tree[root_b] = root_a;
    }
}

For example, starting with the initial state, we can make above state described in the image with following sequence.

Next, this is the $Find$ function that gets an element as an input, follows its ancestors until it reaches the root node and return it.

int Find(int a) {
    if (tree[a] == a) {
        return a;
    } else {
        return Find(tree[a]);
    }
}

Let’s calculate each time complexity.

The time complexity of $Union$ is bounded by $Find$ and $Find$’s running time is bounded by the height of a tree.

So if the number of elements is $N$, it might be $O(logN)$, but in the worst case the running time would be $O(N)$ if all nodes are connected sequentially.

To adjust this limitation, the modified $Find$ function is usually used.

Optimized Union-Find algorithm

We can reduce the overall running time by compressing path in a tree.

The path compression can be implemented by modifying $Find$ function.

If we get a parameter $a$, we make all ancestors of $a$ a direct children of the root while tracking goes on.

For example, path compression can be applied as following.

Here is the modified code for optimized $Find$ function.

int Find(int a) {
    if (tree[a] == a) {
        return a;
    } else {
        tree[a] = Find(tree[a]);
        return tree[a];
    }
}

The only difference is the line 5, which calls $Find$ function recursively putting the direct parent of $a$ as a parameter.

Intuitively, it eventually reaches the root of the tree, assigns it into direct parent of all elements the function meets.

Therefore, the process like above image can be conducted.

Then what about the time complexity?

Unfortunately, I couldn’t find the actual explanation about it because it cannot be specified as an exact value.

Some posts or people say that it is $O(1)$, but as we can see, the time becomes shorter and shorter while executing the $Find$ function starting from $O(N)$, so we cannot say that it’s running time is bounded by $O(1)$.

As I heard from my alumni, it is substantially small number but not exactly constant time.

So in the system of time we are in now, we can just assume that it is quite short.

I will omit more detailed calculations as they are beyond this post.

This is a basic problem that can be solved with Union-Find.

https://www.acmicpc.net/problem/1717

The following is the answer code for above problem.

#include <iostream>

using namespace std;

int n, m;
int tree[1000001];

int Find(int a) {
    if (tree[a] == a) {
        return a;
    } else {
        tree[a] = Find(tree[a]);
        return tree[a];
    }
}

void Union(int a, int b) {
    int root_a = Find(a);
    int root_b = Find(b);

    if (root_a != root_b) {
        tree[root_b] = root_a;
    }
}

int main() {

    scanf("%d %d", &n, &m);

    for (int i=0; i<=n; ++i) {
        tree[i] = i;
    }

    for (int q=1; q<=m; ++q) {
        int p, a, b;
        scanf("%d %d %d", &p, &a, &b);

        if (p == 0) {
            Union(a, b);
        } else {
            int root_a = Find(a);
            int root_b = Find(b);
            if (root_a == root_b) {
                printf("YES\n");
            } else {
                printf("NO\n");
            }
        }
    }

    return 0;
}

Categories