Union-Find

Medium
Time: O(alpha(n)) ~ O(1) amortized per operationSpace: O(n)
0/6
problems solved
0%
1.

Pattern Overview: What is Union-Find?

Union-Find (also called Disjoint Set Union or DSU) is a data structure that tracks elements partitioned into disjoint sets.

Two Core Operations:

  • Find(x): Which set does x belong to? (Returns the root/representative)
  • Union(x, y): Merge the sets containing x and y

Visual Representation:

Initial: Each element is its own set
[0] [1] [2] [3] [4] (5 components)
After union(0,1) and union(2,3):
[0-1] [2-3] [4] (3 components)
After union(1,3):
[0-1-2-3] [4] (2 components)

When to Use Union-Find:

  • Dynamic connectivity (are X and Y connected?)
  • Counting connected components
  • Detecting cycles in undirected graphs
  • Grouping related items (accounts, friends)
  • Kruskal's MST algorithm

Key Insight: Union-Find answers connectivity queries efficiently without storing the actual graph structure.

2.

Basic Implementation: Parent Array

The core idea: use a parent array where parent[i] points to i's parent. The root of a set points to itself.

Initial State:

parent = [0, 1, 2, 3, 4]
↓ ↓ ↓ ↓ ↓
0 1 2 3 4 (each is its own root)

After union(1, 2):

parent = [0, 1, 1, 3, 4]
2 now points to 1
Tree: 0 1 3 4
|
2

Find Operation: Follow parent pointers until reaching root (where parent[x] == x)

Code
Java
3.

Optimization 1: Path Compression

Problem: Without optimization, find() can be O(n) if tree becomes a long chain.

Solution: Path compression - while finding root, make all nodes point directly to root.

Before Path Compression:

0 find(4) traverses: 4→3→2→1→0
| Time: O(n)
1
|
2
|
3
|
4

After Path Compression on find(4):

0 All nodes now point directly to root!
/ | \ \ Next find(4): O(1)
1 2 3 4

Implementation: During find, update each node's parent to the root.

Code
Java
4.

Interactive: Union-Find Operations Visualizer

Watch how union and find operations work with path compression. See trees merge and flatten in real-time.

Union-Find Operations

Watch union and find with path compression

Speed:
Operations:
union(0, 1)
union(2, 3)
union(4, 5)
union(1, 3)
find(4)
union(3, 5)
find(0)
0root1root2root3root4root5root
parent[] array:
0
0
1
1
2
2
3
3
4
4
5
5
Click Play to see Union-Find operations
Root
Active
Path
5.

Optimization 2: Union by Rank

Problem: Even with path compression, bad union order creates tall trees.

Solution: Always attach the shorter tree under the taller one.

Union by Rank:

Bad: Attaching tall to short Good: Attaching short to tall
1 1 1
| + / \ = / | \
2 3 4 2 3 4
| |
5 5
Height: 4 Height: 2

Rank: Approximate height of tree (upper bound due to path compression)

Code
Java
6.

Counting Connected Components

A common use case: track the number of disjoint sets.

Approach: Start with n components, decrease by 1 on each successful union.

Example: Number of Provinces

Input: isConnected = [[1,1,0],[1,1,0],[0,0,1]]
Cities: 0, 1, 2
Connections: 0-1 connected, 2 alone
Initial: 3 components [0] [1] [2]
union(0,1): 2 components [0-1] [2]
Answer: 2 provinces
Code
Java
7.

Interactive: Connected Components Visualizer

Watch how union operations merge components and reduce the component count.

Connected Components

Watch components merge as edges are processed

Speed:
Components: 6
012345
Edges to process:
(0, 1)
(1, 2)
(3, 4)
(4, 5)
(2, 5)
parent[] array:
0
0
1
1
2
2
3
3
4
4
5
5
Click Play to merge components

Key Insight: Start with n components. Each successful union decreases count by 1. Nodes with same color belong to same component.

8.

Cycle Detection in Undirected Graphs

Key Insight: If we try to union two nodes that are ALREADY connected, we've found a cycle!

Algorithm:

  1. Process each edge (u, v)
  2. If find(u) == find(v), adding this edge creates a cycle
  3. Otherwise, union(u, v)

Example: Redundant Connection

Edges: [[1,2], [1,3], [2,3]]
Process [1,2]: find(1)=1, find(2)=2 → different, union(1,2)
Process [1,3]: find(1)=1, find(3)=3 → different, union(1,3)
Process [2,3]: find(2)=1, find(3)=1 → SAME! Cycle found!
Return [2,3] (redundant edge)
Code
Java
9.

Union-Find on 2D Grids

For grid problems, flatten 2D coordinates to 1D index.

Index Mapping: (row, col) → row * numCols + col

Example: Number of Islands using Union-Find

Grid: Index:
1 1 0 0 1 2
1 0 0 → 3 4 5
0 0 1 6 7 8
Process land cells, union with neighbors.
Count remaining components with land.
Code
Java
10.

Union by Size: Tracking Component Sizes

Sometimes you need to track the SIZE of each component, not just connectivity.

Use Cases:

  • Find largest connected component
  • Making A Large Island (flip 0 to maximize island)
  • Social network largest friend group

Key Insight: Store size at root, update on union.

Example:

Initial: [1] [2] [3] [4] sizes: [1,1,1,1]
union(1,2): [1-2] [3] [4] size of root 1 = 2
union(3,4): [1-2] [3-4] size of root 3 = 2
union(1,3): [1-2-3-4] size of root 1 = 4
Code
Java
11.

Accounts Merge: Union-Find with String Keys

Problem: Merge accounts that share common emails.

Challenge: Emails are strings, but Union-Find uses integers!

Solution: Map emails to indices, union by shared emails.

Example:

Input:
[["John", "john@mail.com", "john_work@mail.com"],
["John", "john@mail.com", "john00@mail.com"],
["Mary", "mary@mail.com"]]
Step 1: Map emails to indices
john@mail.com → 0
john_work@mail.com → 1
john00@mail.com → 2
mary@mail.com → 3
Step 2: Union emails in same account
Account 0: union(0, 1) // john@ with john_work@
Account 1: union(0, 2) // john@ with john00@
Account 2: no union needed (only one email)
Step 3: Group by root, attach to owner
Root 0 owns: [john@, john_work@, john00@] → John
Root 3 owns: [mary@] → Mary
Code
Java
12.

Making A Large Island

Problem: Given a grid of 0s and 1s, flip at most one 0 to 1 to maximize island size.

Approach:

  1. Use Union-Find to find all islands and their sizes
  2. For each 0, check adjacent island roots
  3. Calculate potential size if we flip this 0
  4. Track maximum

Example:

1 0 Islands: A(size 1), B(size 2)
1 1
Flip (0,1):
Adjacent islands: A (up-left via (0,0)), B (down via (1,1))
New size = 1 + sizeA + sizeB = 1 + 1 + 2 = 4
1 1
1 1 → All connected, size 4

Edge Case: Grid is all 1s → return m * n

Code
Java
13.

Smallest String With Swaps

Problem: Given string s and pairs of indices that can be swapped any number of times, return lexicographically smallest string.

Key Insight: If you can swap (a,b) and (b,c), then a,b,c are all connected and can be rearranged freely!

Approach:

  1. Union all swap pairs
  2. Group indices by their root
  3. For each group, sort characters and assign smallest to smallest index

Example:

s = "dcab", pairs = [[0,3],[1,2]]
Groups after union:
Root 0: indices [0, 3] → chars ['d', 'b']
Root 1: indices [1, 2] → chars ['c', 'a']
Sort each group:
Group 0: indices [0, 3], sorted chars ['b', 'd']
Group 1: indices [1, 2], sorted chars ['a', 'c']
Assign smallest char to smallest index:
result[0] = 'b', result[3] = 'd'
result[1] = 'a', result[2] = 'c'
Result: "bacd"
Code
Java
14.

Kruskal's MST Algorithm

Problem: Find Minimum Spanning Tree - connect all nodes with minimum total edge weight.

Kruskal's Algorithm + Union-Find:

  1. Sort edges by weight (ascending)
  2. For each edge, if it connects two different components, add it
  3. Stop when we have n-1 edges

Why Union-Find? We need to quickly check if two nodes are already connected.

Example:

Edges: (A,B,4), (A,C,2), (B,C,1), (B,D,5), (C,D,8)
Sorted: (B,C,1), (A,C,2), (A,B,4), (B,D,5), (C,D,8)
Process:
(B,C,1): B,C not connected → add, weight=1
(A,C,2): A,C not connected → add, weight=3
(A,B,4): A,B already connected (via C) → skip
(B,D,5): B,D not connected → add, weight=8
MST: {(B,C), (A,C), (B,D)}, total weight = 8
Code
Java
15.

Number of Islands II (Dynamic)

Problem: Given grid dimensions and positions added as land one at a time, return count of islands after each addition.

Challenge: Grid changes dynamically - DFS would recompute everything each time!

Union-Find Advantage: O(1) per operation vs O(m*n) for DFS.

Approach:

  1. Start with count = 0
  2. For each new land position:
    • Increment count (new island)
    • Check 4 neighbors, union if neighbor is land
    • Each successful union decrements count
  3. Record count after each addition

Example:

m=3, n=3, positions = [[0,0],[0,1],[1,2],[2,1]]
Add (0,0): count=1 [1 0 0]
[0 0 0]
[0 0 0]
Add (0,1): count=1 [1 1 0] (merged with (0,0))
[0 0 0]
[0 0 0]
Add (1,2): count=2 [1 1 0]
[0 0 1]
[0 0 0]
Add (2,1): count=3 [1 1 0]
[0 0 1]
[0 1 0]
Output: [1, 1, 2, 3]
Code
Java
16.

Satisfiability of Equality Equations

Problem: Given equations like ["a==b", "b!=c"], determine if all can be satisfied.

Key Insight:

  • "==" means variables are in same group
  • "!=" means variables must be in different groups

Two-Pass Approach:

  1. First pass: Process all "==" equations (union)
  2. Second pass: Verify all "!=" equations (should have different roots)

Why two passes? Must build all equality relationships before checking inequalities.

Example:

["a==b", "b==c", "a!=c"]
Pass 1: union(a,b), union(b,c) → all in same group
Pass 2: a!=c? But find(a) == find(c)! → UNSATISFIABLE
["a==b", "b!=c", "c==a"]
Pass 1: union(a,b), union(c,a) → all in same group
Pass 2: b!=c? But find(b) == find(c)! → UNSATISFIABLE
["a==b", "c==d", "a!=d"]
Pass 1: union(a,b), union(c,d) → two groups
Pass 2: a!=d? find(a)=0, find(d)=2, different! → SATISFIABLE
Code
Java
17.

Most Stones Removed with Same Row or Column

Problem: Stones on 2D plane. Remove stone if another stone shares its row OR column. Return max stones removable.

Key Insight: Stones in same row/column form connected components. Within each component of size k, we can remove k-1 stones (keeping 1).

Answer: total_stones - number_of_components

Mapping Challenge: Row 0 and Column 0 would collide! Solution: Offset columns: use row for rows, col + 10001 for columns.

Example:

Stones: [[0,0],[0,1],[1,0],[1,2],[2,1],[2,2]]
Connections:
(0,0) connects to (0,1) via row 0
(0,0) connects to (1,0) via col 0
(0,1) connects to (2,1) via col 1
(1,2) connects to (2,2) via col 2
(2,1) connects to (2,2) via row 2
All 6 stones form 1 component!
Removable = 6 - 1 = 5
Code
Java
18.

Common Edge Cases and Pitfalls

Edge Cases to Handle:

1. Self-Connection

union(x, x); // Should return false (no-op)
find(x) === find(x); // Always true

2. Empty Input

const uf = new UnionFind(0); // Handle gracefully

3. 1-Indexed vs 0-Indexed

// Problem uses 1-indexed nodes
const uf = new UnionFind(n + 1); // Allocate extra space
// Don't use index 0

4. Duplicate Operations

union(1, 2); // First time: returns true
union(1, 2); // Second time: returns false (already connected)
union(2, 1); // Same: returns false

5. Grid Coordinate Mapping

// Correct: row * numCols + col
const idx = row * n + col;
// WRONG: forgetting to use numCols
const idx = row * m + col; // If m != n, this breaks!

Common Mistakes:

MistakeConsequence
Forgetting path compressionO(n) instead of O(α(n))
Not initializing parent[i] = iGarbage values
Using union() result as connectivity checkunion() changes state!
Forgetting to handle duplicate positionsDouble counting
Row/column collision in stone problemsWrong component count
19.

Pattern Recognition Guide

When to Use Union-Find:

Problem TypeApproach
Connected componentsCount after all unions
Are X and Y connected?connected(x, y)
Cycle detectionUnion returns false = cycle
Grouping itemsUnion related items
Dynamic connectivityUnion for add, find for query
MST (Kruskal)Sort edges, union if different components
SatisfiabilityTwo-pass: union equalities, check inequalities

Union-Find vs DFS/BFS:

AspectUnion-FindDFS/BFS
Connectivity queriesO(α(n)) ≈ O(1)O(V+E) each
Build structureO(E × α(n))O(V+E)
Dynamic updatesEfficientRebuild needed
Path findingNoYes
Cycle in undirectedYes (elegant)Yes
Cycle in directedNoYes (need DFS)

Decision Framework:

Need to find actual path? → DFS/BFS
Many connectivity queries? → Union-Find
Graph changes over time? → Union-Find
Directed graph cycles? → DFS
Minimum spanning tree? → Union-Find (Kruskal)
Merging groups? → Union-Find

Problem Patterns:

Number of Provinces → Basic component count
Redundant Connection → Cycle detection
Accounts Merge → String key mapping
Making A Large Island → Size tracking + simulation
Smallest String Swaps → Group by root, sort within
Min Cost Connect Points → Kruskal's MST
Number of Islands II → Dynamic grid
Satisfiability → Two-pass approach
20.

Complexity Analysis

With Both Optimizations (Path Compression + Union by Rank):

OperationTime Complexity
find(x)O(α(n)) ≈ O(1)
union(x, y)O(α(n)) ≈ O(1)
connected(x, y)O(α(n)) ≈ O(1)

α(n) = Inverse Ackermann function

  • For all practical n (up to 10^80), α(n) ≤ 4
  • Essentially constant time!

Space Complexity: O(n)

  • parent array: O(n)
  • rank/size array: O(n)

Without Optimizations:

  • find(): O(n) worst case (long chain)
  • union(): O(n) (calls find)

Comparison:

1 million operations on 1 million elements:
No optimization: ~1 trillion operations
With optimizations: ~4 million operations
That's 250,000x faster!

Problem-Specific Complexity:

ProblemTimeSpace
Number of ProvincesO(n² × α(n))O(n)
Accounts MergeO(NK × α(NK))O(NK)
Making A Large IslandO(n² × α(n²))O(n²)
Kruskal's MSTO(E log E)O(V)
Number of Islands IIO(K × α(mn))O(mn)

Why Path Compression Works: Every find() flattens part of the tree. Over m operations, total work is O(m × α(n)), meaning each operation is O(α(n)) amortized.

21.

Template Summary

Complete Union-Find Template:

class UnionFind {
constructor(n)
parent[i] = i for all i
rank[i] = 0 (or size[i] = 1)
count = n
find(x)
if parent[x] != x:
parent[x] = find(parent[x]) // Path compression
return parent[x]
union(x, y)
px, py = find(x), find(y)
if px == py: return false
// Union by rank/size
attach smaller under larger
count--
return true
connected(x, y)
return find(x) == find(y)
}

When to Add Size Tracking:

  • Need largest component
  • Need component size queries
  • Making A Large Island type problems

Grid Problems Setup:

index = row * numCols + col
UnionFind(m * n)

String Key Problems:

1. Map strings to integers
2. Use standard Union-Find
3. Group results by root

Cycle Detection:

for each edge (u, v):
if find(u) == find(v):
cycle found!
else:
union(u, v)

Two-Pass Pattern:

1. Process all connections (unions)
2. Verify all constraints (finds)

Interview Tips:

  • Always implement with both optimizations
  • Return boolean from union() for flexibility
  • Track count for component problems
  • Draw the tree structure when debugging
  • Remember: find() has side effects (path compression)

Test Your Knowledge

Take a quick 15question quiz to test what you've learned.