[PATCH] D40427: [ADT] Introduce Disjoint Set Union structure

Daniel Berlin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Nov 26 23:45:12 PST 2017


dberlin added a comment.

Two things.
First, to bikeshed this horribly:

Everywhere else in llvm we call this a union-find data structure.
Almost all papers you can reference these days do the same.
What you call head is called find roughly everywhere, and we should do the same here.

See, e.g., 
https://pdfs.semanticscholar.org/bbcf/76a84ee10348442ccb50ccdbfb288ede5cbb.pdf (Hopcroft and Ullman's analysis bounding it to log*)
https://dl.acm.org/citation.cfm?doid=62.2160 (Tarjan's bounding of union-find to inverse ackermann)
https://pdfs.semanticscholar.org/b716/349a3072afbede9f0fb8f561a8e0f297baf0.pdf (survey of data-structures used to solve disjoint-set-union problems)

Even wikipedia calls it find() :)

(we call it find in all of the llvm implementations)

Second:

EquivalenceClasses.h already implements this data structure. It does path compression, but not union by rank.

We should not end up with two. I'm fine if we go with your implementation, but the end result (doesn't have to be in this patch) should be only one of these classes existing.



================
Comment at: include/llvm/ADT/DisjointSetUnion.h:84
+      // recursively for all chain of parents.
+      return Parent[X] = head(It->second);
+    // Every vertex that doesn't have a parent is the head of its set.
----------------
mkazantsev wrote:
> sanjoy wrote:
> > Please avoid recursion here, unless you're certain this would be (say) less than 10 frames for all practical cases (in which case add an assert).
> I'm pretty certain that the expected depth is effectively small, but I was also thinking to rewrite this with loop, so I'll do it.
I would not bother making this non-recursive (I can't recall ever having seen a production implementation that is non-recursive)

Since you are doing union-by-rank, depth is limited to log2(total number of items).
I.e., any root node of rank X must have >= 2**X items under it.

The worst case is if you only ever call find after all of the unions, and then you have log(n) recursion depth here. For LLVM, n is probably bounded in the millions for most practical problems, so my guess at the worst case is somewhere between 20 and 24, assuming 16 million items.
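
To make the bound above concrete, here is a minimal sketch, not the code from this patch. The names UnionFind, unite, depth and worstCaseDepth are made up for illustration. A rank only grows when two roots of equal rank are merged, which at least doubles the number of items under the new root, so a root of rank X has >= 2**X items and no chain is longer than log2(n). The helper builds the worst case by only ever merging equal-rank roots, so no find call gets a chance to compress anything.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical illustration only; not the DisjointSetUnion class under review.
struct UnionFind {
  std::vector<std::size_t> Parent;
  std::vector<unsigned> Rank;

  explicit UnionFind(std::size_t N) : Parent(N), Rank(N, 0) {
    // Every element starts as the root of its own singleton set.
    std::iota(Parent.begin(), Parent.end(), std::size_t{0});
  }

  // Recursive find with path compression: every node on the path
  // ends up pointing directly at the root.
  std::size_t find(std::size_t X) {
    if (Parent[X] == X)
      return X;
    return Parent[X] = find(Parent[X]);
  }

  // Number of parent links between X and its root. Does not compress,
  // so it reports the recursion depth a find(X) call would reach.
  unsigned depth(std::size_t X) const {
    unsigned D = 0;
    while (Parent[X] != X) {
      X = Parent[X];
      ++D;
    }
    return D;
  }

  // Union by rank: the root with the smaller rank is attached under the
  // other one; the rank grows only when both ranks are equal.
  std::size_t unite(std::size_t A, std::size_t B) {
    A = find(A);
    B = find(B);
    if (A == B)
      return A;
    if (Rank[A] < Rank[B])
      std::swap(A, B);
    Parent[B] = A;
    if (Rank[A] == Rank[B])
      ++Rank[A];
    return A;
  }
};

// Builds 2**K items and merges equal-rank roots pairwise, so find is only
// ever called on roots and nothing is compressed. Returns the longest chain.
unsigned worstCaseDepth(unsigned K) {
  std::size_t N = std::size_t{1} << K;
  UnionFind UF(N);
  for (std::size_t Step = 1; Step < N; Step *= 2)
    for (std::size_t I = 0; I + Step < N; I += 2 * Step)
      UF.unite(I, I + Step);
  unsigned Max = 0;
  for (std::size_t I = 0; I < N; ++I)
    Max = std::max(Max, UF.depth(I));
  return Max;
}
```

With this construction worstCaseDepth(K) comes out as K, so 16 million items (K = 24) would mean at most 24 frames for the first find, and that find flattens the chain it walks.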

If you intersperse the find calls with the unions, you will pretty much never get a depth > 5.
(amortized, it can only be > 5 if you have more than 2**65535 items;
non-amortized, it is very hard to say, but my guess is <= 10 for most problem types)
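
As a rough check on the "5" above, and assuming the amortized figure refers to the iterated logarithm (log*) from the Hopcroft and Ullman analysis, the following sketch counts how many times log2 has to be applied before the value drops to 1 or below. The function names are made up for illustration. Rounding each logarithm up does not change the count, because the thresholds (2, 4, 16, 65536, ...) are all integers.

```cpp
#include <cstdint>

// Smallest R such that 2**R >= N. Expects N >= 1.
unsigned ceilLog2(std::uint64_t N) {
  unsigned R = 0;
  for (std::uint64_t V = N - 1; V > 0; V >>= 1)
    ++R;
  return R;
}

// Iterated logarithm: number of times log2 must be applied to N
// before the result is <= 1.
unsigned logStar(std::uint64_t N) {
  unsigned Count = 0;
  while (N > 1) {
    N = ceilLog2(N);
    ++Count;
  }
  return Count;
}
```

Under that assumption logStar is 4 for 65536 items and 5 for 16 million items, and it stays at 5 for any count up to 2**65536, which is the same order as the figure quoted above.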

We also do this recursively in the other implementations of this data structure we have (for example, EquivalenceClasses and AliasSetTracker).




https://reviews.llvm.org/D40427




