[LLVMdev] dominance frontiers

Fri Jan 6 22:40:31 PST 2012

On Sat, Jan 7, 2012 at 12:14 AM, Cameron Zwarich <zwarich at apple.com> wrote:
> On Jan 6, 2012, at 8:27 PM, Daniel Berlin wrote:
>
> Note: GCC takes exactly the same approach as LLVM here, for exactly
> the reason chris specifies.
> In fact, until we started local SSA updating (which is now many years
> ago, but ...), dominance frontier calculation for ssa updating was in
> the top 10 profile functions for GCC compiles of large source files.
> I had tried a number of the linear time phi placement algorithms
> instead, but sadly, none were fast enough in practice (I believe
> someone later published a paper comparing them all and finding the
> same thing).
>
>
> I wrote the current implementation of the up-front SSA construction in LLVM,
> found in Utils/PromoteMemoryToRegister.cpp. It uses a modified version of
> the Sreedhar-Gao algorithm from "A linear time algorithm for placing
> phi-nodes". The most interesting modification is that it uses per-variable
> liveness information to compute pruned SSA by pruning the entire IDF
> computation, rather than merely pruning the DF -> IDF step like is usually
> done. I was a bit surprised that it worked so well, given the reputation of
> the "linear time" algorithms, but it performed the best out of everything I
> attempted.
Interesting. Maybe some day i'll give it a try again.
I swear I had tried this one (I was always a fan of Sreedhar's work),
however, but it's entirely possible i messed up the implementation :)

>
> As a completely random aside, the algorithm given in the original
> paper you cited is actually not quite the best way in practice.  It
> performs more comparisons than strictly necessary. When I moved us to
> the algorithm found in http://www.cs.rice.edu/~keith/Embed/dom.pdf,
> Figure 5, it was ~30% faster (AFAIK, Jeanne Ferrante is somewhat
> responsible for both algorithms :P). It is also a lot simpler to
> explain, IMHO.
>
>
> There is a followup paper from other authors (including Tarjan himself) that
> discusses some implementation tricks for the Lengauer-Tarjan algorithm:

Ah, i was referring to the *dominance frontier* computation, not the
dominator computation.
If you look, you'll see there is an algorithm in the paper that
computes dominance frontiers touching only the nodes that are
*actually in* the dominance frontier :)

The algorithm in the cytron/et al paper looks like this:

for each X in a bottom-up traversal of the dominator tree do
  DF(X) = empty
  for each Y in Successors(X)
    if (idom(Y) != X) then DF(X) += Y
  for each Z in DomChildren(X)
   for each Y in DF(Z)
     if (idom(Y) != X then DF(X) += Y

You can see that this does more comparisons than strictly necessary.
OTOH, the algorithm by Ferrante that Harvey gives is:

for each B in all basic blocks
  if the length of Predecessors(B) >= 2
    for each P in Predecessors(B)
     runner = P
     while runner != idom[B]
      DF(runner) += B
      runner = idom[runner]

You can see this does a lot less work by skipping a lot of useless nodes, etc.

It's also a lot simpler to explain :)

>
> http://www.cs.princeton.edu/~rwerneck/docs/GWTTA04.pdf
>
> They cite personal communication from Keith Cooper that a later more careful
> implementation of Lengauer-Tarjan led to different benchmark results than
> what they published. I added those tricks and some others to LLVM's
> implementation of Lengauer-Tarjan (the simple O(n log n) version) for a
> significant performance improvement.

My only work on LLVM's dominators was when I successfully made a mess
of the dominator tree implementation in LLVM way back when by using a
splay-tree structure that stored the sequences, rather than BB's.   It
is technically faster to update, but in practice wildly complex to
maintain.
(See rev 25144)
:)

> when I should do a comparison with a careful
> implementation of the Cooper-Harvey-Kennedy algorithm some day.

We have always found the simple version of Lengauer-Tarjan + the
various tricks you are thinking of to beat out the
Cooper-Harvey-Kennedy algorithm.

We used to use the complex version, but as you also found, it was not
faster in practice than the O(n log n) + good engineering, and it was
a heck of a lot harder to maintain.

>
> Cameron