[PATCH] D50433: A New Divergence Analysis for LLVM

Fri Aug 24 02:04:55 PDT 2018

simoll added a comment.

In https://reviews.llvm.org/D50433#1210817, @alex-t wrote:

> When we discover the divergent control flow things tend to change. For each divergent branch we have to propagate the divergence computing the joins. For that we have to at least walk the successors until the immediate post dominator.  We literally do same job as we'd done constructing the SSA form! And with the loops this is much more complicated.

In https://reviews.llvm.org/D50433#1210920, @alex-t wrote:

> BTW, even with analyzing forward (up-down) and lazy style, for each divergent terminator you have to compute join points.
>  This is exactly what is done constructing SSA form to find all the blocks where should be PHI nodes.

Yes, this is *almost* SSA construction as discussed in the comment in `SyncDependenceAnalysis.cpp`.
There is one important difference: we do not require a dominating definition (unlike SSA). That's shown in the example in the same source comment.
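
To make the idea concrete, here is a minimal sketch of SSA-style label propagation from a divergent branch. It is not the code in `SyncDependenceAnalysis.cpp`: `findJoinBlocks` is a hypothetical helper, blocks are plain indices, and the CFG is assumed to be acyclic with blocks numbered in topological order. Each successor of the branch starts its own "definition", and a block is flagged as a join when two different definitions reach it. No definition is required to dominate the join.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper, not LLVM API. Assumes an acyclic CFG whose blocks are
// numbered in topological order (every edge goes from a lower to a higher
// index). succs[b] lists the successors of block b.
std::set<int> findJoinBlocks(const std::vector<std::vector<int>> &succs,
                             int branchBlock) {
  const int n = static_cast<int>(succs.size());
  const int kNone = -1;
  // reaching[b]: the block whose definition reaches b, or kNone if none does.
  std::vector<int> reaching(n, kNone);
  std::set<int> joins;

  // Push definition `def` along an edge into block `to`.
  auto propagate = [&](int def, int to) {
    if (reaching[to] == kNone) {
      reaching[to] = def;
    } else if (reaching[to] != def) {
      // Two different definitions meet: `to` is a join and, like a PHI,
      // becomes a definition of its own.
      joins.insert(to);
      reaching[to] = to;
    }
  };

  // Each successor of the divergent branch starts a distinct definition.
  for (int s : succs[branchBlock])
    propagate(s, s);

  // Topological order guarantees reaching[b] is final when b is visited.
  for (int b = branchBlock + 1; b < n; ++b) {
    if (reaching[b] == kNone)
      continue; // not reachable from the branch
    for (int s : succs[b])
      propagate(reaching[b], s);
  }
  return joins;
}
```

Handling loops needs a fixed-point iteration on top of this, which the sketch deliberately leaves out.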

> For given branch all the blocks where PHI nodes must be inserted belongs to the branch parent block dominance frontier.

The problem with DF is that it implicitly assumes that there exists a definition at the function entry (see http://www.cs.utexas.edu/~pingali/CS380C/2010/papers/ssaCytron.pdf, comment below last equation of page 17).
So, we would get imprecise results.
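
For reference, a minimal sketch of how DF sets are commonly derived from immediate dominators, using the walk-up scheme described by Cooper, Harvey and Kennedy. The `Cfg` struct and `computeDominanceFrontiers` are hypothetical names for illustration, not LLVM API; blocks are plain indices, block 0 is the entry, and the entry is assumed to have no predecessors.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified CFG description (not an LLVM type).
struct Cfg {
  std::vector<std::vector<int>> preds; // preds[b] = predecessors of block b
  std::vector<int> idom;               // idom[b] = immediate dominator of b,
                                       // with idom[entry] == entry
};

// For every block b with at least two predecessors, walk up the dominator
// tree from each predecessor until idom[b] is reached. Every block passed on
// the way has b in its dominance frontier.
std::vector<std::set<int>> computeDominanceFrontiers(const Cfg &cfg) {
  const int n = static_cast<int>(cfg.preds.size());
  std::vector<std::set<int>> df(n);
  for (int b = 0; b < n; ++b) {
    if (cfg.preds[b].size() < 2)
      continue; // only blocks where control flow merges can be in a DF
    for (int p : cfg.preds[b]) {
      int runner = p;
      while (runner != cfg.idom[b]) {
        df[runner].insert(b);
        runner = cfg.idom[runner];
      }
    }
  }
  return df;
}
```

On a diamond (0 branches to 1 and 2, both fall through to 3), this puts block 3 into the DF of blocks 1 and 2 and leaves the DF of block 0 empty.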

> Why don't you use at least DF information from PDT?
>  [..]
>  Facing the divergent branch you can compute the blocks set affected by the divergence in linear time by Cooper's "two fingers" algorithm.

Control-dependence/PDT does not give you correct results (see my earlier example in https://reviews.llvm.org/D50433#1205934).

Now, we could use the DF of DT as it's often done in SSA construction.
When propagating a definition at a block B in the SDA, we could skip right to the fringes of the DF of block B; there won't be any joins in the blocks that B strictly dominates.
That does not fundamentally alter the algorithm: we would iterate over the DF blocks of B instead of its immediate successors. That's it.
In a way, we do this already when a definition reaches the loop header from outside the loop: in that case we skip right to the loop exits, knowing that the loop is reducible and so the loop header dominates at least all blocks inside the loop.
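
As a rough illustration of iterating over DF blocks instead of immediate successors, here is the classic SSA-style iterated-DF worklist. It is only a sketch of the traversal, not SDA code: `iteratedDominanceFrontier` is a hypothetical name, `df` is assumed to be precomputed per block, and the sketch omits the SDA's own join test, which would still be needed because plain DF assumes a definition at the function entry.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper, not LLVM API. df[b] is the precomputed dominance
// frontier of block b; defBlocks are the blocks that start a definition.
std::set<int> iteratedDominanceFrontier(const std::vector<std::set<int>> &df,
                                        const std::set<int> &defBlocks) {
  std::set<int> result;
  std::vector<int> worklist(defBlocks.begin(), defBlocks.end());
  while (!worklist.empty()) {
    int b = worklist.back();
    worklist.pop_back();
    // Jump straight to the frontier of b, skipping the blocks b dominates.
    for (int f : df[b]) {
      // A newly reached frontier block acts as a definition itself (the PHI
      // in SSA terms), so its own frontier has to be visited as well.
      if (result.insert(f).second)
        worklist.push_back(f);
    }
  }
  return result;
}
```

Each block enters the worklist at most once per seed or first discovery, so the traversal cost is bounded by the total size of the DF sets it touches.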

> I do not consider this as a serious issue. I just noticed that there is a lot of code that computes the information that has already been computed once before.

Using DF could speed up the join point computation at the cost of pre-computing the DF for the region. It really comes down to the numbers. Are there any other users of DF in your pipeline that would offset its cost?
If not, I would suggest planning lazy DF computation for a future revision of the DivergenceAnalysis in case SDA performance should ever become an issue.

Repository:
  rL LLVM

https://reviews.llvm.org/D50433