[PATCH] D50433: A New Divergence Analysis for LLVM

Simon Moll via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 9 08:16:48 PDT 2018


simoll added a comment.

Thanks for sharing :) I think our approaches are more similar than you might think:

It's worth keeping in mind that the disjoint path problem is symmetric: if there are two disjoint paths from A to Z, then there are also two disjoint paths through the reversed edges from Z to A.
This means it does not matter whether you use control dependence (post-dominance frontiers) or dominance frontiers to detect join points.
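
To make that concrete, here is a minimal sketch (plain C++, all names hypothetical) of join-point detection from the branch side: the join points of a branch are the iterated dominance frontier (IDF) of its successor blocks, much like phi placement in SSA construction. It assumes a precomputed dominance frontier and is the forward-CFG dual of looking up the branch in each block's post-dominance frontier.

  #include <functional>
  #include <set>
  #include <vector>

  using Block = int;

  // Join points of a branch = IDF of its successors; domFrontier is
  // assumed to be precomputed elsewhere.
  std::set<Block> joinPoints(const std::vector<Block> &BranchSuccs,
                             std::function<std::set<Block>(Block)> domFrontier) {
    std::set<Block> Joins;
    std::vector<Block> Work(BranchSuccs.begin(), BranchSuccs.end());
    while (!Work.empty()) {
      Block B = Work.back();
      Work.pop_back();
      for (Block J : domFrontier(B))
        if (Joins.insert(J).second) // newly discovered join point
          Work.push_back(J);        // iterate: frontiers of frontiers
    }
    return Joins;
  }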

In https://reviews.llvm.org/D50433#1193813, @alex-t wrote:

> You handle the divergence induced by divergent branches by mapping the branch to the set of PHIs. In other words: you compute the PHIs that are control-dependent on the branch when you encounter a branch that is divergent.
>  There could be another way. As you know, all BasicBlocks on which a given block B is control-dependent belong to B's post-dominance frontier. So, for a given PHI node we can easily know the set of branches on which this PHI is control-dependent.


The advantage of the dominance-based approach is that it aligns well with the flow of the DA:
When the DA detects that a branch is divergent, the next question is which phi nodes (join points) are sync dependent on that branch.
In the dominance-based approach that we take, you can compute that set lazily (which is exactly what we do, and which the sketch below illustrates) since we always start from the branch. This implies that (apart from the RPOT) there is zero pre-processing overhead in the SyncDependenceAnalysis if the kernel/loop/.. does not contain divergent branches. As a plus, you never need to iterate over the predecessors of basic blocks (which is slow).
On the other hand, the control-dependence-based approach starts from the join points and tracks back to divergence-inducing branches. In that flow, you have to compute the sync dependence relation up front to be able to invert it whenever a branch becomes divergent. This is what you facilitate by constructing a control-dependence graph and tagging the PHI nodes with extra operands (more on that later).
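
A minimal sketch of the lazy flow (plain C++, identifiers hypothetical, not the patch's API): join sets are computed per branch on first request and cached, so a function without divergent branches never triggers the computation.

  #include <map>
  #include <set>

  using Block = int;

  struct LazySyncDependence {
    std::map<Block, std::set<Block>> JoinCache; // branch block -> join blocks

    const std::set<Block> &joinBlocks(Block BranchBlock) {
      auto It = JoinCache.find(BranchBlock);
      if (It != JoinCache.end())
        return It->second; // cached: this branch was queried before
      // Forward propagation from the branch's successors; note that it
      // never inspects predecessor lists.
      return JoinCache[BranchBlock] = computeJoins(BranchBlock);
    }

    std::set<Block> computeJoins(Block /*BranchBlock*/) {
      return {}; // stand-in for the successor-IDF computation
    }
  };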

One more observation: using the unfiltered (post-)dominance frontier is overly conservative. That is because a block can become control-dependent on a branch from which there are no two disjoint paths to that block, e.g.:

        A
      / |
    B   |
   /  \ |
  C    D

D is control-dependent on B. However, B cannot induce divergence in PHI nodes in D since all threads reaching D from B select the same incoming value.

> Also, there is one more observation: the DA itself is the canonical iterative solver over the trivial lattice {divergent, unknown, uniform}, given that an instruction is immediately divergent if it has a divergent operand. The "bottom" element of the lattice joined with any element produces "bottom" (divergent) again. So we have a restricted ordered set and a descending function, and as a result a fixed point. Sorry for repeating the trivial things - just to explain the idea better...

That's exactly the algorithm we implement in this DA. Membership of a value in the `DA::divergentValue` set means that it's assigned the 'divergent' lattice element. There is no `unknown` element at the moment; we assume that non-divergent values are uniform.
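
As a minimal sketch of that set-based solver (plain C++, illustrative names): divergence is encoded as set membership, and the trivial transfer function quoted above - any user of a divergent value becomes divergent - means each value is inserted at most once.

  #include <deque>
  #include <set>
  #include <vector>

  struct Value {
    std::vector<Value *> Users; // values that use this one as an operand
  };

  // Seed the worklist with the initial sources of divergence
  // (e.g. thread id intrinsics) and run to the fixed point.
  void solve(std::set<const Value *> &DivergentValues,
             std::deque<const Value *> Worklist) {
    while (!Worklist.empty()) {
      const Value *V = Worklist.front();
      Worklist.pop_front();
      if (!DivergentValues.insert(V).second)
        continue; // already divergent; the lattice only descends once
      for (const Value *User : V->Users)
        Worklist.push_back(User); // divergent operand -> divergent user
    }
  }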

> Let's consider the PHI as an operation which has extra operands - the join of the usual PHI operands and the set of all branches on which this PHI is control dependent.
>  Now we can process the PHI in the usual solving loop like any other instruction, computing its divergence as the minimum over all operands.
> 
> Usual op: D = MIN(Opnd0, Opnd1, ..., OpndN)
>  PHI:      D = MIN(Opnd0, Opnd1, ..., OpndN, ControlDepBranch0, ControlDepBranch1, ..., ControlDepBranchN)

Same idea here. However, our approach is two-staged:
If a basic block is in the set `DA::divergentJoinBlocks`, it means that it carries the `divergent` lattice element.
In `DA::updatePHInode`, we join in the lattice element of the phi node's parent block (`DA::isJoinDivergent`).
Why two stages? When a branch becomes divergent, the DA receives the set of all join points from the SyncDependenceAnalysis, marks all those blocks as join divergent, and queues the (yet non-divergent) phi nodes in those blocks.
When the phi nodes are updated later on, they take their parent block's join divergence as an additional operand to their update function. A compact sketch of both stages follows below.
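
The sketch (plain C++, identifiers hypothetical, not the patch's API):

  #include <set>
  #include <vector>

  struct Block;

  struct Phi {
    Block *Parent;
    std::vector<bool> OperandDivergent; // divergence of incoming values
  };

  struct Block {
    std::vector<Phi *> Phis;
  };

  struct DA {
    std::set<Block *> DivergentJoinBlocks;
    std::vector<Phi *> Worklist;

    // Stage 1: called with the join set the sync dependence analysis
    // returns for a branch that just became divergent.
    void taintJoins(const std::set<Block *> &Joins) {
      for (Block *B : Joins)
        if (DivergentJoinBlocks.insert(B).second)
          for (Phi *P : B->Phis)
            Worklist.push_back(P); // re-evaluate these phis later
    }

    // Stage 2: the phi update function; the parent block's join
    // divergence acts as one extra operand.
    bool updatePHINode(const Phi &P) const {
      if (DivergentJoinBlocks.count(P.Parent))
        return true; // sync dependent on some divergent branch
      for (bool DivergentOperand : P.OperandDivergent)
        if (DivergentOperand)
          return true;
      return false;
    }
  };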

> This algorithm assumes:
> 
> 1. SSA form
> 2. We operate on both instructions and operands as Values, and we have a mapping Value => Divergence, i.e. divergence[V] = {1|0}

We do the same in our vectorizer, RV (https://github.com/cdl-saarland/rv), where each value maps to a richer lattice element with stride and alignment information. This implementation is based on RV. However, for this patch we tried to stay close to the existing (Legacy)DivergenceAnalysis and followed the set-based approach for lattice encoding.
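
For a rough idea of what such a richer lattice element can look like (a sketch only; field names are illustrative, not RV's actual types): uniform values have stride 0, consecutive values have stride 1, and alignment is tracked alongside.

  #include <algorithm>

  struct VectorShape {
    bool Divergent;     // top element: no known relation between lanes
    int Stride;         // if !Divergent, lane i holds Base + i * Stride
    unsigned Alignment; // known alignment of the per-lane values

    static VectorShape uniform() { return {false, 0, 1}; }
    static VectorShape contiguous() { return {false, 1, 1}; }
    static VectorShape varying() { return {true, 0, 1}; }

    // Lattice join: keep stride information only where both sides agree.
    static VectorShape join(VectorShape A, VectorShape B) {
      if (A.Divergent || B.Divergent || A.Stride != B.Stride)
        return varying();
      return {false, A.Stride, std::min(A.Alignment, B.Alignment)};
    }
  };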

> 3. We have post-dominance frontier sets pre-computed for all BasicBlocks in the Function.

As you see, that's not actually necessary.

> This approach works pretty well in the AMD HSAIL compiler.
>  Since it employs an iterative analysis, it works even if the reversed CFG is irreducible, but takes more iterations to reach the fixed point.

Yep, the same applies to this SyncDependenceAnalysis: simply run the SyncDependenceAnalysis in a fixed-point loop, as sketched below.
We did not implement this yet to keep things simple.
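
Such an outer loop could look like this (plain C++ sketch, assuming a monotone single-pass propagation; nothing here is in the patch):

  #include <functional>
  #include <set>
  #include <utility>

  // One monotone propagation pass over the function; returns the
  // (possibly grown) set of divergent value ids.
  using Pass = std::function<std::set<int>(const std::set<int> &)>;

  std::set<int> solveToFixedPoint(Pass Propagate) {
    std::set<int> Divergent;
    for (;;) {
      std::set<int> Next = Propagate(Divergent);
      if (Next == Divergent)
        return Divergent; // nothing changed: fixed point reached
      Divergent = std::move(Next);
    }
  }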


Repository:
  rL LLVM

https://reviews.llvm.org/D50433




