[PATCH] D50433: A New Divergence Analysis for LLVM

Fri Aug 10 03:56:35 PDT 2018

alex-t added a comment.

In https://reviews.llvm.org/D50433#1193877, @simoll wrote:

>

> The advantage of the dominance-based approach is that it aligns well with the flow of the DA:
>  When the DA detects that a branch is divergent the next question is which phi nodes (join points) are sync dependent on that branch.
>  In the dominance-based approach that we take, you can compute that set lazily (which is exactly what we do) since we always start from the branch. This implies that (apart from the RPOT) there is zero pre-processing overhead in the SyncDependenceAnalysis if the kernel/loop/.. does not contain divergent branches. As a plus you never need to iterate over the predecessors of basic blocks (which is slow).
>  On the other hand, the control-dependence based approach starts from the join points and tracks back to divergence-inducing branches. In that flow, you have to compute the sync dependence relation up-front to be able to invert it whenever a branch becomes divergent. This is what you facilitate by constructing a control-dependence graph and tagging the PHI nodes with extra operands (more on that later).

Yes, my approach requires pre-computation of the post-dominance sets. Nevertheless, no predecessor walk is necessary. I compute the sets using Cooper's "two fingers" algorithm, which uses a linear walk of the post-dominator tree.
Given that the post-dominator tree is already used by the previous passes and has been constructed before, the overhead is relatively small.
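
For readers who do not have the paper at hand, the "two fingers" walk from Cooper, Harvey and Kennedy ("A Simple, Fast Dominance Algorithm") can be sketched roughly as below. This is only the textbook intersection routine, not the code from my patch; the name `intersect`, the `ipdom` array layout and the postorder numbering convention are assumptions made for the illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical layout: nodes are identified by a postorder number chosen so
// that every node's immediate post-dominator has a LARGER number, and the
// root of the tree has the largest number with ipdom[root] == root.
//
// Returns the nearest common ancestor of a and b in the post-dominator tree
// by alternately advancing whichever "finger" is lower in the tree.
int intersect(const std::vector<int>& ipdom, int a, int b) {
  assert(a >= 0 && b >= 0);
  assert(a < static_cast<int>(ipdom.size()));
  assert(b < static_cast<int>(ipdom.size()));
  while (a != b) {
    while (a < b) a = ipdom[a];  // move finger a up the tree
    while (b < a) b = ipdom[b];  // move finger b up the tree
  }
  return a;
}
```

Each finger only ever moves towards the root, so a single query is bounded by the depth of the tree and never touches the predecessor lists of the CFG.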

> One more observation: using the unfiltered (post-)dominance frontier is overly conservative. That is because a block can become control-dependent on a branch from which there are no two disjoint paths to the block, e.g.:
> 
>         A
>       / |
>     B   |
>    /  \ |
>   C    D
> 
> 
> D is control-dependent on B. However, B can not induce divergence in PHI nodes in D since all threads reaching from B will select the same incoming value.

In fact, I don't use the unfiltered post-dominance frontier either. I should have described the algorithm in more detail.
What I actually use is the set difference between the post-dominance frontier of the value source block and that of the PHI's parent block.
Namely:

The additional "operands" set is computed as follows.

For each PHI incoming value we consider the post-dominance frontier of the value's source block. We take from it only the blocks that are NOT in the post-dominance frontier of the join (PHI's) block.
That is the way to exclude those divergent branches that are common to the value source block and the join block.

The result set is the JOIN of the difference sets of all incoming values.
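
The computation described above can be sketched as plain set arithmetic. This is a minimal illustration, not the code from the patch: the names `extraPhiOperands`, `Block` and `BlockSet` are made up, blocks are plain integers, and the post-dominance frontiers are assumed to be given as a precomputed map.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using Block = int;                 // hypothetical block id
using BlockSet = std::set<Block>;  // set of branching blocks

// For every incoming block of a PHI, take its post-dominance frontier minus
// the post-dominance frontier of the join block, and union the differences.
BlockSet extraPhiOperands(const std::map<Block, BlockSet>& pdf,
                          Block joinBlock,
                          const std::vector<Block>& incomingBlocks) {
  assert(pdf.count(joinBlock) && "join block needs a frontier entry");
  const BlockSet& joinFrontier = pdf.at(joinBlock);
  BlockSet result;
  for (Block src : incomingBlocks) {
    auto it = pdf.find(src);
    if (it == pdf.end()) continue;  // a missing entry is treated as empty
    for (Block branch : it->second)
      if (!joinFrontier.count(branch))  // drop branches shared with the join
        result.insert(branch);
  }
  return result;
}
```

With hand-written frontiers `{0, 5}` for both incoming blocks and `{5}` for the join block, the call returns `{0}`: the branch in block 5 is common to the sources and the join and is filtered out, while the branch in block 0 becomes an extra operand of the PHI.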

Unfortunately, I cannot paste the formal definition here, which would look more comprehensive, simply because I have no idea how to insert TeX or any other means of writing equations here :)

>> Let's consider the PHI as operation which has extra operands - the join of the usual PHIs operands and the set of the all branches on which this PHI is control dependent.
>>  Now we can process the PHI in the usual solving loop like any other instruction, computing its divergence as the minimum over all operands.
>> 
>> Usual op:   D = MIN (Opnd0, Opnd1, .... OpndN)
>>  PHI:            D = MIN(Opnd0, Opnd1, .... OpndN,  ControlDepBranch0, ControlDepBranch1 ......   ControlDepBranchN)
> 
> Same idea here. However, our approach is two staged:
>  If a basic block is in the set `DA::divergentJoinBlocks` it means that it has the `divergent` lattice element.
>  In `DA::updatePHInode`, we join in the lattice element of the parent block of the phi node (`DA::isJoinDivergent`).
>  Why two stages? If the branch becomes divergent, the DA receives the set of all join points from the SyncDependenceAnalysis, marks all those blocks as join divergent and queues the (yet non-divergent) phi nodes in those blocks.
>  When the phi nodes are updated later on they take in their parent block's join divergence as an additional operand to their update function.
> 
>> 
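
To make sure I read the two-stage scheme correctly, here is how I understand it, as a toy model. Everything in it is hypothetical: `ToyDA`, `onDivergentBranch`, `updatePhi` and the integer ids only mirror the roles of `DA::divergentJoinBlocks`, `DA::updatePHInode` and `DA::isJoinDivergent` mentioned above and are not the code from the patch.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <vector>

struct ToyDA {
  std::set<int> divergentValues;               // values known to be divergent
  std::set<int> divergentJoinBlocks;           // blocks marked join divergent
  std::map<int, std::vector<int>> phisInBlock; // block id -> phi ids
  std::map<int, int> parentBlock;              // phi id -> block id
  std::map<int, std::vector<int>> operands;    // phi id -> operand value ids
  std::deque<int> worklist;

  // Stage 1: a branch became divergent; mark the join blocks reported for it
  // and queue the phis in them that are not divergent yet.
  void onDivergentBranch(const std::set<int>& joinBlocks) {
    for (int bb : joinBlocks) {
      if (!divergentJoinBlocks.insert(bb).second) continue;  // already marked
      for (int phi : phisInBlock[bb])
        if (!divergentValues.count(phi)) worklist.push_back(phi);
    }
  }

  // Stage 2: the phi update takes the parent block's join divergence as an
  // additional operand next to the ordinary incoming values.
  bool updatePhi(int phi) const {
    assert(parentBlock.count(phi) && operands.count(phi));
    if (divergentJoinBlocks.count(parentBlock.at(phi))) return true;
    for (int op : operands.at(phi))
      if (divergentValues.count(op)) return true;
    return false;
  }

  // Drain the worklist (users of the phis are left out of this sketch).
  void run() {
    while (!worklist.empty()) {
      int phi = worklist.front();
      worklist.pop_front();
      if (updatePhi(phi)) divergentValues.insert(phi);
    }
  }
};
```

In this model a phi with only uniform operands is still reported divergent once its parent block has been marked, which is the behaviour my question below is about.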

Okay. Let's say that on iteration N we compute the branch divergence as 1 (divergent). We mark all the joins as divergent.
The question is: in which iteration are the PHIs' users updated? For the PHIs that are dominated by the branch, I guess, the users will be updated in the same iteration, because by the time the PHIs are processed they already have the "divergent" flag.
If the branch is the source of a back edge, we have to iterate once more to propagate the sync divergence to the loop header and body. Is this correct?

Repository:
  rL LLVM

https://reviews.llvm.org/D50433