[PATCH] D50433: A New Divergence Analysis for LLVM
Alexander via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 24 09:28:47 PDT 2018
alex-t added a comment.
>> For a given branch, all the blocks where PHI nodes must be inserted belong to the dominance frontier of the branch's parent block.
>
> The problem with DF is that it implicitly assumes that there exists a definition at the function entry (see http://www.cs.utexas.edu/~pingali/CS380C/2010/papers/ssaCytron.pdf, comment below last equation of page 17).
> So, we would get imprecise results.
I'm not sure that I understand you correctly...
I still don't see what you mean by "imprecise results". The assumption in the paper you referred to looks reasonable.
Let's say we have two disjoint paths from Entry to block X, and a use of some value V in X.
      Entry
      /   \
     A     E
    / \     \
   B   C     F [v0 = 1]
    \ /      |
     D      /
      \   /
        X     v1 = PHI (1, F, undef, Entry)
             Entry
             /   \
   [v1 = 2] A     E
           / \    |
          B   C   F [v0 = 1]
           \ /   /
            D   /
             \ /
              X     v2 = PHI (1, F, 2, A)
Regardless of the definition in Entry, a divergent branch in Entry makes the PHI in X divergent.
Each of these two paths must contain a definition of V; it does not matter whether it is in the entry block or in one of its successors.
Either you have a normal PHI or a PHI with an undef incoming value. How to handle undef is your decision.
You may conservatively consider any undef divergent. This makes your PHI divergent by data dependency.
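A minimal sketch of the conservative rule just described, assuming a hypothetical three-way classification of incoming values (`ValueKind` and `isPhiDivergentByData` are illustrative names, not LLVM API). It only covers divergence by data dependency; the divergent-branch case from the diagrams is a separate check.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of one incoming PHI value.
enum class ValueKind { Uniform, Divergent, Undef };

// Data-dependence rule only: a PHI is divergent as soon as one incoming value
// is divergent, and undef is conservatively treated as divergent. A divergent
// branch controlling the join (sync dependence) must be checked separately.
bool isPhiDivergentByData(const std::vector<ValueKind> &Incoming) {
  for (ValueKind K : Incoming)
    if (K != ValueKind::Uniform)
      return true;
  return false;
}
```

For `v1 = PHI (1, F, undef, Entry)` in the first diagram the incoming kinds are {Uniform, Undef}, so this rule alone already reports the PHI as divergent.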
> Control-dependence/PDT does not give you correct results (see my earlier example in https://reviews.llvm.org/D50433#1205934).
Oh... :) Please forget about the set differences I mentioned; I brought them up by mistake.
You were concerned about using unfiltered PDFs, since they could produce over-conservative results.
You probably meant not the PDF but the iterated PDF (PDF+):
PDF+(X) = PDF(X) JOIN PDF(Y), for each Y in PRED(X)
To illustrate further, the plain PDF can be computed as follows (the code is only an illustrative sketch):
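#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/Analysis/PostDominators.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Function.h"
using namespace llvm;

using PDFMap = DenseMap<const BasicBlock *, SmallPtrSet<const BasicBlock *, 4>>;

// Compute the post-dominance frontier PDF(X) of every block X in F.
PDFMap computePDF(Function &F, PostDominatorTree &PDT) {
  PDFMap PDF;
  for (BasicBlock &B : F) {
    // Only a block with two or more successors can appear in a PDF.
    if (B.getTerminator()->getNumSuccessors() < 2)
      continue;
    // The upward walk stops at B's immediate post-dominator.
    DomTreeNode *Sentinel = PDT.getNode(&B)->getIDom();
    for (BasicBlock *Succ : successors(&B)) {
      // Succ and each of its post-dominators strictly below ipdom(B)
      // have B in their post-dominance frontier.
      for (DomTreeNode *Runner = PDT.getNode(Succ);
           Runner && Runner != Sentinel; Runner = Runner->getIDom())
        PDF[Runner->getBlock()].insert(&B);
    }
  }
  return PDF;
}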
I meant just the PDF, without the closure over all predecessors.
             Entry
             /   \
            A     E
           / \     \
 [v1 = 2] B   C     F [v0 = 1]
           \ /     /
            D     /
             \   /
               X     v2 = PHI (1, F, 2, B)
PDF(B) = {A}, PDF+(B) = {A, Entry}
PDF(F) = PDF+(F) = {Entry}
For the PHI in X we have two source blocks, B and F, so we only have to examine the branches in A and Entry.
If the second definition of V were in C instead of F, we would only look at the branch in A.
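The rule used here, examining only the branches in the union of the PDFs of the PHI's source blocks, can be checked on the last diagram with a self-contained sketch. It assumes the CFG is given as string-named successor lists with a single exit block, and it computes post-dominator sets by plain iterative dataflow instead of LLVM's PostDominatorTree; `postDominators`, `postDominanceFrontier` and `branchesToExamine` are hypothetical helpers.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Block = std::string;
using BlockSet = std::set<Block>;
using CFG = std::map<Block, std::vector<Block>>;  // block -> successors

// Post-dominator sets by naive iterative dataflow over a CFG with a single
// exit block. Every block, including Exit, must be a key of G.
std::map<Block, BlockSet> postDominators(const CFG &G, const Block &Exit) {
  BlockSet All;
  for (const auto &KV : G) All.insert(KV.first);
  std::map<Block, BlockSet> PDom;
  for (const Block &B : All) PDom[B] = (B == Exit) ? BlockSet{Exit} : All;
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (const auto &[B, Succs] : G) {
      if (B == Exit) continue;
      // PDom(B) = {B} plus the intersection of PDom(S) over successors S.
      BlockSet New = All;
      for (const Block &S : Succs) {
        BlockSet Keep;
        for (const Block &P : New)
          if (PDom[S].count(P)) Keep.insert(P);
        New = Keep;
      }
      New.insert(B);
      if (New != PDom[B]) { PDom[B] = New; Changed = true; }
    }
  }
  return PDom;
}

// Same upward walk as the LLVM sketch: for each branch B and successor S,
// climb from S toward ipdom(B), adding B to the PDF of every block visited.
std::map<Block, BlockSet> postDominanceFrontier(const CFG &G, const Block &Exit) {
  std::map<Block, BlockSet> PDom = postDominators(G, Exit);
  // ipdom(B) is the strict post-dominator whose own set is exactly one smaller.
  std::map<Block, Block> IPDom;
  for (const auto &[B, Set] : PDom)
    for (const Block &P : Set)
      if (P != B && PDom[P].size() + 1 == Set.size()) IPDom[B] = P;
  std::map<Block, BlockSet> PDF;
  for (const auto &[B, Succs] : G) {
    if (Succs.size() < 2) continue;  // only branches contribute
    for (const Block &S : Succs)
      for (Block R = S; R != IPDom[B]; R = IPDom[R]) PDF[R].insert(B);
  }
  return PDF;
}

// Branches to examine for a PHI: the union of PDF(S) over its source blocks S.
BlockSet branchesToExamine(const std::map<Block, BlockSet> &PDF,
                           const std::vector<Block> &Sources) {
  BlockSet Result;
  for (const Block &S : Sources)
    if (auto It = PDF.find(S); It != PDF.end())
      Result.insert(It->second.begin(), It->second.end());
  return Result;
}
```

On the last diagram (exit X), sources B and F give {A, Entry}, while sources B and C give only {A}, matching the two cases discussed above.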
For your example with switch:
   +----A----+
   |    |    |
   C0   C1   D
    \   |   [..]
     \  |
       J
PDF(C0) = {A}
PDF(C1) = {A}
Let's say that in J we have v2 = PHI(v0, A, v1, C0). Then we should examine the terminator of A, because PDF(C0) = {A} and PDF(A) = {}.
> Now, we could use the DF of DT as it's often done in SSA construction.
> When propagating a definition at a block B in the SDA we could skip right to the fringes of the DF of block B; there won't be any joins within the DF of B.
> That does not fundamentally alter the algorithm: we would iterate over the DF blocks of B instead of its immediate successors, that's it.
> In a way, we do this already when a definition reaches the loop header from outside the loop: in that case we skip right to the loop exits, knowing that the loop is reducible and so the loop header dominates at least all blocks inside the loop.
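A sketch of the DF computation suggested here, on the CFG of the first diagram. It assumes the immediate dominators are already known (they are hard-coded in the usage) and uses the well-known "runner" walk of Cooper, Harvey and Kennedy; `dominanceFrontier` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Block = std::string;
using BlockSet = std::set<Block>;

// Dominance frontier from predecessor lists and precomputed immediate
// dominators. The entry block has no entry in Idom; every other block does.
std::map<Block, BlockSet>
dominanceFrontier(const std::map<Block, std::vector<Block>> &Preds,
                  const std::map<Block, Block> &Idom) {
  std::map<Block, BlockSet> DF;
  for (const auto &[Join, Ps] : Preds) {
    if (Ps.size() < 2) continue;  // only join points appear in a DF
    // Every block from a predecessor up to (excluding) idom(Join) stops
    // dominating at Join, so Join is where its definitions may meet others.
    for (const Block &P : Ps)
      for (Block R = P; R != Idom.at(Join); R = Idom.at(R))
        DF[R].insert(Join);
  }
  return DF;
}
```

Here DF(A) = {X}: a definition in A skips the inner join at D, which A dominates, and is propagated directly to X.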
>
>> I do not consider this a serious issue. I just noticed that a lot of code recomputes information that has already been computed before.
>
> Using DF could speed up the join point computation at the cost of pre-computing the DF for the region. It really comes down to the numbers. Are there any other users of DF in your pipeline that would offset its cost?
> If not, I would suggest planning lazy DF computation for a future revision of the DivergenceAnalysis in case SDA performance should ever become an issue.
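One possible shape for the lazy DF computation suggested here, as a sketch only: the class `LazyDF` is hypothetical (not an LLVM analysis) and assumes that successor lists and immediate dominators are already available. DF(X) is computed only when first requested, using the DF_local/DF_up decomposition from Cytron et al., and then cached.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Block = std::string;
using BlockSet = std::set<Block>;

// Hypothetical lazily populated DF cache. Inputs: successor lists and
// immediate dominators (the entry block has no entry in Idom).
class LazyDF {
  std::map<Block, std::vector<Block>> Succs;
  std::map<Block, Block> Idom;
  std::map<Block, std::vector<Block>> Children;  // dominator-tree children
  std::map<Block, BlockSet> Cache;

  bool idomIs(const Block &Y, const Block &X) const {
    auto It = Idom.find(Y);
    return It != Idom.end() && It->second == X;
  }

public:
  LazyDF(std::map<Block, std::vector<Block>> S, std::map<Block, Block> I)
      : Succs(std::move(S)), Idom(std::move(I)) {
    for (const auto &[B, D] : Idom) Children[D].push_back(B);
  }

  const BlockSet &get(const Block &X) {
    if (auto It = Cache.find(X); It != Cache.end()) return It->second;
    BlockSet Result;
    // DF_local(X): CFG successors that X does not immediately dominate.
    for (const Block &Y : Succs[X])
      if (!idomIs(Y, X)) Result.insert(Y);
    // DF_up: frontier blocks of X's dominator-tree children that X does not
    // immediately dominate. Each child's DF is itself computed lazily.
    for (const Block &Z : Children[X])
      for (const Block &Y : get(Z))
        if (!idomIs(Y, X)) Result.insert(Y);
    return Cache[X] = std::move(Result);
  }
};
```

On the first diagram, querying only `get("A")` computes and caches DF for A and its dominator-tree descendants B, C and D, and nothing else, so the cost is paid only for the regions the SDA actually visits.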
Repository:
rL LLVM
https://reviews.llvm.org/D50433