[PATCH] D35851: [Dominators] Include infinite loops in PostDominatorTree

Tue Aug 8 17:21:22 PDT 2017

grosser added a comment.

In https://reviews.llvm.org/D35851#834070, @dberlin wrote:

> (Copying this comment because it ended up on the wrong review thread) 
>  Note: There is also a rewritten DSE that uses PostDom as well: https://reviews.llvm.org/D29624

Hi Daniel,

sorry for the delay. Let me get back into the discussion.

> I'm not sure i will get a chance to  produce LLVM IR for you (i'm sadly finding less and less time to work on LLVM these days, so i'm mostly having other people do it :P).

Especially because of this, thank you very much for taking the time to discuss this in so much detail. This is indeed very helpful.

> To give you a more described example  the following completely simple algorithm (which GCC and a few other compilers use variants of, see tree-ssa-dse.c) *should* work, but will give wrong answers if you ignore infinite loops or do weird things to them.
>  again, i've made it not catch every case so i can do it in <50 lines.  Normally, for example, there is more than one stack, instead of all possible stores on the same stack, which misses a ton of cases due to irrelevant aliasing, blah blah blah.
> 
>   Stack StoreStack;  // In realer versions of this algorithm, we have multiple stacks or other structures. In fact, in this version, there will only ever be zero or one things on the stack.
>   void walkPostDomChildren(DomTreeNode *DTN)
>   {
>     BasicBlock *BB = DTN->getBlock();
>     if (!BB)
>       return;
>    // Ensure the that the store at the top of the stack is from something above us in our tree.  Normally would be done by checking if  dfs numbers of the current block is contained within the dfs in/out numbers of the top of stack.   See newgvn.cpp, predicateinfo.cpp, etc for ValueStack, which does this.
>   
>   
>   Pop Store Stack until block for instruction at  Stack.top() postdominates DTN.
>    
>     for (auto *I : reverse(BB))
>       if (auto *SI = dyn_cast<StoreInst>(I)) {
>         // If whatever is on the top of the stack is a must alias and post-dominates us, we're dead.
>         if (!Stack.empty() &&  AA.isMustAlias(Stack.top()->getPointerOperand(), I->getPointerOperand()) )
>            I->eraseFromParent();
>         else if (Stack.empty())
>           Stack.push(SI)
>         else
>           // This is really a may-def, so we must clear the stack. In this stupid version, this is true even if it noaliases top of stack.
>           Stack.clear()
>          continue;
>      }
>         if (!(getModRefInfo(I) & MRI_NoModRef))
>           // A use of memory, so reset our stack
>           Stack.clear();
>     for (auto * Child : DTN)
>          walkPostDomChildren(DTN)
>   }
>   
>   walkPostDomChildren(PDT->getRoot())
> 
> 
> Basically, if a store postdominates another earlier store without an intervening may-use/may-def, the earlier store is dead.  There is nothing that could possibly see it happen before the later store.
> 
> Modulo transcription bugs, this algorithm is provably correct for our simplified assumptions. It will only eliminate stores that are dead.
> 
> Given the following program:
> 
>   Block 1
>   store a, 5
>   if (foo)
>   {
>   Block 2:
>        call foo(load a)
>        goto Block 2
>   }
>   Block 3:
>   store a, 6
>   

This is a great example, which perfectly highlights the part I do not yet feel is clear to me.

AFAIU your program corresponds to the following piece of LLVM-IR (which happens to almost correspond to the visualizations I posted before):

  define void @foo(float* %a) {
     ....
  }

  define void @foo(i1 %cond, float* %a) {
  block1:
     store float 5.0, float* %a
     br i1 %cond, label %block2, label %block3

  block2:
    call foo(float* %a)
    br label %block2

  block3:
    store float 6.0, float* %a
  }

which is very similar in nature to the one I posted above:

F3889049: Post Dominance (1).png <https://reviews.llvm.org/F3889049>

> If infinite loops are not in the post-dominator tree, the tree will look like this:
> 
>   3
>   |
>   1
> 
> 
> The algorithm will visit 3, then push store a, 6 on the stack, visit 1, and use it to remove store a, 5
>  This is illegal.  The store a, 5, is visible and used in block 2, and block 2 in fact, calls a function with value we stored.
> 
> Whoops!
> 
> If infinite loops are connected to virtual exit, the tree will look like this:
> 
>   virtual exit
>    /  |  \
>   3  2   1
> 
> 
> We will no longer eliminate store a, 5, and we will get a correct answer because at each point, nothing will be on the stack.
>  We will visit 3, push store a, 6. We will visit 2, pop the stack because store a, 6 is no longer in scope.  Stack will stay empty throughout 2.
>  We will visit 1, push store a, 5.  We will exit the algorithm with this store on the stack, having eliminated nothing.
>  We will get the correct answers no matter how many things no longer exit.  This is a safe and conservative post-dominator tree for our algorithm.
> 
> Is this perfect?  For certain, no.
> 
> In this example, for *this* algorithm, you can connect the infinite loop anywhere you like as long as you do not generate the post-dominator tree:
> 
>   3
>   |
>   1
>   |
>   2
>   
>   or
>   3
>   |
>   2
>   
>   or
>   3
>   |  \
>   2   1
>   
> 
> 
> If you generate that, you will eliminate the store a, 5 again, and you should not.

Interesting, but slightly confusing. Assume you change the previous program slightly:

  define void @foo(float* %a) {
     ....
  }

  define void @foo(i1 %cond, float* %a) {
  block1:
     store float 5.0, float* %a
     br i1 %cond, label %block2, label %block3

  block2:
    call foo(float* %a)
    br i1 true, label %block2, label %block1   <---- Only this line has changed

  block3:
    store float 6.0, float* %a
  }

You get a CFG very similar to the one I posted earlier:

F3887544: Post Dominance.png <https://reviews.llvm.org/F3887544>

In your case the PDT is the first of the three you just mentioned:

> 3
> 
>  |
> 
> 
> 1
> 
>  |
> 
> 
> 2

This is the post-dominator tree which we compute today and which won't be changed by this proposal (as no reverse-unreachable blocks exist). Following your description the above post-dominator tree has the property that block3 post-dominates block1. What prevents the algorithm above to delete the store in block1 in the second slightly modified case? AFAIU the PDTs of both examples are identical and your algorithm does not look at the CFG edges. Hence I would expect it to incorrectly delete the original store also in the slightly modified example. Do I miss something obvious?

I will look into this again tomorrow to understand better where I am wrong.

One other slightly related question. Do you assume that A post-dominates B means that a program that reaches B eventually must reach A?

https://reviews.llvm.org/D35851