[PATCH] D35851: [Dominators] Include infinite loops in PostDominatorTree

Mon Aug 7 10:15:41 PDT 2017

dberlin added a comment.

(Copying this comment because it ended up on the wrong review thread) 
Note: There is also a rewritten DSE that uses PostDom as well: https://reviews.llvm.org/D29624

I'm not sure i will get a chance to  produce LLVM IR for you (i'm sadly finding less and less time to work on LLVM these days, so i'm mostly having other people do it :P).

To give you a more described example  the following completely simple algorithm (which GCC and a few other compilers use variants of, see tree-ssa-dse.c) *should* work, but will give wrong answers if you ignore infinite loops or do weird things to them.
again, i've made it not catch every case so i can do it in <50 lines.  Normally, for example, there is more than one stack, instead of all possible stores on the same stack, which misses a ton of cases due to irrelevant aliasing, blah blah blah.

Stack StoreStack;  // In realer versions of this algorithm, we have multiple stacks or other structures. In fact, in this version, there will only ever be zero or one things on the stack.
void walkPostDomChildren(DomTreeNode *DTN)
{

  BasicBlock *BB = DTN->getBlock();
  if (!BB)
    return;

// Ensure the that the store at the top of the stack is from something above us in our tree.  Normally would be done by checking if  dfs numbers of the current block is contained within the dfs in/out numbers of the top of stack.   See newgvn.cpp, predicateinfo.cpp, etc for ValueStack, which does this.

Pop Store Stack until block for instruction at  Stack.top() postdominates DTN.

  for (auto *I : reverse(BB))
    if (auto *SI = dyn_cast<StoreInst>(I)) {
      // If whatever is on the top of the stack is a must alias and post-dominates us, we're dead.
      if (!Stack.empty() &&  AA.isMustAlias(Stack.top()->getPointerOperand(), I->getPointerOperand()) )
         I->eraseFromParent();
      else if (Stack.empty())
        Stack.push(SI)
      else
        // This is really a may-def, so we must clear the stack. In this stupid version, this is true even if it noaliases top of stack.
        Stack.clear()
       continue;
   }
      if (!(getModRefInfo(I) & MRI_NoModRef))
        // A use of memory, so reset our stack
        Stack.clear();
  for (auto * Child : DTN)
       walkPostDomChildren(DTN)

}

walkPostDomChildren(PDT->getRoot())

Basically, if a store postdominates another earlier store without an intervening may-use/may-def, the earlier store is dead.  There is nothing that could possibly see it happen before the later store.

Modulo transcription bugs, this algorithm is provably correct for our simplified assumptions. It will only eliminate stores that are dead.

Given the following program:

Block 1
store a, 5
if (foo)
{
Block 2:

  call foo(load a)
  goto Block 2

}
Block 3:
store a, 6

If infinite loops are not in the post-dominator tree, the tree will look like this:

3

|

1

The algorithm will visit 3, then push store a, 6 on the stack, visit 1, and use it to remove store a, 5
This is illegal.  The store a, 5, is visible and used in block 2, and block 2 in fact, calls a function with value we stored.

Whoops!

If infinite loops are connected to virtual exit, the tree will look like this:

virtual exit
 /  |  \
3  2 1

We will no longer eliminate store a, 5, and we will get a correct answer because at each point, nothing will be on the stack.
We will visit 3, push store a, 6. We will visit 2, pop the stack because store a, 6 is no longer in scope.  Stack will stay empty throughout 2.
We will visit 1, push store a, 5.  We will exit the algorithm with this store on the stack, having eliminated nothing.
We will get the correct answers no matter how many things no longer exit.  This is a safe and conservative post-dominator tree for our algorithm.

Is this perfect?  For certain, no.

In this example, for *this* algorithm, you can connect the infinite loop anywhere you like as long as you do not generate the post-dominator tree:
3

|

1

|

2

or
3

|

2

or
3

| \ |

2   1

If you generate that, you will eliminate the store a, 5 again, and you should not.

I believe the last one, sadly, is what your other patch will generate. It's not going to end up legal to simply cause the reverse-unreachable parts to have no effect on post-dom, because they are still forward-reachable.

As an example of what would theoretically work:

virtual exit

| \ |

3     \

| 1 |

2

Would also give a correct answer for this algorithm, on this example.  But would be not valid as a postdominator tree because there is no reverse path from 3 to 2, *and* there is a reverse path from 3 to 1, but 3 is not an ancestor of 1.

Sadly, there is no good-in-all-cases to be had

In fact, if the program was:
Block 1
store a, 5
if (foo)
{
Block 2:

  goto Block 2

}
Block 3:
store a, 6

we will still generate:
virtual exit
 /  |  \
3  2 1

but in practice, we could eliminate store a, 5 as it's not possible to see that it didn't happen in this example.

The greater problem with not connecting the infinite loops to exit is that "where it is legal to connect the loops to" (IE what doesn't break this algorithm) is now complex.  Effectively, it must take into account where the memory instructions are (though the exact conditions that must be fulfilled do not have to be expressed like this)
Otherwise, i can generate an adversarial program that will give a wrong answer when you apply the above algorithm.
IE by placing loads and stores and calls such that you end up with the same problem as the 3->1->2 post dom tree above.

In particular, for the algorithm above, the places for where you connect the infinite loops must ensure either the stack gets cleared in the right place, or that the blocks are siblings instead of parent child, so the scope gets reset.  (IE either we must always see the may-use in the infinite loop and clear the stack before going above it, or we must pop the stack so that we do not try to eliminate something "above" the infinite loop using something "below" the infinite loop)

Worse than that, that's just what works with *this* algorithm.

SSUPRE, which requires a conservative correct postdominance frontier, has worse requirements you'd have to prove are met. 
So does Sreedhar and Gao's IDF algorithm (which is what IDF calculator uses), which requires the CFG and PostDom tree be in sync, and postdom have the parent and sibling property.
(This is solvable by making the real cfg have the set of fake edges you are creating, but they'd need to be walked when you call predecessors.  If you connect them all to a fake exit, even if you have algorithms that must know about the edges, it's one block to special case instead of many)

Additionally, if you use post-dominance to prove speculation-safety, it's now not enough to know where the load/stores are, you'd have to know where every trapping instruction is to ensure the place you connect the infinite loop does not imply speculation safety where it should not.

Essentially: Even if you completely ignore the sibling and parent properties, connecting the infinite loops or changing the paths in general  requires enough knowledge of the algorithms being used to be able to prove that you aren't going to generate post-dom trees that make them wrong.
In practice, since at least the IDF calculator relies on the sibling and parent properties, this ends up a practical requirement for the connection points...

All that seems ... really fragile and difficult :)

https://reviews.llvm.org/D35851