[llvm] r296535 - Fix PR 24415 (at least), by making our post-dominator tree behavior sane.

Tue Mar 14 16:43:49 PDT 2017

 Hi,

just to add another point of view.

An intuitive definition of postdominator of B could be "a block that
will eventually be executed when B is executed".

A formal definition is "forall execution paths to the exit lays B".

These definitions do not match. First, on the path to B there might be
a loop whose exit condition is never true. Due to Turing's
Halteproblem, we actually can never now whether B will be executed
eventually. Second, if there is no path to the exit at all, the formal
definition effectively says that every block is a postdominator (
vioforall over empty set => true). That's obviously not useful in
practice. One can just say that postdominance is undefined in that
case, as the current PostDominatorTree does.

Michael Wolfe's book suggests to add "phantom edges" to connect the
infinite loop to the rest of the tree. I will use the same term. It's
an edge that is never executed. Indeed, one could have rewritten the
IR from

    loop:
      br label %loop

to

    loop:
      br i1 false, label %somewhere, label %loop

without changing semantics, the same way as PostDominatorTree interprets

    return1:
      ret void

    return2:
      ret void

as

    return1:
      br label %virtual_sink

    return2:
      br label %virtual_sink

    virtual_sink:
      ret void

I haven't seen arguments against that.

The path invariant "if B is a postdominator, then there is a path to
B" only derives from the intuitive definition. It's what programmers
would typically expect, and might be a goal to adhere in order to
avoid bugs. As already said, programmers cannot expect then B is
eventually executed.

However, even D29705 violates this. It adds a path to the virtual exit
even though there is no path to the exit. The only difference in
D30890 is the node it connects to. I don't think one is more "correct"
than the other. There is no change in semantics by phantom edges.
Again, there is no guarantee that an edge is ever taken. Passes that
are only correct without phantom edges would also be wrong if there
was a real edge.

Daniel already mentioned that PostDomTree currently does actually
connect dead-end blocks to the virtual exit node, done by this code in
GenericDomTree.

      for (auto *Node : nodes(&F))
        if (TraitsTy::child_begin(Node) == TraitsTy::child_end(Node))
          addRoot(Node);

This fails to detect infinite loops, so it is reasonable to see this
just as a bug in the current implementation.

An issue I can think of is that it is underdefined which block of an
infinite loop gets connected, but significantly determines the postdom
tree. We could just connect all of them. If we introduce another
virtual sink node for non-exiting exists, all blocks of the loop
become direct children of that node, such that we "flattened" all the
structure in it. Let's call that virtual sink node "nullptr" and we
something similar as we have now.

Eli Friedman already mentioned the consistency between different
analyses in D30890, i.e. different analyses have different views of
the same CFG. I think he's right that consistency is harder to
maintain if phantom edges go into the middle of a CFG instead to the
PostDomTree-only virtual node. The first even creates virtual loops,
the latter doesn't.

Let's go into detail how PostDom is used in LLVM passes.

RegionInfo

Tobias has a concrete vision of what an SESE region is. Phantom edges
to the exit destroy the single exit property and hence don't form such
a region anymore. A workaround should be possible if there is a
distinct virtual sink for infinite loops. RegionInfo only needs to
take into account that that sink can be member of multiple non-nested
regions.

I personally think that LLVM's RegionInfo is something particular in
LLVM. I'd prefer something that better interacts with LoopInfo.
Currently Regions and Loops can intersect in any possible way. The
Dragonbook defines regions through loops and their bodies. This means
SEME regions.

Aggressive Dead Code Elimination

ADCE uses IDFCalculator to determine control dependencies. It
implicitly assumes that only blocks reachable from the PostDomTree's
virtual root are live. That's obviously not true, so it just marks all
block not in the PostDomTree as live (Strangely, even twice). D29705
and D30890 both fix that by adding the blocks to the tree.
For control dependency. I would even argue that D30890 can optimize
more. Given this CFG:

define i32 @InfiteLoop(i32 %A, i32 %B, double* %C) {
entry:
  br label %maybeloop

maybeloop:
  br.  i1 undef, label %loop, label %exit

loop:
   br label %loop

exit:
  br label %return

return:
  ret i32 42
}

Is %maybeloop a control dependency of %exit and therefore should be
preserved? It may depend on -adce-remove-loops. If
-adce-remove-loops=true, then loop itself is added to the live set,
then also %maybeloop, even in D30890. If -adce-remove-loops=true, then
the assumption inherited from C that loops without side-effects
eventually exit, i.e. such loops have undefined behavior and we are
free to remove the loop.

I tried with D29705 and D30890, both change the function above to:

define i32 @InfiteLoop(i32 %A, i32 %B, double* %C) {
entry:
  br label %maybeloop

maybeloop:                                        ; preds = %entry
  br label %exit

loop:                                             ; preds = %loop
  br label %loop

exit:                                             ; preds = %maybeloop
  br label %return

return:                                           ; preds = %exit
  ret i32 42
}

When I get an edge from %loop, I get this:

define i32 @PhantomEdge(i32 %A, i32 %B, double* %C) {
entry:
  br label %maybeloop

maybeloop:                                        ; preds = %entry
  br label %exit

loop:                                             ; No predecessors!
  br label %exit

exit:                                             ; preds = %maybeloop, %loop
  br label %return

return:                                           ; preds = %exit
  ret i32 42
}

That is, removing %loop should be viable.

Disclaimer: I didn't study ADCE in all detail and am no language
lawyer. Please correct me if I am wrong.

DivergenceAnalysis

Looks currently to be wrong as it doesn't consider values defined in
blocks leading to infinite loops. I think both patches fix it as it
only considers PHIs where divergent paths merge again, which never
happens for infinite loops. Still, D30890

Sanitizer Coverage

Used to optimize instrumentation away if there is "full dominance".
D29705 changes that because it makes seem like there are alternative
paths to the exit. However, all that could happen is additional
redundant instrumentation.

GuardWidening

Uses PostDomTree for a profitability score. It's only score, so there
should be no influence on correctness.

SSUPRE

Single Status Use-Form Partial Redundancy Elimination? This is not
implemented in LLVM, and I cannot say how it could be influenced.

In summary, I tried not to portrait one variant to be better than the
other. But I am convinced that there is not only one sane way to do
it, both ways seem viable.

Michael