[llvm] r296535 - Fix PR 24415 (at least), by making our post-dominator tree behavior sane.

Mon Mar 13 07:51:43 PDT 2017

Hi Daniel,

thank you again for your patience!

In summary, I agree that we want to have a complete post-dominance tree,
but I am still concerned about the "flattened" dominance relation. I
spent some time to understand if we can get a full post-dominance tree
without loosing precision and feel as if we can. Now, I am happy to
learn that I am wrong. In fact, at the very least, this discussion
collected a lot of background material which will make great source-code
comments.

Let me start with a concrete question:

# Does connecting the virtual edge to the last connecting
reverse-reachable basic block break any test case?

In https://reviews.llvm.org/D30890, I slightly modified your patch
(might need some interface cleanups and another round of testing) to
connect infinite loops to the BB that attaches them to the CFG. The
modified patch passes alle existing tests only requiring the obvious
adjustments, works the way I expect on RegionInfo, and similarly to your
patch removes the ADCE hack. So at least within the test coverage we
have this solution does not uncover any problems. It also fixes two
bound-check elimination problems we had since a while in Julia and
most-importantly preserves the post-dominance relationship at the
connection points of reverse-unreachable code. Is there a specific
example LLVM-IR transformation where we lack test coverage, but which
would break today?

# Theoretical Details

LLVM currently does not include reverse-unreachable code in the
post-dominator tree. I understand that there is a desire to ensure all
basic blocks are part of the post-dominator tree and I agree that we
want to make this happen!

You aim for a single post-dominator tree (some kind of forest could be
an alternative). A sufficient solution to ensure a single post-dominator
tree is the augmentation of the CFG with virtual edges such that each
basic block is reverse reachable from a (virtual) exit node. I again
agree that we want to have a single post-dom tree and that this goal is
best reached by
augmenting the CFG. It is not yet clear to me why inserting connections
directly to the exiting basic block is the only valid choice. To
establish reverse reachability, it is AFAIU sufficient to insert edges
to any other node that is already reverse reachable. The (virtual) exit
node is (trivially) reverse reachable, this is true. However, AFAIU
inserting an edge to any other reverse reachable node would be
sufficient to ensure a complete tree. I assume you agree here.

Now you state that the only tree that makes sense to form is the one
where reverse-unreachable blocks are connected to the exit block,
because such a tree:

1) is more canonical

   o "I strongly believe we would define it the same way everyone else
   does, which is to connect the backedge of the infinite loop 
    to a virtual exit node. That is the canonical answer you will find
    everywhere, and there are good reasons for this."
   o "what we do now, post-patch is the canonical thing to do, taught in
   compiler textbooks, and used by every other compiler."

2) is the only "correct" solution 

   o "Did you consider connecting it somewhere else?", "Yes, because it
   is IMHO not valid from a correctness perspective, and this will cause
   bugs."

1) To understand what is canonical I went a little bit through the
literature (and the source of other compilers):

https://docs.google.com/document/d/1lL8xcqfnqNArvQj3TpgEXOTybr8kucagjzqvChQCUGI/edit

You are indeed right that several compilers (including gcc), but not
all, connect infinite loops to a canonical exit node. On the other side,
literature-wise it was hard to find this mentioned. Even though
post-dominators are known since Lowry 1969, the definitions I can find
always expect a CFG where all bbs are reverse-reachable. This also holds
the the arxiv paper you cited,
which states: "Let G = (V, E, s) be a flow graph, and let T be a tree
rooted at s, with vertex set consisting of the vertices that are
reachable from s.". The only compiler textbook I could come up with that
discusses non reverse reachable infinite loops is  "High performance
compilers for parallel computing, Michael Wolfe" who describes precisely
our problem. He suggests as solutions either to create independent trees
or to introduce a phantom edge going e.g. to the exit node.

2) Regarding correctness: The original definition of post-dominance
(Allen 1970) states "A node b_1 is said to forward dominate or post
dominate a node b k if b i is on every path from b,. to all exit nodes."

Hence, for this very simple example:

            start
               |
            first
             /   \    
/\        /        bb
^  \    /           \
^    loop        exit
 \__/

all paths from first to exit go through bb. Consequently bb
post-dominates exit. In the original papers non-reverse unreachable
nodes have not been considered, so they do not give us any insights how
such CFG parts should be handled.

Hence, one needs to define an extension that at best preserves this
original property. As discussed earlier we can add additional edges that
(transitively) connect loop to a virtual exit. Your proposal is to
connect loop -> exit:

            start
               |
            first
             /   \    
/\        /      bb
^  \    /           \
^    loop  ->   exit
 \__/

Which means that we loose the original dominance property "bb
post-dominates first". I would not claim this is incorrect, but at least
imprecise.

                  start
                     |
    --------> first
  /                  /   \    
^     /\        /      bb
^    ^  \    /           \
^    ^    loop      exit
\     \__/|
  \         /
     -----

Alternatively, we can connect loop to first, which preserves the
property that bb post-dominates first.

You explain that the latter solution is incorrect because "In order for
x to be a parent of y in the Dom/pdom tree, there must be a path (in the
walk direction) from x to y". This statement requires all blocks to be
reachable from the source. Without any augmenting edges, we cannot give
any statements about dominance relations covering "loop". When
artificial edges have been added, this property is true again.

Now, it seems that you have optimizations in mind which require such a
path to actually be executed. I believe this is beyond what
post-dominance can actually guarantee. Maybe what you are talking about
is strong post dominance?

“Node u is strongly post-dominated by v, [...] iff (1) u
[post-dominates] v and (2) there is some integer k ≥ 1 s.t. every u−path
of length larger than or equal to k passes through v. Node v is a proper
strong post-dominator of u iff [u strongly post-dominates v] and u [not
equa]l v.”
(“Parametric and Termination-Sensitive Control Dependence”, SAS 2006,
http://fsl.cs.illinois.edu/pubs/chen-rosu-2006-sas.pdf,
  "A Formal Model of Program Dependences and Its Implications for
  Software Testing, Debugging, and Maintenance, Andy Podgurski, Lori A.
  Clarke, 1990")

Most of the text above is just to give some background to others. I do
not expect you to answer in such detail. The only question I have if you
can give a specific example that would break with the alternative
solution I propose and which does not require strong post-dominance
(which is likely hard to guarantee)?

# What is a region in the context of infinite loops:

   o "SESE regions are not allowed to contain reverse-unreachable
   infinite loops, in any canonical definition i can find."

I was wondering which definition of an SESE region you looked up?

The Program Structure Tree Paper says:

      1. a dominates b,
     2. b postdominates a, and
     3. every cycle containing a also contains b and vice versa.

As the paper explicitly assumes that "every node occurs on a path from
start to end", infinite loops are not covered. An extension of the
region definition to infinite loops requires 1), 2) and 3) to be
extended to infinite loops. 1) and 3) are straightforward, 2) is what we
discuss. So without a conclusion on what post-dominance means in
presence of infinite loops, we cannot define SESE regions on such loops.

   o "Faster and More Focused Control-Flow Analysis for Business Process
   Models through SESE Decomposition"

          Let G = (N, E) be a workflow graph. A SESE fragment (fragment
          for short) F = (N′, E′)
          is a nonempty subgraph of G, i.e., N ′ ⊆ N and E′ = E ∩ (N ′ ×
          N ′) such that there exist
         edges e, e ′ ∈ E with E ∩ ((N \ N′) × N ′) = {e} and E ∩ (N′ ×
         (N \ N′)) = {e′}; e and e′ are
         called the entry and the exit edge of F, respectively.

The paper again assumes "each node n ∈ N is on a path from the start
node to the stop node", but the above definition works without dominance
purely on the graph theoretical constructs. It identifies subgraphs with
a single entering edge and a single exiting edge. This definition can be
directly applied to the infinite loop graphs we look at. Without any
extensions to the definition it identifies in the following CFG:

               start
                  |
               first
     /\        /     \
    ^  \    /         \
    ^    loop       exit
     \__/

the region with the single entering edge "start -> first" and the single
exiting edge "first -> exit" containing the blocks "first" and "loop". 

As far as I understand you disagree here. Can you point me to one of the
canonical definitions that you found, which define SESE regions
different to the definition above?

Now, maybe part of the different reasoning we have comes from the fact
that for you an infinite loop or an unreachable indeed exit the
function. I am not sure I understand why this is the case. The following
code has a well-define post-dominator tree:

    entry:
       br first

    first:
        br %p, loop, exit

    loop:
        br true, loop, first

and I believe we agree that "loop" is not exiting the function (at all).
Hence, I have difficulties to see why by just simplifying the branch
statement in loop, control flow is suddenly exiting the program in
"loop":

   entry:
       br first

    first:
       br %p, loop, exit

    loop
     br loop

It seems there is no reason to believe "loop" is exiting. Instead, we
just use the simple trick of adding a never taken edge to the CFG such
that all nodes become reverse unreachable. As this edge does not change
the dynamic program behavior, we can add it in any way we want. We only
need to do this in a way that 1) we do not loose precision, 2) the way
we add this edge is unsurprising and well documented.

Best,
Tobias

PS: Sorry for the long email. Don't feel obliged to reply lengthy. After
having put the background, I will try to be more concise.
PPS: You refer to some future patches that require post-dominance
invariants to be met. Which invariant do they rely on precisely? Are
there patches I can look into?