[PATCH] D24805: [GVNSink] Initial GVNSink prototype

Daniel Berlin via llvm-commits llvm-commits at lists.llvm.org
Tue May 23 16:07:49 PDT 2017


On Tue, May 23, 2017 at 7:10 AM, James Molloy <James.Molloy at arm.com> wrote:

> Hi,
>
> Firstly, overall I think perfection shouldn't stand in the way of progress
> and that GVNSink is a good improvement on the really bad bisimulation
> algorithm in SimplifyCFG. Do you have any objections to it going in-tree
> and I can work on improving/rewriting the algorithm there, after I remove
> the old SimplifyCFG algorithm?
>
Sure, I have no objection; I'm just trying to understand the set of
problems you face. I'm happy to accept it and continue this discussion.


> I'm afraid I don't fully follow you. What compounds this is that you've
> referred to this (inverted or post value numbering) being in the literature
> somewhere, but I've totally failed to find it. Do you have any starting
> pointers?
>

No, it does not exist in the literature that I'm aware of, at least under
that name :)
There are "predictive commoning" algorithms, but most are loop-based.




>
> Consider the following example, which is one of the simple motivating
> testcases for the creation of GVNSink (it doesn't show all the features of
> GVNSink by a long way, but it does illustrate a problem):
>
> bb1:
>   x1 = load %p
>   x2 = add x1, %g
>   x3 = add x2, 5
>   x4 = add x3, %g2
>   store x4, %p
>   goto bb3
>
> bb2:
>   y1 = load %p
>   y2 = add y1, %g
>   y3 = add y2, 42
>   y4 = add y3, %g2
>   store y4, %p
>   goto bb3
>
> In this example all instructions can be sunk to bb3 with just one phi(5,
> 42). I'm trying to understand how your suggestion would work.
>

Sure.


> You said:
>
> "(in particular, you will create a phi for any value with two different
> leaders at a join point you must pass to sink it. If you calculate this
> properly, and in topo order of expressions, such that you process operands
> before things that use them, you can calculate the number of phis you would
> insert in O(N) time overall)"
>
> My difficulty is that I can't think of a value numbering function that
> wouldn't diverge at {x3, y3}. Consider a "normal" value numbering in which
> VN[x] == VN[y] iff congruent(x, y):
>
>
Let me try to phrase it another way. Let's start with regular GVN, because
it's a bit easier to understand, and then I'll transform it into your
problem.

In either full redundancy elimination or partial redundancy elimination,
the possible insertion points of an expression (represented by phis) are
directly related to the availability of that expression's operands.
You are only eliminating things that have the same value, and thus you
only need to be able to create that value at a given point.

Whether you need a phi node to eliminate an expression is dictated by the
following:

If the operands are available in a dominator, you do not need a phi.
If they are only available in predecessors, you need a phi.
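As a sketch (hypothetical helper names; `dominates` is assumed available, e.g. from a dominator tree — this is a toy model, not GVN's actual code), the two rules might look like:

```python
def classify(use_block, leader_blocks, dominates, preds):
    """Decide how an operand's value can be made available at use_block.

    leader_blocks: blocks containing a leader for the operand's value number.
    dominates(a, b): True if block a strictly dominates block b.
    preds: predecessors of use_block.
    """
    # Available via a dominating definition: no phi needed.
    if any(dominates(d, use_block) for d in leader_blocks):
        return "no-phi"
    # Available in every predecessor, but from no single dominating block:
    # a phi is needed to merge the per-predecessor leaders.
    if all(any(d == p or dominates(d, p) for d in leader_blocks)
           for p in preds):
        return "phi"
    return "unavailable"

# Diamond CFG: entry -> {left, right} -> join.
dom = {("entry", "left"), ("entry", "right"), ("entry", "join")}
dominates = lambda a, b: (a, b) in dom
print(classify("join", {"left", "right"}, dominates, ["left", "right"]))
# -> phi
```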

What does the second mean?
Well, in a value-based world, it means you are merging two non-dominating
operands into a single one with the same value.
The only way a value can be available in all predecessors and yet not
dominate you is for there to be different leaders for the same value
number in some set of predecessors.

Proof:
The definition of dominance means that all paths from the entry to a given
node go through a certain block. For the same operand to be available in
all predecessors and not dominate you, there must be a path around the
dominator to your block through one of those predecessors. But that path
can be extended into a path from entry to your block that avoids the
dominator, which implies the dominator doesn't dominate you :)

What does all of this imply?
It means that for FRE/PRE, the set of phi nodes you could ever insert for
an expression, no matter where you move it, is the iterated dominance
frontier of all of the definition blocks of the leaders for each operand
(recursively computed), plus the IDF of the blocks in which the expression
currently occurs. The first is because those are the places where the
expression may change, and thus may need a phi of the old and new leaders.
The second is because those are the places where you may eliminate
redundancies by merging existing copies.

I.e., given VN1 + VN2, the set of phis you could ever insert is:

IDF(definition blocks of leaders of VN1 U definition blocks of leaders of
VN2 U blocks that have an occurrence of VN1 + VN2)

It is actually completely and totally irrelevant *what* notion of
congruence you use for VN1 or VN2. The only relevant part is the location
of the things you are calling leaders.
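That set computation can be sketched as follows (toy block names; `df` is a precomputed dominance-frontier map, which I'm assuming is available from a dominator tree):

```python
def iterated_df(df, blocks):
    """Iterated dominance frontier of a set of blocks."""
    result, work = set(), list(blocks)
    while work:
        b = work.pop()
        for f in df.get(b, ()):
            if f not in result:
                result.add(f)
                work.append(f)  # iterate: frontiers of frontiers count too
    return result

def phi_candidates(df, vn1_leader_blocks, vn2_leader_blocks, occurrence_blocks):
    # IDF(defs of VN1's leaders U defs of VN2's leaders U occurrences of VN1+VN2)
    return iterated_df(
        df, vn1_leader_blocks | vn2_leader_blocks | occurrence_blocks)

# Diamond CFG: bb1 and bb2 both branch to bb3, so DF(bb1) = DF(bb2) = {bb3}.
df = {"bb1": {"bb3"}, "bb2": {"bb3"}}
print(phi_candidates(df, {"bb1"}, {"bb2"}, set()))  # -> {'bb3'}
```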

This is the complete set of phis you could ever insert to move a "value
expression" somewhere valid.
It is a superset of the valid insertion points. (I'm going to use
"insertion point" to mean "a place you could recreate the expression",
whether that is by merging two existing copies or by actually placing a
new instruction.)

(SSAPRE and others prune the set a bit when dealing with lexical identity.
Such pruning is not optimal or valid in some cases when dealing with value
numbers)

Not every insertion point is valid, and the rest of the computation is
determining which are valid, and which eliminate redundancies safely (where
"safe" here is defined as not inserting new computations along paths they
don't already get computed on).

This is also true of your value numbering - not all points are really
safe.  That does not change the superset, it only changes which ones end up
being valid points in the end :)

Now let's see how this applies to your problem:



> bb1:
>   x1 = load %p     [VN=1]
>   x2 = add x1, %g  [VN=2]
>   x3 = add x2, 5   [VN=3]
>   x4 = add x3, %g2 [VN=4]
>   store x4, %p     [VN=5]
>   goto bb3
>
> bb2:
>   y1 = load %p     [VN=1]
>   y2 = add y1, %g  [VN=2]
>   y3 = add y2, 42  [VN=6]  <-- 42 != 5, so VN[y3] != VN[x3]
>   y4 = add y3, %g2 [VN=7]
>   store y4, %p     [VN=8]  <-- value numbers have diverged
>   goto bb3
>
> Now consider an inverse value numbering in which VN[x] == VN[y] => x or
> y's uses can be replaced with phi(x, y):
>
> bb1:
>   x1 = load %p     [VN=7]  <-- value numbers have diverged
>   x2 = add x1, %g  [VN=6]  <-- x2 cannot be used the same as y2; different
> constants get added
>   x3 = add x2, 5   [VN=3]  <-- x3 is used the same as y3
>   x4 = add x3, %g2 [VN=2]
>   store x4, %p     [VN=1]
>   goto bb3
>
> bb2:
>   y1 = load %p     [VN=5]
>   y2 = add y1, %g  [VN=4]
>   y3 = add y2, 42  [VN=3]
>   y4 = add y3, %g2 [VN=2]
>   store y4, %p     [VN=1]
>   goto bb3
>
> Both value numberings notice the similarity of only half the sinkable
> instructions.
>
> The way I get around this in GVNSink is to define VN[x] as an *optimistic*
> function that only takes into account the opcodes and number of uses. This
> allows it to number x2 and y2 the same, even though they could not actually
> replace each other. The value numbering is used purely as a tool to narrow
> the search space - bisimulation (or n-simulation!) takes over after that to
> ensure validity.
>
> The way I'm thinking about this problem, there is surely no such value
> numbering function that will indicate when a PHI is needed and will not
> diverge at that point. Therefore, I must be thinking about the problem
> differently (wrongly) to you.
>

The set of phis you may ever need to sink such expressions still has not
changed from above; only your definition of congruence has.
The possible insertion point for your expression is bb3.
Recursively, it turns out that to recreate VN3, you need a phi to merge 42
and 5, because there are different leaders for the operands of VN3 in your
predecessors (for 42 and 5)[1]. You need a phi to merge x2 and y2 for the
same reason.
Once that is done, you will have

op1 = phi(x2, y2)
op2 = phi(42, 5)
newval = op1 + op2  (VN3)
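Mechanically, that merge can be sketched like this (a toy model, not GVNSink's actual representation): a phi is created only for the operand positions whose leaders differ across predecessors.

```python
def merge_operands(occurrences):
    """occurrences: one operand list per predecessor, for congruent
    instructions being merged into the common successor block."""
    merged = []
    for ops in zip(*occurrences):      # walk operand positions in lockstep
        if len(set(ops)) == 1:
            merged.append(ops[0])          # same leader everywhere: reuse it
        else:
            merged.append(("phi",) + ops)  # different leaders: phi required
    return merged

# x3 = add x2, 5 (bb1) and y3 = add y2, 42 (bb2), merged into bb3:
print(merge_operands([("x2", "5"), ("y2", "42")]))
# -> [('phi', 'x2', 'y2'), ('phi', '5', '42')]
```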


Whether you now need a phi for VN3 depends on the locations of the other
occurrences of VN3 besides newval.
As I said, it is easier to process expressions in topo order so that you
can compute the possible phis ahead of time.
Because once you've done the above and go to process VN2, the answer to
whether you *need* phis requires knowing that newval will exist.
It's infinitely easier to just require that newval already be inserted at
that point, and be a leader of the right value number, if you can.

In your world, things are easier because you do not perform new insertions
(that I remember!), only merges.
Thus, you would process them in reverse topo order, such that you would
sink store phi(x4, y4),
then sink add phi(x3, y3), g2, etc.
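That ordering can be sketched with Kahn's algorithm on the use graph (toy def-use data from the example; emits users before their operands):

```python
from collections import deque

def reverse_topo(uses):
    """uses[i] = the operands instruction i uses. Returns an order in
    which every instruction appears before its operands, i.e. the order
    in which a merge-only sinker would visit them."""
    # Count, for each instruction, how many users have not been emitted yet.
    pending_users = {i: 0 for i in uses}
    for ops in uses.values():
        for o in ops:
            pending_users[o] += 1
    ready = deque(i for i, n in pending_users.items() if n == 0)
    order = []
    while ready:
        i = ready.popleft()
        order.append(i)
        for o in uses[i]:
            pending_users[o] -= 1
            if pending_users[o] == 0:   # all users emitted: operand is ready
                ready.append(o)
    return order

# bb1's chain from the example: the store is sunk first, then x4, etc.
uses = {"store": {"x4"}, "x4": {"x3"}, "x3": {"x2"}, "x2": {"x1"},
        "x1": set()}
print(reverse_topo(uses))  # -> ['store', 'x4', 'x3', 'x2', 'x1']
```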

You can still use the computation I gave; it just may require more
iterations.
There is a more exact computation you could use, but it's complicated.

So, yes, I understand that the problem you are solving is this:

Given a hypothetical expression X op Y placed in block BB, are there
things I can fill in for X and Y that would make this expression valid and
enable me to eliminate existing copies?

The answer to where this hypothetical expression could ever be placed, and
eliminate existing copies, is still the same as I gave above, even if you
change the definition of congruence.

Those are the possible (but maybe not valid or useful) insertion/move
points, and the set of places a given *expression* may need a phi node.
Having an over-inclusive definition of congruence just means the space of
phis you search is larger than you want.
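For instance, an opcode-and-use-count hash like the one James describes above can be sketched as follows (made-up instruction encoding; the real prototype's numbering is richer than this):

```python
def optimistic_vn(insts):
    """Optimistic numbering keyed only on opcode and use count.
    Over-inclusive by design: congruence must still be verified
    separately, e.g. by (bi)simulation."""
    table, vn = {}, {}
    for name, (opcode, n_uses) in insts.items():
        key = (opcode, n_uses)
        # Same key -> same value number, reusing the first number assigned.
        vn[name] = table.setdefault(key, len(table) + 1)
    return vn

# x2 = add x1, %g and y2 = add y1, %g get the same number even though
# the constants added downstream differ; the search space just grows.
insts = {"x1": ("load", 1), "y1": ("load", 1),
         "x2": ("add", 1), "y2": ("add", 1)}
v = optimistic_vn(insts)
print(v["x2"] == v["y2"], v["x1"] == v["y1"])  # -> True True
```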


(I could also show you how you could actually do this as a true inverse
PRE problem and solve it sparsely, but I wouldn't bother if what you have
works for you :P)

>
> Could you please enlighten me or maybe point me at papers?
>
> Cheers, and apologies for the slowness in picking this up.
>
> James
>
> On 22 May 2017, at 17:46, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>
>
> On Mon, May 22, 2017 at 8:41 AM, James Molloy <James.Molloy at arm.com>
> wrote:
>
>> Hi,
>>
>> Thanks for the quick reply; this is a very helpful and constructive
>> discussion :)
>>
>
> Happy to help :)
>
>
>>
>> The oversimplification and repetition below is for my own understanding
>> and so that I can better understand your replies, as you know this subject
>> far better than I.
>>
>> A: store 1
>> B: store 2
>> C: store 3
>> D:
>>
>> I understand that in a classical sinking algorithm we'd want to walk
>> backwards:
>>   1) Sink 'store 3' from C to D
>>   2) Sink 'store 2' from B to D
>>   3) Sink 'store 1' from A to D
>>
>> A key to this working is that each instruction, when considered, was sunk
>> as far as it could be. If we had added the restriction that an instruction
>> could only be sunk to its immediate successor block, that algorithm would
>> have taken three iterations to reach a fixpoint.
>>
> Yes
>
>
>>
>> GVNSink can (at the moment at least) only sink instructions from a block
>> into its immediate successor. This is what allows it to model accurately
>> enough how many PHIs it is going to create and whether sinking is
>> actually worthwhile (and how much sinking).
>
>
> How much is the former, and how much is the latter?
>
> The number of phis you can create is calculable ahead of time in O(N) time
> for all operands, and is related to the availability of values
> (in particular, you will create a phi for any value with two different
> leaders at a join point you must pass to sink it. If you calculate this
> properly, and in topo order of expressions, such that you process operands
> before things that use them, you can calculate the number of phis you would
> insert in O(N) time overall)
>
>
> I don't believe it would be able to handle all the cases it currently
>> does if it didn't have this restriction (but would be able to handle other
>> cases that it currently can't, so I see that very much as an enhancement
>> I'd like to add later).
>>
>
> I actually don't buy this, fwiw.
> It's entirely possible to calculate all the hypothetical insertion points
> ahead of time, and figure out which make sense from a cost perspective.
> There are sinking (and PRE) algorithms that do this.
> You could do exactly the same cases you do now.
>
>
> Now, whether you want that complexity is a different question :)
>
>
>