[llvm-dev] Store unswitch

Fri Apr 28 07:51:04 PDT 2017

On Fri, Apr 28, 2017 at 7:42 AM, James Molloy <james at jamesmolloy.co.uk>
wrote:

> Hi Danny,
>
> Thanks for that :) However I've just updated the prototype patch to NewGVN
> and it didn't need any API changes - all I rely on is GVNExpression.
>
>
Oh, nice.
Note, Bryant has been working on PDSE, which, combined with your code here,
would  probably give us value based partial dead store elimination :)
(since the value numbering issues are related)

Hongbin,
>
> I wanted to explain a little about what GVNSink can currently do, what it
> was designed for and hopefully how to make it handle your testcase.
>
> *Background*
>
> Common code sinking is more difficult to efficently do than one might
> expect. Prior to October 2016, LLVM was able to handle simple cases of this
> within SimplifyCFG, for example:
>
>     if (a) {
>       b += 42;
>     } else {
>       b += 42;
>     }
>     // -> b += 42
>
> It could even sink operations where a phi was required:
>
>     if (a) {
>       b += 42;
>     } else {
>       b += 64;
>     }
>     // -> b += phi(42, 64)
>
> However it had a simplistic execution model and so:
>   1) Restricted itself to blocks with only TWO incoming arcs
>   2) Would only ever produce one PHI and then bail out
>
> Generally, the factors that make the search space for a good generic
> sinking optimization large are:
>
>   1) The number of incoming edges to a block.
>   2) Non-monotonicity of heuristic - for example it might not be
> worthwhile sinking 2 instructions but may be worthwhile sinking 3
> instructions.
>   3) Permutations of otherwise similar instructions in incoming blocks.
>
> I'll describe them all in a bit more detail:
>
> *1) The number of incoming edges to a block*
>
> There are many constructs that can create a block with more than two input
> edges. An obvious example is a switch:
>
>     switch (a) {
>       case 1: b += 32; break;
>       case 2: b += 43; break;
>       default: break;
>     }
>
> Another is a chain of if-elses:
>
>     if (a) {
>       b += 32;
>     } else if (c) {
>       b += 43;
>     } else {
>       b += 65;
>     }
>
> Note that those two examples are subtly different. In the latter example,
> every branch performs an addition to 'b'. We can therefore sink all the
> adds to the end and perform only one add:
>
>     b += phi(32, 43, 65);
>
> However in the former case things aren't so simple. The default case never
> does any add, so we have to speculate the add instruction in that case. For
> adds, this is simple - adds don't have sideeffects and we can speculate by
> adding with an identity (0):
>
>     b += phi(32, 43, 0);
>
> However what if 'b' was a global variable? In this case, each case of the
> switch will have a load/add/store sequence:
>
>                switch(a) ---------------
>               /        \                \
>               |         |                |
>     [t1 = load @b]    [t2 = load @b]     |
>     [t1 = add t1, 32] [t2 = add t2, 43]  |
>     [store @b, t1]    [store @b, t2]     |
>                \        /               /
>                 v       v v-------------
>              [             ]
>
> In most cases, we cannot be sure that in the default case @b is allowed to
> be written to. Speculating a store in this case could be invalid.
> Therefore, if we want to sink the store (and thus the add and load) we need
> to invent a new block to sink them to.
>
>                switch(a) ---------------
>               /        \                \
>               |         |                |
>     [t1 = load @b]    [t2 = load @b]     |
>     [t1 = add t1, 32] [t2 = add t2, 43]  |
>                \        /                |
>              [ p = phi(t1, t2) ]        /
>              [ store @b, p ]           /
>                    |      +------------
>                    v      v
>              [             ]
>
> Creating this new block allows the switch to be turned into a trivial one
> that can be lowered to a bit test or jump table.
>
> There is an even more complex case that appears quite a bit:
>
>     if (a) {
>       switch (b) {
>       case 1: c += 42; break;
>       case 2: c += 2; break;
>       default: c += 3; break;
>       }
>     } else {
>       switch (d) {
>       case 1: e -= 2; break;
>       case 2: e -= 6; break;
>       default: e -= 32; break;
>       }
>     }
>
> We canonicalize this CFG to one cleanup block with six incoming arcs. In
> this case, it takes some nontrivial analysis to determine what to do, as it
> requires partitioning the incoming arcs into sets, performing different
> sinking operations on each (in this case two disjoint sets, but there are
> cases where the sets are not disjoint and some cost heuristic needs to be
> used).
>
> *2) Non-monotonicity of heuristic*
>
> What I'm trying to say here is the greedy algorithms work best when the
> heuristic they are computing is monotonic. Consider this:
>
>     if (a) {
>       tmp1 = *b;
>       tmp2 = *c;
>       tmp3 = tmp1 + tmp2;
>       *d = tmp3;
>     } else {
>       tmp4 = *b;
>       tmp5 = *c;
>       tmp6 = tmp4 + tmp5;
>       *e = tmp6;
>     }
>
> If we proceed sequentially from the bottom of these blocks, trying to
> determine if sinking them is worthwhile:
>
> Step 1: [*e = tmp6], [*d = tmp3]
>   - This is the same operation, a store, but it requires two PHIs - one
> for (e,d) and one for (tmp6, tmp3).
>   - This is NOT worthwhile. We are constraining register allocation for
> only one instruction commonned.
>
> Step 2: [tmp6 = tmp4 + tmp5], [tmp3 = tmp1 + tmp2]
>   - Again, the operation is the same, but now we need another two PHIs,
> plus one PHI from Step 1 (the phi required for (tmp6, tmp3) is now no
> longer required).
>   - Three PHIs required for two instructions sunk. NOT worth it.
>
> Step 3: [tmp5 = *c], [tmp2 = *c]
>   - These can be sunk, and have the same arguments so no PHIs required. We
> still need two of the PHIs from step 2 (the (tmp5, tmp2) phi now is no
> longer required)
>   - At this point we've sunk three instructions for two PHIs. Maybe it's
> worth it?
>
> Step 4: [tmp4 = *b], [tmp1 = *b]
>   - Same as step 3, but now the (tmp4, tmp1) PHI from step 2 goes away too
>   - So now we're sinking 4 instructions for one PHI.
>   - Now it's definitely worth it!
>
> Whatever heuristic one chooses, in a case like this it will get worse,
> then maybe get better again. So we can't simply bail out when the heuristic
> reaches some threshold, because maybe it will improve as we analyze more
> instructions!
>
> *3) Permutations of otherwise similar instructions*
>
> This is the case you describe. Stores (for example) that look similar but
> have been permuted:
>
>     if (a) {
>       *b = 52;
>       *c = 43;
>     } else {
>       *c = 87;
>       *b = 54;
>     }
>
> If we approach this bottom-up, we will compare [*b = 54], [*c = 43], then
> [*c = 87], [*b = 52] and conclude that two instructions can be sunk, but 4
> PHIs are required (because the target of the store and value are different
> in each case). If we looked at [*b = 54], [*b = 52], perhaps we would have
> seen that we can sink this with only *two* PHIs required and that might
> change our decision.
>
> This is generally quite difficult and especially shows up with memory
> instructions like this (if c and b alias, then permuting them when sinking
> would be illegal. So we need alias analysis here).
>
> *What GVNSink attempts to solve*
>
> The current implementation of GVNSink was meant to solve 1) and 2), but
> NOT 3). 3) wasn't within scope when I was writing GVNSink, but I think 3)
> is doable for some cases within the framework of GVNSink.
>
> *How GVNSink works*
>
> GVNSink works by computing a value number for each IR instruction, similar
> to GVN. However, GVNSink's value numbering works slightly from GVN's.
>
> To recap, the value number (VN) of an instruction is a function of the
> instruction. If VN[I] == VN[J], I and J can be considered equivalent, for
> some definition of equivalence.
>
> In GVN, VN[I] is based upon I and all of I's operands, transitively.
> Therefore if VN[I] == VN[J], I and J were computed in the same way and J
> can be used instead of I.
>
> In GVNSink, we flip this on its head. VN[I] is based upon I's *uses*,
> transitively. Therefore if VN[I] == VN[J], all of the uses of I are the
> same as all of the uses of J. Therefore I and J can be merged. So this:
>
>    [ %a = add %I, 1 ]   [ %c = add %J, 1 ]
>    [ %b = sub %a, 5 ]   [ %d = sub %c, 5 ]
>    [ ret %b ]           [ ret %d ]
>
> Can be transformed into this:
>
>    []     []
>       \ /
>    [ %IJ = phi %I, %J ]
>    [ %a = add %IJ, 1 ]
>    [ %b = sub %a, 5 ]
>    [ ret %b ]
>
> In GVNSink, we step backwards across all instructions in all incoming
> basic blocks (see LockstepReverseIterator). This produces, conceptually, a
> sequence of sets of instructions.
>
> Each of these sets is analyzed in turn (see analyzeInstructionForSinking)
> - the value number of each contained instruction is calculated and used to
> determine if the entire set is mergeable or not (all values have the same
> VN).
>
> If not all instructions in the set have the same VN, the set is pruned to
> just those with the same VN (or we stop if all instructions had different
> VNs). If this set passes extra legality checks, a
> SinkingInstructionCandidate is created. This contains pertinent data for
> the cost heuristic:
>
>   1) Number of instructions from each incoming block being sunk
>   2) Number of blocks to pick instructions from (this can be a subset of
> all the incoming blocks!)
>   3) How many PHIs would be generated in total
>
> After all candidates have been examined, they are sorted based on a
> heuristic and the best candidate picked and applied.
>
> The use of value numbering in this algorithm is what allows us to do this
> in somewhat-linear time with respect to the number of instructions
> inspected.
>
> Now, to solve 3), I think the component that needs swapping out is
> LockstepReverseIterator. This currently iterates backwards in lockstep
> across a set of blocks. To handle permutations, I think we could use a
> priority queue that feeds the main algorithm instructions from the input
> blocks in a slightly sorted order.
>
> For example, perhaps it might order sequences of stores by their target
> operand (assuming alias analysis says it can), ensuring sequences of stores
> in different blocks appear in the same order?
>
> Anyway, I hope this has helped understand a bit more about what GVNSink
> does, can do and how to modify it if you want to solve your problem.
>
> Cheers,
>
> James
>
>
>
> On Wed, 26 Apr 2017 at 18:16 Daniel Berlin <dberlin at dberlin.org> wrote:
>
>> I have a patch i'm working on to split newgvn into a proper analysis.
>> It works (and happy to share it), it's just the interface i'm using is a
>> little ugly (haven't decided what the right API is), but let me know what
>> GVNSink needs from life and i'll make sure it's there :)
>>
>>
>> On Wed, Apr 26, 2017 at 10:09 AM, James Molloy via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> It's basically ready to commit; the reviewers were fairly happy with it.
>>> It needs rebasing on top of NewGVN and any bugs that shakes out fixed, but
>>> that's about it.
>>>
>>> I want to get around to it soon-ish, but I've wanted that for a while!
>>>
>>> On Wed, 26 Apr 2017 at 16:50, Hongbin Zheng <etherzhhb at gmail.com> wrote:
>>>
>>>> Hi James,
>>>>
>>>> I have an ad-hoc solution in mind to solve this problem.
>>>> But if it can be solved under the framework of GVN"Sink", it is even
>>>> better.
>>>>
>>>> any plan on your https://reviews.llvm.org/D24805?
>>>>
>>>> Thanks
>>>> Hongbin
>>>>
>>>>
>>>> On Wed, Apr 26, 2017 at 2:13 AM, James Molloy <james at jamesmolloy.co.uk>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Yes, I can see why that would not work.
>>>>>
>>>>> The sinking algorithm in SimplifyCFG isn't particularly clever. In
>>>>> particular it can't reason about memory ordering and aliasing. In
>>>>> unswitch1(), it can identify that the stores correlate because the
>>>>> correlating stores appear in the same relative source order. In unswitch2()
>>>>> they have been permuted, and the algorithm cannot deal with this. This
>>>>> requires some kind of value numbering to do efficiently.
>>>>>
>>>>> The GVNSink pass that I really should get around to updating should
>>>>> solve this, eventually!
>>>>>
>>>>> James
>>>>>
>>>>>
>>>>> On Wed, 26 Apr 2017 at 08:19 Hongbin Zheng via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Looks like this do not work:
>>>>>>
>>>>>> // Type your code here, or load an example.
>>>>>> int a[10];
>>>>>>
>>>>>> void unswitch2(int i, int x, int y0, int y1) {
>>>>>>   if (x) {
>>>>>>     a[i] = y0;
>>>>>>     a[i + 1] = y1;
>>>>>>   } else {
>>>>>>     a[i + 1] = y0;
>>>>>>     a[i] = y1;
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> https://godbolt.org/g/Ldd5qV
>>>>>>
>>>>>> On Tue, Apr 25, 2017 at 10:22 PM, Hongbin Zheng <etherzhhb at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Looks like inst combine do the job
>>>>>>>
>>>>>>> On Tue, Apr 25, 2017 at 9:36 PM, Davide Italiano <davide at freebsd.org
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> On Tue, Apr 25, 2017 at 9:24 PM, Hongbin Zheng via llvm-dev
>>>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>>>> > Hi,
>>>>>>>> >
>>>>>>>> > Is there a pass in LLVM that can optimize:
>>>>>>>> >
>>>>>>>> > if (x)
>>>>>>>> >   a[i] = y0;
>>>>>>>> > else
>>>>>>>> >   a[i] = y1;
>>>>>>>> >
>>>>>>>> > to
>>>>>>>> >
>>>>>>>> > a[i] = x ? y0 : y1?
>>>>>>>> >
>>>>>>>> > I tried opt (3.9) with -O3 but looks like such an optimization do
>>>>>>>> not
>>>>>>>> > happened.
>>>>>>>> >
>>>>>>>>
>>>>>>>> The same IR at -O3 for both cases on this example.
>>>>>>>> https://godbolt.org/g/Tk2MM8
>>>>>>>>
>>>>>>>> --
>>>>>>>> Davide
>>>>>>>>
>>>>>>>> "There are no solved problems; there are only problems that are more
>>>>>>>> or less solved" -- Henri Poincare
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>
>>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170428/157cc043/attachment-0001.html>