[llvm-dev] PBQP register allocation and copy propagation

Tue Jun 7 10:53:09 PDT 2016

>
>  I think this one cannot be dissociated from some kind of live range
> pre-splitting.

You're right - these things are closely related. They're also connected to
the coloring problems James has identified. If we can do a better job at
exploiting coalescing opportunities we can afford to pre-split more
aggressively, knowing that the allocator can clean up after us.

Thanks for the reply! Quick reply on phone for now- to reproduce, make sure
> -pbqp-coalescing is enabled.

Thanks James!

I'm pretty busy at the moment, but hopefully I'll be able to take a look at
this in the next day or so.

Cheers,
Lang.

On Sat, Jun 4, 2016 at 1:52 PM, James Molloy <james at jamesmolloy.co.uk>
wrote:

> Hi Lang,
>
> Thanks for the reply! Quick reply on phone for now- to reproduce, make
> sure -pbqp-coalescing is enabled.
>
> James
>
> On Sat, 4 Jun 2016 at 15:05, Arnaud De Grandmaison <
> Arnaud.DeGrandmaison at arm.com> wrote:
>
>> > (1) Spill cost metrics.
>>
>>
>>
>> I think this one cannot be dissociated from some kind of live range
>> pre-splitting.
>>
>>
>>
>> Cheers,
>>
>> Arnaud
>>
>>
>>
>> *From:* Lang Hames [mailto:lhames at gmail.com]
>> *Sent:* 03 June 2016 23:15
>> *To:* James Molloy
>> *Cc:* Arnaud De Grandmaison; llvm-dev at lists.llvm.org; Sebastian
>> Buchwald; Bernhard Scholz
>>
>>
>> *Subject:* Re: [llvm-dev] PBQP register allocation and copy propagation
>>
>>
>>
>> Hi James, Arnaud, Sebastian,
>>
>>
>>
>> I think one idea to improve the situation is to consider the cost vector
>> of adjacent nodes during RN. Let's say you decided to do a RN for node A
>> and want to compute the costs for choosing register %Ri. The current
>> implementation does this by computing min(row/column i of edge A <--> B)
>> but you can do better by adding B's cost vector to the row/column before
>> computing the minimum. This way you also consider B's affinitiy for %R0.
>>
>>
>>
>> The current solver does not use an "RN" rule as such (certainly nothing
>> comparable to what was discussed in the 2002 Scholz/Eckstein paper). We
>> have a heuristic rule for selecting nodes for reduction, but we do not
>> propagate costs from heuristically reduced nodes until backpropagation.
>>
>>
>>
>> In fact, it seems even worse than this; if RN is used, then all neighbors
>> are disconnected. So during backpropagation, unless I'm missing something,
>> it seems we will only ever consider the node cost, never adding in any edge
>> cost.
>>
>>
>>
>> I think you're right, but just to be precise: In the current solver any
>> heuristically reduced node is colored during backpropagation taking into
>> account the costs for nodes further up the stack (i.e. ones that have been
>> colored earlier during backpropagation) only. Nodes further down in the
>> stack are not considered.
>>
>>
>>
>> The metric we use for spill cost being what it is (i.e. not targeted for
>> PBQP, but that’s a different subject)...
>>
>>
>>
>> Yes - this is also a very interesting subject.
>>
>>
>>
>> So: There are four things that we can tweak that I know of:
>>
>>
>>
>> (1) Spill cost metrics.
>>
>> (2) Heuristic node *selection*. (Currently "provably colorable nodes
>> first (in no particular order as far as I remember), possibly uncolorable
>> nodes last in order of spill cost")
>>
>> (3) Speculative assignment during heuristic reduction - propagate costs
>> onto neighbors of the RN node based on a localized guess at the best
>> solution.
>>
>> (4) Look-ahead during backpropagation.
>>
>>
>>
>> All of these are worth looking at, but I'd be inclined to look at (2)
>> (maybe we should reduce nodes with many affinity edges later, so that
>> they're colored earlier?) and (4) (a user-configurable lookahead would let
>> clients trade compile-time for allocation quality smoothly).
>>
>>
>>
>> James - how can I reproduce your test case? I tried with r271685 using
>> 'llc -O3 -regalloc=pbqp zia.ll', but I got:
>>
>>
>>
>> bar:
>>
>>         .fnstart
>>
>> @ BB#0:
>>
>>         .save   {r0, r1, r2, r3, r4, lr}
>>
>>         push    {r0, r1, r2, r3, r4, lr}
>>
>>         ldr     r0, .LCPI0_0
>>
>>         ldm.w   r0, {r1, r2, r3, r12, lr}
>>
>>         ldrd    r4, r0, [r0, #20]
>>
>>         strd    lr, r4, [sp]
>>
>>         str     r0, [sp, #8]
>>
>>         mov     r0, r1
>>
>>         mov     r1, r2
>>
>>         mov     r2, r3
>>
>>         mov     r3, r12
>>
>>         bl      baz
>>
>>         pop     {r0, r1, r2, r3, r4, pc}
>>
>>         .p2align        2
>>
>>
>>
>> which has even more registers used.
>>
>>
>>
>> Cheers,
>>
>> Lang.
>>
>>
>>
>>
>>
>> On Fri, Jun 3, 2016 at 11:09 AM, James Molloy <james at jamesmolloy.co.uk>
>> wrote:
>>
>> Hi,
>>
>>
>>
>> > > I think one idea to improve the situation is to consider the cost
>> vector of adjacent nodes during RN. Let's say you decided to do a RN for
>> node A and want to compute the costs for choosing register %Ri. The current
>> implementation does this by computing min(row/column i of edge A <--> B)
>> but you can do better by adding B's cost vector to the row/column before
>> computing the minimum. This way you also consider B's affinitiy for %R0.
>>
>>
>>
>> In fact, it seems even worse than this; if RN is used, then all neighbors
>> are disconnected. So during backpropagation, unless I'm missing something,
>> it seems we will only ever consider the node cost, never adding in any edge
>> cost.
>>
>>
>>
>> On Fri, 3 Jun 2016 at 19:04 James Molloy <james at jamesmolloy.co.uk> wrote:
>>
>> Hi Sebastien,
>>
>>
>>
>> > I think one idea to improve the situation is to consider the cost
>> vector of adjacent nodes during RN. Let's say you decided to do a RN for
>> node A and want to compute the costs for choosing register %Ri. The current
>> implementation does this by computing min(row/column i of edge A <--> B)
>> but you can do better by adding B's cost vector to the row/column before
>> computing the minimum. This way you also consider B's affinitiy for %R0.
>>
>>
>>
>> Am I right in thinking that this extra calculation would take place
>> during the backpropagation phase?
>>
>>
>>
>> I'm also interested as to whether such a calculation would be sufficient
>> when more copies are inserted (a chain of copies):
>>
>>
>>
>>       COPY           COPY
>>
>> A <--------> B <---------> C
>>
>>
>>
>> If we reduce B last, and we only account for neighboring affinities at
>> the backpropagation phase, we will not get an optimal allocation. However
>> if we could somehow propagate neighboring affinities at the *reduction*
>> stage, perhaps we might get a transitivity property emerging that could
>> solve the above case optimally?
>>
>>
>>
>> I'm very much a newbie in this area so be gentle on me :)
>>
>>
>>
>> Cheers,
>>
>>
>>
>> James
>>
>>
>>
>> On Fri, 3 Jun 2016 at 15:33 Arnaud De Grandmaison <
>> Arnaud.DeGrandmaison at arm.com> wrote:
>>
>> Hi James,
>>
>>
>>
>> I’ve tried to play in the past with the allocation order, which can
>> definitely be tweaked and improved. The metric we use for spill cost being
>> what it is (i.e. not targeted for PBQP, but that’s a different subject), I
>> found it made real sense to use some other heuristics to sort nodes (on top
>> of the spill cost) when there was a tie between them.  Of course, that’s a
>> heuristic and it can sometimes be wrong. Another option I tried was to
>> ensure the nodes’ allocation order gets updated when some costs are
>> propagated.
>>
>>
>>
>> I have also tried to implement something along the lines of what
>> Sebastian suggested (i.e. considering the cost of adjacent nodes). This
>> made the allocation better, but allocation time went very high --- but I
>> had no time to try to improve the heuristic.
>>
>>
>>
>> I did not know about Sebastian et al paper, so thanks for pointing to it
>> !
>>
>>
>>
>> Cheers,
>>
>> Arnaud
>>
>>
>>
>> *From:* Sebastian Buchwald [mailto:Sebastian.Buchwald at kit.edu]
>> *Sent:* 02 June 2016 20:57
>> *To:* llvm-dev at lists.llvm.org; james at jamesmolloy.co.uk; lhames at gmail.com;
>> Arnaud De Grandmaison
>> *Subject:* Re: [llvm-dev] PBQP register allocation and copy propagation
>>
>>
>>
>> On 06/02/2016 07:22 PM, James Molloy via llvm-dev wrote:
>>
>> Hi Lang and Arnaud,
>>
>>
>>
>> I've been testing out the PBQP allocator for Thumb-2 and have ran into a
>> problem I'd love to get your input on.
>>
>>
>>
>> The problem is exemplfied in the codegen for the function @bar in the
>> attached IR file:
>>
>>
>>
>> bar:
>>
>>         push    {r4, lr}
>>
>>         sub     sp, #12
>>
>>  (1)    movw    r2, :lower16:.L_MergedGlobals
>>
>>  (1)    movt    r2, :upper16:.L_MergedGlobals
>>
>>         ldm.w   r2, {r0, r1, r3, r12, lr}
>>
>>         ldrd    r4, r2, [r2, #20]
>>
>>         strd    lr, r4, [sp]
>>
>>         str     r2, [sp, #8]
>>
>>  (2)    mov     r2, r3         ****
>>
>>         mov     r3, r12        ****
>>
>>         bl      baz
>>
>>         add     sp, #12
>>
>>         pop     {r4, pc}
>>
>>
>>
>> The two moves marked with **** are unnecessary. Especially the first,
>> which could be removed simply by swapping r2 and r3. This becomes even more
>> pronounced when I teach PBQP about how best to use registers to allow
>> LDM/STM formation; the codegen is perfect apart from those two moves that
>> just won't go.
>>
>>
>>
>> I've discovered that the underlying problem is that line (1) is reduced
>> after line (2). During the backpropagation phase (1) is allocated r2 which
>> means (2) cannot, as they interfere.
>>
>>
>>
>> Interestingly they're both reduced by rule RN and as the spill cost is
>> the same between them (and they're both conservatively allocatable), it's
>> pure luck which gets allocated first. I have a patch to account for the
>> opportunity cost of allocating r2 to (1) instead of (2) by assigning a cost
>> to r2 in (1)'s affinity matrix; this seems to work. However I don't really
>> know what's the right approach here - dealing with this problem purely by
>> propagating opportunity costs through affinity matrices or doing something
>> better with the reduction order.
>>
>>
>>
>> This ties in with another problem I'm seeing with a prototype live-range
>> splitter for PBQP. Obviously when pre-splitting around every instruction,
>> many copies are inserted. But the above problem rears its head again!
>> Consider this:
>>
>>
>>
>> %vregB = COPY %R0
>>
>> %vregA = COPY %vregB
>>
>>
>>
>>       COPY
>>
>>   A <------> B
>>
>>
>>
>> Node B has an affinity for register R0. The affinity edge between A and B
>> is a simple coalescing affinity (identity matrix). If B is assigned first,
>> it will get R0 and then A will also select R0. However if A is selected
>> first then it doesn't know anything about node B's affinities and could
>> thus get any register, causing a MOV.
>>
>>
>>
>> The above trivial example would be reduced by rule R1, so would actually
>> work optimally (because the reduction would account for B's costs in A).
>> However it is trivial to add two or more interfering defs that cause the
>> reduction rule to change to RN:
>>
>>
>>
>> %vregC = def...
>>
>> %vregD = def...
>>
>> %vregB = COPY %R0
>>
>> %vregA = COPY %vregB
>>
>>        = use %vregC, %vregD
>>
>> I think one idea to improve the situation is to consider the cost vector
>> of adjacent nodes during RN. Let's say you decided to do a RN for node A
>> and want to compute the costs for choosing register %Ri. The current
>> implementation does this by computing min(row/column i of edge A <--> B)
>> but you can do better by adding B's cost vector to the row/column before
>> computing the minimum. This way you also consider B's affinitiy for %R0.
>>
>>
>>
>> Now the reduction order is arbitrary and local costs aren't propagated
>> unlike R1 and R2. In fact, there is a rule "RM" proposed by Buchwald [1]
>> that seeks to fix a very similar case to this.
>>
>> Porting RM to that situation would basically say "always fulfill this
>> affinity" (beween A and B). Of course that's the wrong choice, if you
>> really need the copy.
>>
>> Best regards,
>> Sebastian
>>
>>
>>
>> What do you think?
>>
>>
>>
>> Cheers,
>>
>>
>>
>> James
>>
>>
>>
>> [1] SSA-based Register Allocation with PBQP, Buchwald et al,
>> https://pp.info.uni-karlsruhe.de/uploads/publikationen/buchwald11cc.pdf
>>
>>
>>
>> _______________________________________________
>>
>> LLVM Developers mailing list
>>
>> llvm-dev at lists.llvm.org
>>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>>
>> IMPORTANT NOTICE: The contents of this email and any attachments are
>> confidential and may also be privileged. If you are not the intended
>> recipient, please notify the sender immediately and do not disclose the
>> contents to any other person, use it for any purpose, or store or copy the
>> information in any medium. Thank you.
>>
>>
>> IMPORTANT NOTICE: The contents of this email and any attachments are
>> confidential and may also be privileged. If you are not the intended
>> recipient, please notify the sender immediately and do not disclose the
>> contents to any other person, use it for any purpose, or store or copy the
>> information in any medium. Thank you.
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160607/9887e3be/attachment-0001.html>