[llvm-dev] PBQP register allocation and copy propagation

Sat Jun 4 13:52:15 PDT 2016

Hi Lang,

Thanks for the reply! Quick reply on phone for now- to reproduce, make sure
-pbqp-coalescing is enabled.

James
On Sat, 4 Jun 2016 at 15:05, Arnaud De Grandmaison <
Arnaud.DeGrandmaison at arm.com> wrote:

> > (1) Spill cost metrics.
>
>
>
> I think this one cannot be dissociated from some kind of live range
> pre-splitting.
>
>
>
> Cheers,
>
> Arnaud
>
>
>
> *From:* Lang Hames [mailto:lhames at gmail.com]
> *Sent:* 03 June 2016 23:15
> *To:* James Molloy
> *Cc:* Arnaud De Grandmaison; llvm-dev at lists.llvm.org; Sebastian Buchwald;
> Bernhard Scholz
>
>
> *Subject:* Re: [llvm-dev] PBQP register allocation and copy propagation
>
>
>
> Hi James, Arnaud, Sebastian,
>
>
>
> I think one idea to improve the situation is to consider the cost vector
> of adjacent nodes during RN. Let's say you decided to do a RN for node A
> and want to compute the costs for choosing register %Ri. The current
> implementation does this by computing min(row/column i of edge A <--> B)
> but you can do better by adding B's cost vector to the row/column before
> computing the minimum. This way you also consider B's affinitiy for %R0.
>
>
>
> The current solver does not use an "RN" rule as such (certainly nothing
> comparable to what was discussed in the 2002 Scholz/Eckstein paper). We
> have a heuristic rule for selecting nodes for reduction, but we do not
> propagate costs from heuristically reduced nodes until backpropagation.
>
>
>
> In fact, it seems even worse than this; if RN is used, then all neighbors
> are disconnected. So during backpropagation, unless I'm missing something,
> it seems we will only ever consider the node cost, never adding in any edge
> cost.
>
>
>
> I think you're right, but just to be precise: In the current solver any
> heuristically reduced node is colored during backpropagation taking into
> account the costs for nodes further up the stack (i.e. ones that have been
> colored earlier during backpropagation) only. Nodes further down in the
> stack are not considered.
>
>
>
> The metric we use for spill cost being what it is (i.e. not targeted for
> PBQP, but that’s a different subject)...
>
>
>
> Yes - this is also a very interesting subject.
>
>
>
> So: There are four things that we can tweak that I know of:
>
>
>
> (1) Spill cost metrics.
>
> (2) Heuristic node *selection*. (Currently "provably colorable nodes
> first (in no particular order as far as I remember), possibly uncolorable
> nodes last in order of spill cost")
>
> (3) Speculative assignment during heuristic reduction - propagate costs
> onto neighbors of the RN node based on a localized guess at the best
> solution.
>
> (4) Look-ahead during backpropagation.
>
>
>
> All of these are worth looking at, but I'd be inclined to look at (2)
> (maybe we should reduce nodes with many affinity edges later, so that
> they're colored earlier?) and (4) (a user-configurable lookahead would let
> clients trade compile-time for allocation quality smoothly).
>
>
>
> James - how can I reproduce your test case? I tried with r271685 using
> 'llc -O3 -regalloc=pbqp zia.ll', but I got:
>
>
>
> bar:
>
>         .fnstart
>
> @ BB#0:
>
>         .save   {r0, r1, r2, r3, r4, lr}
>
>         push    {r0, r1, r2, r3, r4, lr}
>
>         ldr     r0, .LCPI0_0
>
>         ldm.w   r0, {r1, r2, r3, r12, lr}
>
>         ldrd    r4, r0, [r0, #20]
>
>         strd    lr, r4, [sp]
>
>         str     r0, [sp, #8]
>
>         mov     r0, r1
>
>         mov     r1, r2
>
>         mov     r2, r3
>
>         mov     r3, r12
>
>         bl      baz
>
>         pop     {r0, r1, r2, r3, r4, pc}
>
>         .p2align        2
>
>
>
> which has even more registers used.
>
>
>
> Cheers,
>
> Lang.
>
>
>
>
>
> On Fri, Jun 3, 2016 at 11:09 AM, James Molloy <james at jamesmolloy.co.uk>
> wrote:
>
> Hi,
>
>
>
> > > I think one idea to improve the situation is to consider the cost
> vector of adjacent nodes during RN. Let's say you decided to do a RN for
> node A and want to compute the costs for choosing register %Ri. The current
> implementation does this by computing min(row/column i of edge A <--> B)
> but you can do better by adding B's cost vector to the row/column before
> computing the minimum. This way you also consider B's affinitiy for %R0.
>
>
>
> In fact, it seems even worse than this; if RN is used, then all neighbors
> are disconnected. So during backpropagation, unless I'm missing something,
> it seems we will only ever consider the node cost, never adding in any edge
> cost.
>
>
>
> On Fri, 3 Jun 2016 at 19:04 James Molloy <james at jamesmolloy.co.uk> wrote:
>
> Hi Sebastien,
>
>
>
> > I think one idea to improve the situation is to consider the cost vector
> of adjacent nodes during RN. Let's say you decided to do a RN for node A
> and want to compute the costs for choosing register %Ri. The current
> implementation does this by computing min(row/column i of edge A <--> B)
> but you can do better by adding B's cost vector to the row/column before
> computing the minimum. This way you also consider B's affinitiy for %R0.
>
>
>
> Am I right in thinking that this extra calculation would take place during
> the backpropagation phase?
>
>
>
> I'm also interested as to whether such a calculation would be sufficient
> when more copies are inserted (a chain of copies):
>
>
>
>       COPY           COPY
>
> A <--------> B <---------> C
>
>
>
> If we reduce B last, and we only account for neighboring affinities at the
> backpropagation phase, we will not get an optimal allocation. However if we
> could somehow propagate neighboring affinities at the *reduction* stage,
> perhaps we might get a transitivity property emerging that could solve the
> above case optimally?
>
>
>
> I'm very much a newbie in this area so be gentle on me :)
>
>
>
> Cheers,
>
>
>
> James
>
>
>
> On Fri, 3 Jun 2016 at 15:33 Arnaud De Grandmaison <
> Arnaud.DeGrandmaison at arm.com> wrote:
>
> Hi James,
>
>
>
> I’ve tried to play in the past with the allocation order, which can
> definitely be tweaked and improved. The metric we use for spill cost being
> what it is (i.e. not targeted for PBQP, but that’s a different subject), I
> found it made real sense to use some other heuristics to sort nodes (on top
> of the spill cost) when there was a tie between them.  Of course, that’s a
> heuristic and it can sometimes be wrong. Another option I tried was to
> ensure the nodes’ allocation order gets updated when some costs are
> propagated.
>
>
>
> I have also tried to implement something along the lines of what Sebastian
> suggested (i.e. considering the cost of adjacent nodes). This made the
> allocation better, but allocation time went very high --- but I had no time
> to try to improve the heuristic.
>
>
>
> I did not know about Sebastian et al paper, so thanks for pointing to it !
>
>
>
> Cheers,
>
> Arnaud
>
>
>
> *From:* Sebastian Buchwald [mailto:Sebastian.Buchwald at kit.edu]
> *Sent:* 02 June 2016 20:57
> *To:* llvm-dev at lists.llvm.org; james at jamesmolloy.co.uk; lhames at gmail.com;
> Arnaud De Grandmaison
> *Subject:* Re: [llvm-dev] PBQP register allocation and copy propagation
>
>
>
> On 06/02/2016 07:22 PM, James Molloy via llvm-dev wrote:
>
> Hi Lang and Arnaud,
>
>
>
> I've been testing out the PBQP allocator for Thumb-2 and have ran into a
> problem I'd love to get your input on.
>
>
>
> The problem is exemplfied in the codegen for the function @bar in the
> attached IR file:
>
>
>
> bar:
>
>         push    {r4, lr}
>
>         sub     sp, #12
>
>  (1)    movw    r2, :lower16:.L_MergedGlobals
>
>  (1)    movt    r2, :upper16:.L_MergedGlobals
>
>         ldm.w   r2, {r0, r1, r3, r12, lr}
>
>         ldrd    r4, r2, [r2, #20]
>
>         strd    lr, r4, [sp]
>
>         str     r2, [sp, #8]
>
>  (2)    mov     r2, r3         ****
>
>         mov     r3, r12        ****
>
>         bl      baz
>
>         add     sp, #12
>
>         pop     {r4, pc}
>
>
>
> The two moves marked with **** are unnecessary. Especially the first,
> which could be removed simply by swapping r2 and r3. This becomes even more
> pronounced when I teach PBQP about how best to use registers to allow
> LDM/STM formation; the codegen is perfect apart from those two moves that
> just won't go.
>
>
>
> I've discovered that the underlying problem is that line (1) is reduced
> after line (2). During the backpropagation phase (1) is allocated r2 which
> means (2) cannot, as they interfere.
>
>
>
> Interestingly they're both reduced by rule RN and as the spill cost is the
> same between them (and they're both conservatively allocatable), it's pure
> luck which gets allocated first. I have a patch to account for the
> opportunity cost of allocating r2 to (1) instead of (2) by assigning a cost
> to r2 in (1)'s affinity matrix; this seems to work. However I don't really
> know what's the right approach here - dealing with this problem purely by
> propagating opportunity costs through affinity matrices or doing something
> better with the reduction order.
>
>
>
> This ties in with another problem I'm seeing with a prototype live-range
> splitter for PBQP. Obviously when pre-splitting around every instruction,
> many copies are inserted. But the above problem rears its head again!
> Consider this:
>
>
>
> %vregB = COPY %R0
>
> %vregA = COPY %vregB
>
>
>
>       COPY
>
>   A <------> B
>
>
>
> Node B has an affinity for register R0. The affinity edge between A and B
> is a simple coalescing affinity (identity matrix). If B is assigned first,
> it will get R0 and then A will also select R0. However if A is selected
> first then it doesn't know anything about node B's affinities and could
> thus get any register, causing a MOV.
>
>
>
> The above trivial example would be reduced by rule R1, so would actually
> work optimally (because the reduction would account for B's costs in A).
> However it is trivial to add two or more interfering defs that cause the
> reduction rule to change to RN:
>
>
>
> %vregC = def...
>
> %vregD = def...
>
> %vregB = COPY %R0
>
> %vregA = COPY %vregB
>
>        = use %vregC, %vregD
>
> I think one idea to improve the situation is to consider the cost vector
> of adjacent nodes during RN. Let's say you decided to do a RN for node A
> and want to compute the costs for choosing register %Ri. The current
> implementation does this by computing min(row/column i of edge A <--> B)
> but you can do better by adding B's cost vector to the row/column before
> computing the minimum. This way you also consider B's affinitiy for %R0.
>
>
>
> Now the reduction order is arbitrary and local costs aren't propagated
> unlike R1 and R2. In fact, there is a rule "RM" proposed by Buchwald [1]
> that seeks to fix a very similar case to this.
>
> Porting RM to that situation would basically say "always fulfill this
> affinity" (beween A and B). Of course that's the wrong choice, if you
> really need the copy.
>
> Best regards,
> Sebastian
>
>
>
> What do you think?
>
>
>
> Cheers,
>
>
>
> James
>
>
>
> [1] SSA-based Register Allocation with PBQP, Buchwald et al,
> https://pp.info.uni-karlsruhe.de/uploads/publikationen/buchwald11cc.pdf
>
>
>
> _______________________________________________
>
> LLVM Developers mailing list
>
> llvm-dev at lists.llvm.org
>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
>
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160604/14a5f596/attachment.html>