[llvm-dev] PBQP register allocation and copy propagation

Fri Jun 3 10:04:01 PDT 2016

Hi Sebastien,

> I think one idea to improve the situation is to consider the cost vector
of adjacent nodes during RN. Let's say you decided to do a RN for node A
and want to compute the costs for choosing register %Ri. The current
implementation does this by computing min(row/column i of edge A <--> B)
but you can do better by adding B's cost vector to the row/column before
computing the minimum. This way you also consider B's affinitiy for %R0.

Am I right in thinking that this extra calculation would take place during
the backpropagation phase?

I'm also interested as to whether such a calculation would be sufficient
when more copies are inserted (a chain of copies):

      COPY           COPY
A <--------> B <---------> C

If we reduce B last, and we only account for neighboring affinities at the
backpropagation phase, we will not get an optimal allocation. However if we
could somehow propagate neighboring affinities at the *reduction* stage,
perhaps we might get a transitivity property emerging that could solve the
above case optimally?

I'm very much a newbie in this area so be gentle on me :)

Cheers,

James

On Fri, 3 Jun 2016 at 15:33 Arnaud De Grandmaison <
Arnaud.DeGrandmaison at arm.com> wrote:

> Hi James,
>
>
>
> I’ve tried to play in the past with the allocation order, which can
> definitely be tweaked and improved. The metric we use for spill cost being
> what it is (i.e. not targeted for PBQP, but that’s a different subject), I
> found it made real sense to use some other heuristics to sort nodes (on top
> of the spill cost) when there was a tie between them.  Of course, that’s a
> heuristic and it can sometimes be wrong. Another option I tried was to
> ensure the nodes’ allocation order gets updated when some costs are
> propagated.
>
>
>
> I have also tried to implement something along the lines of what Sebastian
> suggested (i.e. considering the cost of adjacent nodes). This made the
> allocation better, but allocation time went very high --- but I had no time
> to try to improve the heuristic.
>
>
>
> I did not know about Sebastian et al paper, so thanks for pointing to it !
>
>
>
> Cheers,
>
> Arnaud
>
>
>
> *From:* Sebastian Buchwald [mailto:Sebastian.Buchwald at kit.edu]
> *Sent:* 02 June 2016 20:57
> *To:* llvm-dev at lists.llvm.org; james at jamesmolloy.co.uk; lhames at gmail.com;
> Arnaud De Grandmaison
> *Subject:* Re: [llvm-dev] PBQP register allocation and copy propagation
>
>
>
> On 06/02/2016 07:22 PM, James Molloy via llvm-dev wrote:
>
> Hi Lang and Arnaud,
>
>
>
> I've been testing out the PBQP allocator for Thumb-2 and have ran into a
> problem I'd love to get your input on.
>
>
>
> The problem is exemplfied in the codegen for the function @bar in the
> attached IR file:
>
>
>
> bar:
>
>         push    {r4, lr}
>
>         sub     sp, #12
>
>  (1)    movw    r2, :lower16:.L_MergedGlobals
>
>  (1)    movt    r2, :upper16:.L_MergedGlobals
>
>         ldm.w   r2, {r0, r1, r3, r12, lr}
>
>         ldrd    r4, r2, [r2, #20]
>
>         strd    lr, r4, [sp]
>
>         str     r2, [sp, #8]
>
>  (2)    mov     r2, r3         ****
>
>         mov     r3, r12        ****
>
>         bl      baz
>
>         add     sp, #12
>
>         pop     {r4, pc}
>
>
>
> The two moves marked with **** are unnecessary. Especially the first,
> which could be removed simply by swapping r2 and r3. This becomes even more
> pronounced when I teach PBQP about how best to use registers to allow
> LDM/STM formation; the codegen is perfect apart from those two moves that
> just won't go.
>
>
>
> I've discovered that the underlying problem is that line (1) is reduced
> after line (2). During the backpropagation phase (1) is allocated r2 which
> means (2) cannot, as they interfere.
>
>
>
> Interestingly they're both reduced by rule RN and as the spill cost is the
> same between them (and they're both conservatively allocatable), it's pure
> luck which gets allocated first. I have a patch to account for the
> opportunity cost of allocating r2 to (1) instead of (2) by assigning a cost
> to r2 in (1)'s affinity matrix; this seems to work. However I don't really
> know what's the right approach here - dealing with this problem purely by
> propagating opportunity costs through affinity matrices or doing something
> better with the reduction order.
>
>
>
> This ties in with another problem I'm seeing with a prototype live-range
> splitter for PBQP. Obviously when pre-splitting around every instruction,
> many copies are inserted. But the above problem rears its head again!
> Consider this:
>
>
>
> %vregB = COPY %R0
>
> %vregA = COPY %vregB
>
>
>
>       COPY
>
>   A <------> B
>
>
>
> Node B has an affinity for register R0. The affinity edge between A and B
> is a simple coalescing affinity (identity matrix). If B is assigned first,
> it will get R0 and then A will also select R0. However if A is selected
> first then it doesn't know anything about node B's affinities and could
> thus get any register, causing a MOV.
>
>
>
> The above trivial example would be reduced by rule R1, so would actually
> work optimally (because the reduction would account for B's costs in A).
> However it is trivial to add two or more interfering defs that cause the
> reduction rule to change to RN:
>
>
>
> %vregC = def...
>
> %vregD = def...
>
> %vregB = COPY %R0
>
> %vregA = COPY %vregB
>
>        = use %vregC, %vregD
>
> I think one idea to improve the situation is to consider the cost vector
> of adjacent nodes during RN. Let's say you decided to do a RN for node A
> and want to compute the costs for choosing register %Ri. The current
> implementation does this by computing min(row/column i of edge A <--> B)
> but you can do better by adding B's cost vector to the row/column before
> computing the minimum. This way you also consider B's affinitiy for %R0.
>
>
>
>
> Now the reduction order is arbitrary and local costs aren't propagated
> unlike R1 and R2. In fact, there is a rule "RM" proposed by Buchwald [1]
> that seeks to fix a very similar case to this.
>
> Porting RM to that situation would basically say "always fulfill this
> affinity" (beween A and B). Of course that's the wrong choice, if you
> really need the copy.
>
> Best regards,
> Sebastian
>
>
>
>
> What do you think?
>
>
>
> Cheers,
>
>
>
> James
>
>
>
> [1] SSA-based Register Allocation with PBQP, Buchwald et al,
> https://pp.info.uni-karlsruhe.de/uploads/publikationen/buchwald11cc.pdf
>
>
>
>
> _______________________________________________
>
> LLVM Developers mailing list
>
> llvm-dev at lists.llvm.org
>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160603/f3442634/attachment.html>