# [llvm-dev] PBQP register allocation and copy propagation

James Molloy via llvm-dev llvm-dev at lists.llvm.org
Fri Jun 3 11:09:50 PDT 2016

```Hi,

> > I think one idea to improve the situation is to consider the cost
vector of adjacent nodes during RN. Let's say you decided to do a RN for
node A and want to compute the costs for choosing register %Ri. The current
implementation does this by computing min(row/column i of edge A <--> B)
but you can do better by adding B's cost vector to the row/column before
computing the minimum. This way you also consider B's affinitiy for %R0.

In fact, it seems even worse than this; if RN is used, then all neighbors
are disconnected. So during backpropagation, unless I'm missing something,
it seems we will only ever consider the node cost, never adding in any edge
cost.

On Fri, 3 Jun 2016 at 19:04 James Molloy <james at jamesmolloy.co.uk> wrote:

> Hi Sebastien,
>
> > I think one idea to improve the situation is to consider the cost vector
> of adjacent nodes during RN. Let's say you decided to do a RN for node A
> and want to compute the costs for choosing register %Ri. The current
> implementation does this by computing min(row/column i of edge A <--> B)
> but you can do better by adding B's cost vector to the row/column before
> computing the minimum. This way you also consider B's affinitiy for %R0.
>
> Am I right in thinking that this extra calculation would take place during
> the backpropagation phase?
>
> I'm also interested as to whether such a calculation would be sufficient
> when more copies are inserted (a chain of copies):
>
>       COPY           COPY
> A <--------> B <---------> C
>
> If we reduce B last, and we only account for neighboring affinities at the
> backpropagation phase, we will not get an optimal allocation. However if we
> could somehow propagate neighboring affinities at the *reduction* stage,
> perhaps we might get a transitivity property emerging that could solve the
> above case optimally?
>
> I'm very much a newbie in this area so be gentle on me :)
>
> Cheers,
>
> James
>
> On Fri, 3 Jun 2016 at 15:33 Arnaud De Grandmaison <
> Arnaud.DeGrandmaison at arm.com> wrote:
>
>> Hi James,
>>
>>
>>
>> I’ve tried to play in the past with the allocation order, which can
>> definitely be tweaked and improved. The metric we use for spill cost being
>> what it is (i.e. not targeted for PBQP, but that’s a different subject), I
>> found it made real sense to use some other heuristics to sort nodes (on top
>> of the spill cost) when there was a tie between them.  Of course, that’s a
>> heuristic and it can sometimes be wrong. Another option I tried was to
>> ensure the nodes’ allocation order gets updated when some costs are
>> propagated.
>>
>>
>>
>> I have also tried to implement something along the lines of what
>> Sebastian suggested (i.e. considering the cost of adjacent nodes). This
>> made the allocation better, but allocation time went very high --- but I
>> had no time to try to improve the heuristic.
>>
>>
>>
>> I did not know about Sebastian et al paper, so thanks for pointing to it
>> !
>>
>>
>>
>> Cheers,
>>
>> Arnaud
>>
>>
>>
>> *From:* Sebastian Buchwald [mailto:Sebastian.Buchwald at kit.edu]
>> *Sent:* 02 June 2016 20:57
>> *To:* llvm-dev at lists.llvm.org; james at jamesmolloy.co.uk; lhames at gmail.com;
>> Arnaud De Grandmaison
>> *Subject:* Re: [llvm-dev] PBQP register allocation and copy propagation
>>
>>
>>
>> On 06/02/2016 07:22 PM, James Molloy via llvm-dev wrote:
>>
>> Hi Lang and Arnaud,
>>
>>
>>
>> I've been testing out the PBQP allocator for Thumb-2 and have ran into a
>> problem I'd love to get your input on.
>>
>>
>>
>> The problem is exemplfied in the codegen for the function @bar in the
>> attached IR file:
>>
>>
>>
>> bar:
>>
>>         push    {r4, lr}
>>
>>         sub     sp, #12
>>
>>  (1)    movw    r2, :lower16:.L_MergedGlobals
>>
>>  (1)    movt    r2, :upper16:.L_MergedGlobals
>>
>>         ldm.w   r2, {r0, r1, r3, r12, lr}
>>
>>         ldrd    r4, r2, [r2, #20]
>>
>>         strd    lr, r4, [sp]
>>
>>         str     r2, [sp, #8]
>>
>>  (2)    mov     r2, r3         ****
>>
>>         mov     r3, r12        ****
>>
>>         bl      baz
>>
>>
>>         pop     {r4, pc}
>>
>>
>>
>> The two moves marked with **** are unnecessary. Especially the first,
>> which could be removed simply by swapping r2 and r3. This becomes even more
>> pronounced when I teach PBQP about how best to use registers to allow
>> LDM/STM formation; the codegen is perfect apart from those two moves that
>> just won't go.
>>
>>
>>
>> I've discovered that the underlying problem is that line (1) is reduced
>> after line (2). During the backpropagation phase (1) is allocated r2 which
>> means (2) cannot, as they interfere.
>>
>>
>>
>> Interestingly they're both reduced by rule RN and as the spill cost is
>> the same between them (and they're both conservatively allocatable), it's
>> pure luck which gets allocated first. I have a patch to account for the
>> opportunity cost of allocating r2 to (1) instead of (2) by assigning a cost
>> to r2 in (1)'s affinity matrix; this seems to work. However I don't really
>> know what's the right approach here - dealing with this problem purely by
>> propagating opportunity costs through affinity matrices or doing something
>> better with the reduction order.
>>
>>
>>
>> This ties in with another problem I'm seeing with a prototype live-range
>> splitter for PBQP. Obviously when pre-splitting around every instruction,
>> many copies are inserted. But the above problem rears its head again!
>> Consider this:
>>
>>
>>
>> %vregB = COPY %R0
>>
>> %vregA = COPY %vregB
>>
>>
>>
>>       COPY
>>
>>   A <------> B
>>
>>
>>
>> Node B has an affinity for register R0. The affinity edge between A and B
>> is a simple coalescing affinity (identity matrix). If B is assigned first,
>> it will get R0 and then A will also select R0. However if A is selected
>> first then it doesn't know anything about node B's affinities and could
>> thus get any register, causing a MOV.
>>
>>
>>
>> The above trivial example would be reduced by rule R1, so would actually
>> work optimally (because the reduction would account for B's costs in A).
>> However it is trivial to add two or more interfering defs that cause the
>> reduction rule to change to RN:
>>
>>
>>
>> %vregC = def...
>>
>> %vregD = def...
>>
>> %vregB = COPY %R0
>>
>> %vregA = COPY %vregB
>>
>>        = use %vregC, %vregD
>>
>> I think one idea to improve the situation is to consider the cost vector
>> of adjacent nodes during RN. Let's say you decided to do a RN for node A
>> and want to compute the costs for choosing register %Ri. The current
>> implementation does this by computing min(row/column i of edge A <--> B)
>> but you can do better by adding B's cost vector to the row/column before
>> computing the minimum. This way you also consider B's affinitiy for %R0.
>>
>>
>>
>>
>> Now the reduction order is arbitrary and local costs aren't propagated
>> unlike R1 and R2. In fact, there is a rule "RM" proposed by Buchwald [1]
>> that seeks to fix a very similar case to this.
>>
>> Porting RM to that situation would basically say "always fulfill this
>> affinity" (beween A and B). Of course that's the wrong choice, if you
>> really need the copy.
>>
>> Best regards,
>> Sebastian
>>
>>
>>
>>
>> What do you think?
>>
>>
>>
>> Cheers,
>>
>>
>>
>> James
>>
>>
>>
>> [1] SSA-based Register Allocation with PBQP, Buchwald et al,
>>
>>
>>
>>
>> _______________________________________________
>>
>> LLVM Developers mailing list
>>
>> llvm-dev at lists.llvm.org
>>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>> IMPORTANT NOTICE: The contents of this email and any attachments are
>> confidential and may also be privileged. If you are not the intended
>> recipient, please notify the sender immediately and do not disclose the
>> contents to any other person, use it for any purpose, or store or copy the
>> information in any medium. Thank you.
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...