[llvm-dev] Improving the split heuristics for the Greedy Register Allocator

Wei Mi via llvm-dev llvm-dev at lists.llvm.org
Fri Feb 10 15:05:00 PST 2017


I replied my finding in https://reviews.llvm.org/D27366 since it is
easier to get feedback from people follows it.

Thanks,
Wei.


On Thu, Feb 9, 2017 at 9:22 PM, Wei Mi <wmi at google.com> wrote:
> Hi Nemanja,
>
> Thanks for the testcase. I can reproduce your problem. splitting
> actually works as we expect but tryHintRecoloring coalesces the
> splitted vregs and revokes the effect of splitting. The simple way is
> to stop tryHintRecoloring for vregs colored with CSR registers, but I
> am still thinking if there is better way to solve it. Will keep you
> updated.
>
> Thanks,
> Wei.
>
>
> On Wed, Feb 8, 2017 at 10:17 PM, Nemanja Ivanovic
> <nemanja.i.ibm at gmail.com> wrote:
>> Hmm... it would appear that the behaviour on that original test case changes
>> with ToT. However, this test case will still allocate a CSR in the entry
>> block even though it really does not need to. And adjusting the CSR
>> first-time-use cost does not seem to have any effect on it.
>>
>> int *a;
>> int callVoid();
>> int callNonVoid(int*);
>> int test(int *b) {
>>   if (b == a) {
>>     callVoid();
>>     return callNonVoid(b);
>>   }
>>   return 0;
>> }
>>
>>
>> On Thu, Feb 9, 2017 at 5:15 AM, Wei Mi <wmi at google.com> wrote:
>>>
>>> On Wed, Feb 8, 2017 at 6:21 PM, Wei Mi <wmi at google.com> wrote:
>>> > I have an issue that I've been wrestling with for quite some time and
>>> > I'm
>>> > hoping that someone with a deeper understanding of the register
>>> > allocator
>>> > can help me with.
>>> >
>>> > Namely, I am trying to teach RA to split a live range rather than
>>> > allocating a CSR. I've attempted a very large number of tweaks to the
>>> > costs
>>> > (both existing and experimental ones that I've added). However, despite
>>> > all
>>> > of that, I can't seem to get RA to split the following:
>>> >
>>> >   1 BB#0: derived from LLVM BB %entry
>>> >   2     Live Ins: %X3
>>> >   3         %vreg15<def> = COPY %X3; G8RC:%vreg15
>>> >   4         %vreg4<def> = CMPLDI %vreg15, 0; CRRC:%vreg4 G8RC:%vreg15
>>> >   5         %vreg11:sub_32<def,read-undef> = LI 0; G8RC:%vreg11
>>> >   6         BCC 68, %vreg4, <BB#1>; CRRC:%vreg4
>>> >   7     Successors according to CFG: BB#4(0x30000000 / 0x80000000 =
>>> > 37.50%)
>>> > BB#1(0x50000000 / 0x80000000 = 62.50%)
>>> >   8
>>> >   9 BB#4:
>>> >  10     Predecessors according to CFG: BB#0
>>> >  11         B <BB#3>
>>> >  12     Successors according to CFG: BB#3(?%)
>>> >  13
>>> >  14 BB#1: derived from LLVM BB %if.end
>>> >  15     Predecessors according to CFG: BB#0
>>> >  16         %vreg6<def> = ADDIStocHA %X2, <ga:@a>;
>>> > G8RC_and_G8RC_NOX0:%vreg6
>>> >  17         %vreg7<def> = LDtocL <ga:@a>, %vreg6, %X2<imp-use>;
>>> > mem:LD8[GOT] G8RC_and_G8RC_NOX0:%vreg7,%vreg6
>>> >  18         %vreg8<def> = LWA 0, %vreg7;
>>> > mem:LD4[@a](tbaa=!3)(dereferenceable) G8RC:%vreg8
>>> > G8RC_and_G8RC_NOX0:%vreg7
>>> >  19         %vreg9<def> = CMPLD %vreg8, %vreg15; CRRC:%vreg9
>>> > G8RC:%vreg8,%vreg15
>>> >  20         BCC 68, %vreg9, <BB#3>; CRRC:%vreg9
>>> >  21         B <BB#2>
>>> >  22     Successors according to CFG: BB#2(0x30000000 / 0x80000000 =
>>> > 37.50%)
>>> > BB#3(0x50000000 / 0x80000000 = 62.50%)
>>> >  23
>>> >  24 BB#2: derived from LLVM BB %if.then2
>>> >  25     Predecessors according to CFG: BB#1
>>> >  26         ADJCALLSTACKDOWN 96, %R1<imp-def,dead>, %R1<imp-use>
>>> >  27         %vreg16<def> = COPY %vreg15; G8RC:%vreg16,%vreg15
>>> >  28         BL8_NOP <ga:@callVoid>, <regmask **LONG LIST**>,
>>> > %X3<imp-def,dead>
>>> >  29         ADJCALLSTACKUP 96, 0, %R1<imp-def,dead>, %R1<imp-use>
>>> >  30         ADJCALLSTACKDOWN 96, %R1<imp-def,dead>, %R1<imp-use> 31
>>> > %X3<def> = COPY %vreg16; G8RC:%vreg16
>>> >  32         BL8_NOP <ga:@callNonVoid>, <regmask **LONG LIST**>,
>>> > %X3<imp-use>, %X2<imp-use>, %R1<imp-def>, %X3<imp-def>
>>> >  33         ADJCALLSTACKUP 96, 0, %R1<imp-def,dead>, %R1<imp-use>
>>> >  34         %vreg11<def> = COPY %X3; G8RC:%vreg11 35     Successors
>>> > according to CFG: BB#3(?%)
>>> >  36
>>> >  37 BB#3: derived from LLVM BB %return
>>> >  38     Predecessors according to CFG: BB#1 BB#2 BB#4
>>> >  39         %vreg12<def> = EXTSW_32_64 %vreg11:sub_32;
>>> > G8RC:%vreg12,%vreg11
>>> >  40         %X3<def> = COPY %vreg12; G8RC:%vreg12
>>> >  41         BLR8 %LR8<imp-use>, %RM<imp-use>, %X3<imp-use>
>>> >
>>> > No matter what I do, vreg15 will get a Callee-Saved Register assigned to
>>> > it. However, this is suboptimal. So what I am trying to accomplish is to
>>> > split the live range of vreg15 into the paths without the call and the
>>> > path
>>> > with the call (BL8_NOP is a call). Then the physical register X3 can be
>>> > used in the paths BB#0 -> BB#1 -> BB#3 and BB#0 -> BB#4 -> BB#3 and it
>>> > can
>>> > be copied to a Callee-Saved Register in BB#2.
>>> >
>>>
>>> Hi Nemanja,
>>>
>>> vreg15's live range is not across call. It is weird that it is always
>>> getting CSR. Maybe because of the hint of vreg16, i.e., the
>>> contribution of hint outweigh the cost of CSR first use?
>>>
>>> Is it possible to send out the testcase? I can try if there is anyway
>>> to allocate vreg15 to CSR.
>>>
>>> Thanks,
>>> Wei.
>>>
>>> > Without such a split, vreg15 is assigned a CSR for the entire live range
>>> > and there is no way to avoid having to save/restore the CSR in the
>>> > prologue/epilogue. If one of the two paths that did not actually have
>>> > the
>>> > call turn out to be the hottest path through the function, there is a
>>> > lot
>>> > of wasted cycles in the save/restore because we weren't able to
>>> > shrink-wrap
>>> > this function due to the choice RA made.
>>> >
>>> > If anyone can offer some ideas on what I should do here, I would truly
>>> > appreciate it.
>>> >
>>> > Nemanja
>>
>>


More information about the llvm-dev mailing list