[llvm-dev] Improving the split heuristics for the Greedy Register Allocator

Nemanja Ivanovic via llvm-dev llvm-dev at lists.llvm.org
Fri Jan 13 06:28:06 PST 2017

I have an issue that I've been wrestling with for quite some time and I'm
hoping that someone with a deeper understanding of the register allocator
can help me with.

Namely, I am trying to teach RA to split a live range rather than
allocating a CSR. I've attempted a very large number of tweaks to the costs
(both existing and experimental ones that I've added). However, despite all
of that, I can't seem to get RA to split the following:

  1 BB#0: derived from LLVM BB %entry
  2     Live Ins: %X3
  3         %vreg15<def> = COPY %X3; G8RC:%vreg15
  4         %vreg4<def> = CMPLDI %vreg15, 0; CRRC:%vreg4 G8RC:%vreg15
  5         %vreg11:sub_32<def,read-undef> = LI 0; G8RC:%vreg11
  6         BCC 68, %vreg4, <BB#1>; CRRC:%vreg4
  7     Successors according to CFG: BB#4(0x30000000 / 0x80000000 = 37.50%)
BB#1(0x50000000 / 0x80000000 = 62.50%)
  9 BB#4:
 10     Predecessors according to CFG: BB#0
 11         B <BB#3>
 12     Successors according to CFG: BB#3(?%)
 14 BB#1: derived from LLVM BB %if.end
 15     Predecessors according to CFG: BB#0
 16         %vreg6<def> = ADDIStocHA %X2, <ga:@a>; G8RC_and_G8RC_NOX0:%vreg6
 17         %vreg7<def> = LDtocL <ga:@a>, %vreg6, %X2<imp-use>;
mem:LD8[GOT] G8RC_and_G8RC_NOX0:%vreg7,%vreg6
 18         %vreg8<def> = LWA 0, %vreg7;
mem:LD4[@a](tbaa=!3)(dereferenceable) G8RC:%vreg8 G8RC_and_G8RC_NOX0:%vreg7
 19         %vreg9<def> = CMPLD %vreg8, %vreg15; CRRC:%vreg9
 20         BCC 68, %vreg9, <BB#3>; CRRC:%vreg9
 21         B <BB#2>
 22     Successors according to CFG: BB#2(0x30000000 / 0x80000000 = 37.50%)
BB#3(0x50000000 / 0x80000000 = 62.50%)
 24 BB#2: derived from LLVM BB %if.then2
 25     Predecessors according to CFG: BB#1
 26         ADJCALLSTACKDOWN 96, %R1<imp-def,dead>, %R1<imp-use>
 27         %vreg16<def> = COPY %vreg15; G8RC:%vreg16,%vreg15
 28         BL8_NOP <ga:@callVoid>, <regmask **LONG LIST**>,
 29         ADJCALLSTACKUP 96, 0, %R1<imp-def,dead>, %R1<imp-use>
 30         ADJCALLSTACKDOWN 96, %R1<imp-def,dead>, %R1<imp-use> 31
%X3<def> = COPY %vreg16; G8RC:%vreg16
 32         BL8_NOP <ga:@callNonVoid>, <regmask **LONG LIST**>,
%X3<imp-use>, %X2<imp-use>, %R1<imp-def>, %X3<imp-def>
 33         ADJCALLSTACKUP 96, 0, %R1<imp-def,dead>, %R1<imp-use>
 34         %vreg11<def> = COPY %X3; G8RC:%vreg11 35     Successors
according to CFG: BB#3(?%)
 37 BB#3: derived from LLVM BB %return
 38     Predecessors according to CFG: BB#1 BB#2 BB#4
 39         %vreg12<def> = EXTSW_32_64 %vreg11:sub_32; G8RC:%vreg12,%vreg11
 40         %X3<def> = COPY %vreg12; G8RC:%vreg12
 41         BLR8 %LR8<imp-use>, %RM<imp-use>, %X3<imp-use>

No matter what I do, vreg15 will get a Callee-Saved Register assigned to
it. However, this is suboptimal. So what I am trying to accomplish is to
split the live range of vreg15 into the paths without the call and the path
with the call (BL8_NOP is a call). Then the physical register X3 can be
used in the paths BB#0 -> BB#1 -> BB#3 and BB#0 -> BB#4 -> BB#3 and it can
be copied to a Callee-Saved Register in BB#2.

Without such a split, vreg15 is assigned a CSR for the entire live range
and there is no way to avoid having to save/restore the CSR in the
prologue/epilogue. If one of the two paths that did not actually have the
call turn out to be the hottest path through the function, there is a lot
of wasted cycles in the save/restore because we weren't able to shrink-wrap
this function due to the choice RA made.

If anyone can offer some ideas on what I should do here, I would truly
appreciate it.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170113/5325ff20/attachment.html>

More information about the llvm-dev mailing list