[PATCH] D87430: [ARM] Add heuristic to avoid lowering calls to blx for Thumb1 in ARMTargetLowering::LowerCall

Prathamesh via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Sep 13 21:02:44 PDT 2020


prathamesh added inline comments.


================
Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:2261
+      // Hardcoded for now.
+      unsigned nRegs = 7;
+
----------------
efriedma wrote:
> prathamesh wrote:
> > efriedma wrote:
> > > dmgreen wrote:
> > > > Hmm. Not sure. Perhaps this can use something like getRegClassFor(MVT::i32)->getNumRegs()? It's still probably a very rough estimate of allocatable registers.
> > > I don't understand the intent here.
> > > 
> > > In Thumb1 mode, there are four callee-save registers that are considered allocatable: r4-r7.  We must use one of them to store the address of an indirect call.  (We could potentially use high registers, but that isn't implemented.)  None of them are ever used to pass arguments.  Given that, why does the number of arguments to the function matter? Why does the number of caller-save registers matter?
> > IIUC, r0-r3 are caller-saved, so before making any calls, their values need to be copied into the remaining registers or saved to memory.
> > 
> > For example:
> > 
> > ```
> > define void @f(i32 %x, i32 %y, i32 %z, i32 %w) optsize minsize  {
> > entry:
> >   call void @g(i32 %x, i32 %y)
> >   call void @g(i32 %x, i32 %y)
> >   call void @g(i32 %x, i32 %y)
> >   call void @h(i32 %z, i32 %w)
> >   ret void
> > }
> > 
> > declare void @g(i32, i32)
> > declare void @h(i32, i32)
> > ```
> > 
> > code-gen:
> > 
> > ```
> >         push    {r3, r4, r5, r6, r7, lr}
> >         str     r3, [sp]                        @ 4-byte Spill
> >         mov     r5, r2
> >         mov     r6, r1
> >         mov     r7, r0
> >         ldr     r4, .LCPI0_0
> >         blx     r4
> >         mov     r0, r7
> >         mov     r1, r6
> >         blx     r4
> >         mov     r0, r7
> >         mov     r1, r6
> >         blx     r4
> >         mov     r0, r5
> >         ldr     r1, [sp]                        @ 4-byte Reload
> >         bl      h
> >         pop     {r3, r4, r5, r6, r7, pc}
> > 
> > ```
> > In this case, it copies r2, r1, r0 into r5, r6, r7 respectively and uses r4 for the function's address.
> > Since no register is left to hold r3, it is spilled to the stack.
> > 
> > However, I think I wrongly assumed that it could use one of r0-r3 (if the function had fewer than 4 params) for holding the function's address if r4-r7 were not available.
> > So the condition below should probably be:
> > PreferIndirect = MF.getFunction().arg_size() + Outs.size() < 4 ?
> > (although that also makes it more restrictive).
> > 
> > Btw, compiling with and without lowering to an indirect call results in same-sized binaries for the above test case.
> > I wonder if we want to disable the indirect call heuristic only if the register holding the function's address gets spilled, since it's repeatedly rematerialized before each call (similar to the original test case)? In which case, the approach in D79785 seems to be the only correct one.
> > However, I think I wrongly assumed that it could use one of r0-r3 (if the function had fewer than 4 params) for holding the function's address if r4-r7 were not available.
> 
> Well, maybe I was a little imprecise.  In general, we can use them for holding an indirect call address. But it's useless for the purpose of this optimization because it would get clobbered by the call.
> 
> If you're trying to gauge register pressure, anything related to the number of arguments isn't going to be effective: it isn't really correlated.
> Well, maybe I was a little imprecise. In general, we can use them for holding an indirect call address. But it's useless for the purpose of this optimization because it would get clobbered by the call.
Right, r0-r3 aren't usable for holding the function's address in this case, since they are call-clobbered.
I incorrectly assumed they would be usable and checked for nRegs - 1.
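
To make the register accounting concrete, here is a rough sketch (not LLVM code) of the counting argument above. The names `kThumb1CalleeSavedLowRegs` and `estimateSpills` are hypothetical, and the model assumes that only r4-r7 can hold a value across a call, ignoring high registers and rematerialization:

```cpp
// Hypothetical model, not part of LLVM: in Thumb1, r4-r7 are the only
// allocatable registers that survive a call (r0-r3 are call-clobbered).
constexpr unsigned kThumb1CalleeSavedLowRegs = 4;

// Rough estimate of how many values end up on the stack.
// valuesLiveAcrossCalls: values that must survive at least one call.
// indirectCall: true if the callee address is kept in a register (blx),
//               which costs one more callee-saved register.
constexpr unsigned estimateSpills(unsigned valuesLiveAcrossCalls,
                                  bool indirectCall) {
  return (valuesLiveAcrossCalls + (indirectCall ? 1u : 0u)) >
                 kThumb1CalleeSavedLowRegs
             ? (valuesLiveAcrossCalls + (indirectCall ? 1u : 0u)) -
                   kThumb1CalleeSavedLowRegs
             : 0u;
}
```

For the test case above, x, y, z and w are all live across the first call, so the indirect form needs five registers and one value is spilled, which matches the r3 spill in the code-gen; the direct form would need only four. The input here is the number of values live across calls, which the argument count does not give us.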


> If you're trying to gauge register pressure, anything related to the number of arguments isn't going to be effective: it isn't really correlated.
Hmm, you're right. At this point, I am stumped trying to find a heuristic for gauging register pressure in LowerCall that covers all cases. Do you have any suggestions?
Thanks!







Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87430/new/

https://reviews.llvm.org/D87430


