[PATCH] D87430: [ARM] Add heuristic to avoid lowering calls to blx for Thumb1 in ARMTargetLowering::LowerCall
Prathamesh via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 11 05:03:19 PDT 2020
prathamesh added inline comments.
================
Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:2261
+ // Hardcoded for now.
+ unsigned nRegs = 7;
+
----------------
efriedma wrote:
> dmgreen wrote:
> > Hmm. Not sure. Perhaps this can use something like getRegClassFor(MVT::i32)->getNumRegs()? It's still probably a very rough estimate of allocatable registers.
> I don't understand the intent here.
>
> In Thumb1 mode, there are four callee-save registers that are considered allocatable: r4-r7. We must use one of them to store the address of an indirect call. (We could potentially use high registers, but that isn't implemented.) None of them are ever used to pass arguments. Given that, why does the number of arguments to the function matter? Why does the number of caller-save registers matter?
IIUC, r0-r3 are caller-saved, so before making a call, any values in them that are still needed afterwards must be copied into the remaining (callee-saved) registers or saved to memory.
For example:
```
define void @f(i32 %x, i32 %y, i32 %z, i32 %w) optsize minsize {
entry:
call void @g(i32 %x, i32 %y)
call void @g(i32 %x, i32 %y)
call void @g(i32 %x, i32 %y)
call void @h(i32 %z, i32 %w)
ret void
}
declare void @g(i32, i32)
declare void @h(i32, i32)
```
code-gen:
```
push {r3, r4, r5, r6, r7, lr}
str r3, [sp] @ 4-byte Spill
mov r5, r2
mov r6, r1
mov r7, r0
ldr r4, .LCPI0_0
blx r4
mov r0, r7
mov r1, r6
blx r4
mov r0, r7
mov r1, r6
blx r4
mov r0, r5
ldr r1, [sp] @ 4-byte Reload
bl h
pop {r3, r4, r5, r6, r7, pc}
```
In this case, it copies r2, r1 and r0 into r5, r6 and r7 respectively, and uses r4 to hold the address of g.
Since no register is left to hold r3, it is spilled to memory.
However, I think I wrongly assumed that, if r4-r7 were not available, it could use one of r0-r3 to hold the function's address (when the function has fewer than 4 params).
So perhaps the condition below should be `PreferIndirect = MF.getFunction().arg_size() + Outs.size() < 4`?
(Although that also makes it more restrictive.)
Btw, compiling the above test case with and without lowering to indirect calls results in binaries of the same size.
I wonder if we want to disable the indirect-call heuristic only when the register holding the function's address gets spilled, since the address is then rematerialized before each call (as in the original test case)? In that case, the approach in D79785 seems to be the only correct one.
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D87430