[LLVMdev] [RFC] [PATCH] add tail call optimization to thumb1-only targets
John Brawn
john.brawn at arm.com
Mon Jan 12 06:50:33 PST 2015
Some comments on the general approach:
> For Tail calls identified during DAG generation, the target address
> will
> be loaded into a register
> by use of the constant pool. If R3 is used for argument passing, the
> target address is forced
> to hard reg R12 in order to overcome limitations thumb1 register
> allocator with respect to
> the upper registers.
For the case when r3 is not used for argument passing doing this is
definitely a good idea, but if we have to BX via r12 then we have
to first do an LDR to some other register as there's no instruction
to LDR directly to r12. With current trunk LLVM and your patch for
int calledfn(int,int,int,int);
int test() {
return calledfn(1,2,3,4);
}
I get
test:
push {r4, lr}
movs r0, #1
movs r1, #2
movs r2, #3
movs r3, #4
ldr r4, .LCPI0_0
mov r12, r4
ldr r4, [sp, #4]
mov lr, r4
pop {r4}
add sp, #4
bx r12
r4 has been used to load the target address, which means we have to save
and restore r4 which means we're no better off than not tailcalling. And
you can also see the next problem:
> During epilog generation, spill register restoring will be done within
> the emit epilogue function.
> If LR happens to be spilled on the stack by the prologue, it's restored
> by use of a scratch register
> just before restoring the other registers.
POP is 1+N cycles whereas LDR is 2 cycles. If we need to LDR lr from the
stack then POP r4 then that's 2 (LDR) + 1+1 (POP) + 1 (MOV to lr) + 1
(ADD sp) = 6 cycles, but a POP {r4,lr} is just 3 cycles.
So I think tailcalling is only worthwhile if the function does not save
lr and r3 is free to hold the target address. Also needing consideration
is what happens if callee-saved registers other than lr need to be saved,
but I haven't looked into this.
A few comments on the patch itself (I've only given it a quick look over):
+ bool IsLRPartOfCalleeSavedInfo = false;
+
+ for (const CalleeSavedInfo &CSI : MFI->getCalleeSavedInfo()) {
if (CSI.getReg() == ARM::LR)
- IsV4PopReturn = true;
+ IsLRPartOfCalleeSavedInfo = true;
+ }
+
+ bool IsV4PopReturn = IsLRPartOfCalleeSavedInfo;
IsV4PopReturn &= STI.hasV4TOps() && !STI.hasV5TOps();
+ if (IsTailCallReturn) {
+ MBBI = MBB.getLastNonDebugInstr();
+
+ // First restore callee saved registers. Unlike for normal returns
+ // this is *not* done in restoreCalleeSavedRegisters.
+ const std::vector<CalleeSavedInfo> &CSI(MFI->getCalleeSavedInfo());
+
+ bool IsR4IncludedInCSI = false;
+ bool IsLRIncludedInCSI = false;
+ for (unsigned i = CSI.size(); i != 0; --i) {
+ unsigned Reg = CSI[i-1].getReg();
+ if (Reg == ARM::R4)
+ IsR4IncludedInCSI = true;
+ if (Reg == ARM::LR)
+ IsLRIncludedInCSI = true;
+ }
You set IsLRPartOfCalleeSavedInfo then right afterwards duplicate the work
in setting IsLRIncludedInCSI.
+ // ARMv4T and tail call returns require BX, see emitEpilogue
+ if ((STI.hasV4TOps() && !STI.hasV5TOps()))
The comment says 'tail call returns require BX', but the code doesn't do that.
John
More information about the llvm-dev
mailing list