[PATCH] D66278: [RISCV] Enable tail call opt for variadic function
Bruce Hoult via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 29 04:01:50 PDT 2019
brucehoult added a comment.
In D66278#1646758 <https://reviews.llvm.org/D66278#1646758>, @lenary wrote:
> Please may you explain a bit further why calls using varargs (when not passed by the stack) are allowed to be tail-call-optimised?
>
> I feel the justification would be good documentation to go with the patch, and will help out me and other reviewers.
Because the caller's stack frame (if any) can be completely deallocated before the tail call, leaving the only state as the return address (in ra) and the arguments being passed to the tail-called function (in a0-a7).
__attribute__((noinline))
int print_scaled(unsigned long n, int scale){
return printf("%lu.%lu", n/scale, n%scale);
}
x86_64 gcc tail calls printf at -O2 or -Os. So does RISC-V gcc. Here -Os:
00000000000101b0 <print_scaled>:
101b0: 02b57633 remu a2,a0,a1
101b4: 02b555b3 divu a1,a0,a1
101b8: 0001a537 lui a0,0x1a
101bc: 93050513 addi a0,a0,-1744 # 19930
<__clzdi2+0x36>
101c0: 19c0006f j 1035c <printf>
Changing the function to...
int print_scaled(unsigned long n, int scale){
return printf("%lu.%lu %lu.%lu %lu.%lu %lu.%lu ",
n/scale, n%scale, n/scale, n%scale, n/scale, n%scale, n/scale, n%scale);
}
... prevents tail calling on RISC-V because with nine arguments the last n%scale goes on the stack. On x86_64 the last four arguments are pushed.
Eliminating one pair enables tail calling on RISC-V, but x86_64 still spills to the stack. The x86 tail-calls with two copies (five arguments total), which is its maximum in registers.
It gets trickier if the calling function creates a stack frame, for example because it calls some other function(s) as well, or simply has too many live local variables. In this case the arguments for the printf need to be set up, then ra and any s registers reloaded and the stack popped, before the tail call.
__attribute__((noinline))
int power10(int n){
return n == 0 ? 1 : 10 * power10(n - 1);
}
__attribute__((noinline))
int print_scaled(unsigned long n, int digits){
int scale = power10(digits);
return printf("%lu.%lu", n/scale, n%scale);
}
00000000000101c0 <print_scaled>:
101c0: 1141 addi sp,sp,-16
101c2: e022 sd s0,0(sp)
101c4: 842a mv s0,a0
101c6: 852e mv a0,a1
101c8: e406 sd ra,8(sp)
101ca: fe5ff0ef jal ra,101ae <power10>
101ce: 02a47633 remu a2,s0,a0
101d2: 60a2 ld ra,8(sp)
101d4: 02a455b3 divu a1,s0,a0
101d8: 6402 ld s0,0(sp)
101da: 0001a537 lui a0,0x1a
101de: 95050513 addi a0,a0,-1712 # 19950
<__clzdi2+0x32>
101e2: 0141 addi sp,sp,16
101e4: 19c0006f j 10380 <printf>
ra was saved to the stack so power10 could be called and s0 was saved to the stack so there is somewhere to save n over the call to power10. Both ra and s0 can be restored and the stack frame deallocated before tail-calling printf. This is only possible because none of the arguments to printf needed to be stored in the stack frame -- the arguments require only a0, a1 and a2 in this case.
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D66278/new/
https://reviews.llvm.org/D66278
More information about the llvm-commits
mailing list