<div dir="ltr">In your example:<div><br></div><div>__attribute__((noinline))<br>int print_scaled(unsigned long n, int scale){<br> return printf("%lu.%lu", n/scale, n%scale);<br>} </div><div><br></div><div>RISC-V gcc can optimize this varargs call to printf into a tail call,</div><div>because print_scaled passes no arguments to printf on the stack.<br></div><div>It does not need to create a stack frame for outgoing arguments, so it can jump straight to printf</div><div>and never has to return to print_scaled to free the stack.</div><div><br></div><div>If a call to a varargs function passes more than eight arguments, it cannot be </div><div>tail-call optimized, because some of the arguments are passed on the stack.</div><div><br></div><div>So the check can focus on whether a stack frame has been created and not yet freed at the point of the call,</div><div>whether for saving callee-saved registers, passing parameters, or anything else.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Bruce Hoult <<a href="mailto:brucehoult@sifive.com">brucehoult@sifive.com</a>> wrote on Thu, Aug 29, 2019 at 2:23 PM:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">To be concrete, you're talking about whether the function being called<br>
is varargs, not the function doing the calling? For example:<br>
<br>
__attribute__((noinline))<br>
int print_scaled(unsigned long n, int scale){<br>
return printf("%lu.%lu", n/scale, n%scale);<br>
}<br>
<br>
x86_64 gcc tail calls printf at -O2 or -Os. So does RISC-V gcc. Here -Os:<br>
<br>
00000000000101b0 <print_scaled>:<br>
101b0: 02b57633 remu a2,a0,a1<br>
101b4: 02b555b3 divu a1,a0,a1<br>
101b8: 0001a537 lui a0,0x1a<br>
101bc: 93050513 addi a0,a0,-1744 # 19930 <__clzdi2+0x36><br>
101c0: 19c0006f j 1035c <printf><br>
<br>
Changing the function to...<br>
<br>
int print_scaled(unsigned long n, int scale){<br>
return printf("%lu.%lu %lu.%lu %lu.%lu %lu.%lu ",<br>
n/scale, n%scale, n/scale, n%scale, n/scale, n%scale,<br>
n/scale, n%scale);<br>
}<br>
<br>
... prevents tail calling on RISC-V because with nine arguments the<br>
last n%scale goes on the stack. On x86_64 the last four arguments are<br>
pushed.<br>
<br>
Eliminating one pair enables tail calling on RISC-V, but x86_64 still<br>
spills to the stack. x86_64 tail-calls with two pairs (five<br>
arguments total); a third pair would make seven arguments, one more<br>
than its six integer argument registers.<br>
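<br>
The register-count reasoning above can be sketched as a small model. This is only an illustration, not how gcc or LLVM implement the check: the helper names stack_arg_count and may_tail_call are hypothetical, every argument is assumed to be an integer or pointer that fits in one register, and real compilers test further conditions before tail calling.<br>
<pre>
```c
#include <assert.h>

/* Integer argument registers per ABI: a0-a7 on RISC-V,
   rdi, rsi, rdx, rcx, r8, r9 on x86_64 System V. */
enum { RISCV_INT_ARG_REGS = 8, X86_64_SYSV_INT_ARG_REGS = 6 };

/* Hypothetical helper: how many of nargs register-sized integer
   arguments spill to the stack when nregs argument registers exist. */
int stack_arg_count(int nargs, int nregs) {
    assert(nargs >= 0 && nregs >= 0);
    return nargs > nregs ? nargs - nregs : 0;
}

/* Simplified model: checks only the "no outgoing stack arguments"
   condition discussed in this thread, nothing else. */
int may_tail_call(int nargs, int nregs) {
    return stack_arg_count(nargs, nregs) == 0;
}
```
</pre>
Under this model the nine-argument printf leaves one argument on the stack on RISC-V, the seven-argument version fits in a0-a7, and x86_64 only fits the five-argument version.<br>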
<br>
It gets trickier if the calling function creates a stack frame, for<br>
example because it calls some other function(s) as well, or simply has<br>
too many live local variables. In this case the arguments for the<br>
printf need to be set up, then ra and any s registers reloaded and the<br>
stack popped, before the tail call.<br>
<br>
__attribute__((noinline))<br>
int power10(int n){<br>
return n == 0 ? 1 : 10 * power10(n - 1);<br>
}<br>
<br>
__attribute__((noinline))<br>
int print_scaled(unsigned long n, int digits){<br>
int scale = power10(digits);<br>
return printf("%lu.%lu", n/scale, n%scale);<br>
}<br>
<br>
00000000000101c0 <print_scaled>:<br>
101c0: 1141 addi sp,sp,-16<br>
101c2: e022 sd s0,0(sp)<br>
101c4: 842a mv s0,a0<br>
101c6: 852e mv a0,a1<br>
101c8: e406 sd ra,8(sp)<br>
101ca: fe5ff0ef jal ra,101ae <power10><br>
101ce: 02a47633 remu a2,s0,a0<br>
101d2: 60a2 ld ra,8(sp)<br>
101d4: 02a455b3 divu a1,s0,a0<br>
101d8: 6402 ld s0,0(sp)<br>
101da: 0001a537 lui a0,0x1a<br>
101de: 95050513 addi a0,a0,-1712 # 19950 <__clzdi2+0x32><br>
101e2: 0141 addi sp,sp,16<br>
101e4: 19c0006f j 10380 <printf><br>
<br>
On Wed, Aug 28, 2019 at 8:31 PM Jim Lin via Phabricator<br>
<<a href="mailto:reviews@reviews.llvm.org" target="_blank">reviews@reviews.llvm.org</a>> wrote:<br>
><br>
> Jim added a comment.<br>
><br>
> @lenary<br>
> If any arguments are passed on the stack, tail-call optimization is not allowed,<br>
> because the caller allocates stack space for those arguments and must<br>
> free it after the call finishes (so the call has to return to the caller).<br>
><br>
> The only difference when passing varargs is that a 2×XLEN argument must<br>
> be assigned to an 'even' or 'aligned' register pair (8-byte alignment for RV32 or 16-byte alignment for RV64).<br>
> So a call to a varargs function can be tail-call-optimised if no arguments are passed on the stack.<br>
><br>
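<br>
The aligned-pair rule for varargs mentioned in the comment above can be sketched as follows. This is a simplified model under stated assumptions, not LLVM's implementation: variadic_first_reg and NUM_ARG_REGS are hypothetical names, registers are numbered from 0 for a0, and the sketch handles a single argument without tracking what happens to later arguments once one has gone to the stack.<br>
<pre>
```c
#include <assert.h>

enum { NUM_ARG_REGS = 8 };  /* a0..a7 */

/* Hypothetical helper. next_reg is the index (0 = a0) of the first free
   argument register. Returns the index of the first register the
   variadic argument occupies, or -1 if it has to go on the stack. */
int variadic_first_reg(int next_reg, int needs_aligned_pair) {
    assert(next_reg >= 0 && next_reg <= NUM_ARG_REGS);
    if (needs_aligned_pair) {
        next_reg += next_reg & 1;  /* round up to an even register */
        /* Both registers of the pair must exist. */
        return next_reg + 1 < NUM_ARG_REGS ? next_reg : -1;
    }
    return next_reg < NUM_ARG_REGS ? next_reg : -1;
}
```
</pre>
For example, on RV32 a 64-bit variadic argument that follows a format string in a0 skips a1 and lands in a2/a3. If the next free register is a7, the pair does not fit, so the argument goes on the stack and the call can no longer be a tail call.<br>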
><br>
> Repository:<br>
> rL LLVM<br>
><br>
> CHANGES SINCE LAST ACTION<br>
> <a href="https://reviews.llvm.org/D66278/new/" rel="noreferrer" target="_blank">https://reviews.llvm.org/D66278/new/</a><br>
><br>
> <a href="https://reviews.llvm.org/D66278" rel="noreferrer" target="_blank">https://reviews.llvm.org/D66278</a><br>
><br>
><br>
><br>
</blockquote></div>