[PATCH] D66278: [RISCV] Enable tail call opt for variadic function

Thu Aug 29 04:01:50 PDT 2019

brucehoult added a comment.

In D66278#1646758 <https://reviews.llvm.org/D66278#1646758>, @lenary wrote:

> Please may you explain a bit further why calls using varargs (when not passed by the stack) are allowed to be tail-call-optimised?
>
> I feel the justification would be good documentation to go with the patch, and will help out me and other reviewers.

Because the caller's stack frame (if any) can be completely deallocated before the tail call, so the only remaining state is the return address (in ra) and the arguments being passed to the tail-called function (in a0-a7).

  __attribute__((noinline))
  int print_scaled(unsigned long n, int scale){
    return printf("%lu.%lu", n/scale, n%scale);
  }

x86_64 gcc tail-calls printf at -O2 or -Os. So does RISC-V gcc. Here is the RISC-V output at -Os:

  00000000000101b0 <print_scaled>:
    101b0:       02b57633                remu    a2,a0,a1
    101b4:       02b555b3                divu    a1,a0,a1
    101b8:       0001a537                lui     a0,0x1a
    101bc:       93050513                addi    a0,a0,-1744 # 19930 <__clzdi2+0x36>
    101c0:       19c0006f                j       1035c <printf>

Changing the function to...

  int print_scaled(unsigned long n, int scale){
    return printf("%lu.%lu %lu.%lu %lu.%lu %lu.%lu ",
                  n/scale, n%scale, n/scale, n%scale, n/scale, n%scale, n/scale, n%scale);
  }

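... prevents tail calling on RISC-V because, with nine arguments (the format string plus eight values), the last n%scale no longer fits in a0-a7 and goes on the stack. On x86_64, which has six integer argument registers, the last three arguments are pushed.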

Eliminating one pair (seven arguments total) enables tail calling on RISC-V, but x86_64 still spills one argument to the stack. x86_64 tail-calls only once two pairs remain (five arguments total), which fits within its six integer argument registers.
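The argument counting above can be sketched in a few lines of C. This is only an illustration, not the check the compiler actually performs: the helper names stack_args and can_tail_call are hypothetical, and the model assumes every argument is an integer or pointer that occupies exactly one argument register (a0-a7 on RISC-V, six registers on x86_64 System V).

```c
/* Integer argument registers available to a call (assumed model):
   a0-a7 on RISC-V, rdi/rsi/rdx/rcx/r8/r9 on x86_64 System V. */
enum { RISCV_INT_ARG_REGS = 8, X86_64_INT_ARG_REGS = 6 };

/* Hypothetical helper: how many of total_args do not fit in
   registers and therefore have to be passed on the stack. */
int stack_args(int total_args, int arg_regs) {
    return total_args > arg_regs ? total_args - arg_regs : 0;
}

/* Hypothetical helper: the simplified rule from this discussion,
   i.e. a tail call is possible only if no argument needs the stack. */
int can_tail_call(int total_args, int arg_regs) {
    return stack_args(total_args, arg_regs) == 0;
}
```

Under this model, the nine-argument printf call gives stack_args(9, RISCV_INT_ARG_REGS) == 1 and stack_args(9, X86_64_INT_ARG_REGS) == 3, while the five-argument call passes can_tail_call on both targets.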

It gets trickier if the calling function creates a stack frame, for example because it calls some other function(s) as well, or simply has too many live local variables. In this case the arguments for printf need to be set up first; then ra and any s registers are reloaded and the stack is popped, all before the tail call.

  __attribute__((noinline))
  int power10(int n){
    return n == 0 ? 1 : 10 * power10(n - 1);
  }

  __attribute__((noinline))
  int print_scaled(unsigned long n, int digits){
    int scale = power10(digits);
    return printf("%lu.%lu", n/scale, n%scale);
  }

  00000000000101c0 <print_scaled>:
    101c0:       1141                    addi    sp,sp,-16
    101c2:       e022                    sd      s0,0(sp)
    101c4:       842a                    mv      s0,a0
    101c6:       852e                    mv      a0,a1
    101c8:       e406                    sd      ra,8(sp)
    101ca:       fe5ff0ef                jal     ra,101ae <power10>
    101ce:       02a47633                remu    a2,s0,a0
    101d2:       60a2                    ld      ra,8(sp)
    101d4:       02a455b3                divu    a1,s0,a0
    101d8:       6402                    ld      s0,0(sp)
    101da:       0001a537                lui     a0,0x1a
    101de:       95050513                addi    a0,a0,-1712 # 19950 <__clzdi2+0x32>
    101e2:       0141                    addi    sp,sp,16
    101e4:       19c0006f                j       10380 <printf>

ra was saved to the stack so that power10 could be called, and s0 was saved to the stack so that it could hold n across the call to power10. Both ra and s0 can be restored and the stack frame deallocated before tail-calling printf. This is only possible because none of the arguments to printf needed to be stored in the stack frame: the arguments require only a0, a1 and a2 in this case.

Repository:
  rL LLVM

https://reviews.llvm.org/D66278