[llvm-dev] [RFC] Adding CPS call support

Tue Apr 18 10:59:21 PDT 2017

On 4/18/2017 3:47 AM, Kavon Farvardin wrote:
>> Most architectures have a call instruction which does not push anything onto the stack; e.g. on ARM, the "BL" instruction saves the return address in LR.  On other architectures, you can emulate this (for example, you could lower an IR "call" to LEA+JMP on x86-64).
> This is similar to what I was originally thinking, since the end goal of all of this is the following machine code for a call (I'll use an x86-64 example):
>
>      leaq  _retpt(%rip), %scratchReg
>      movq  %scratchReg, (%ghcSPReg)
>      jmp   _bar
>
> But, if we want to end up with the above machine code using a custom lowering of just the following IR instruction,
>
>      %retVals = cps call ghccc @bar (... args ...)
>
> _without_ explicitly taking the block address and writing it to the GHC stack pointer beforehand, there are some things to consider:
>
> 1. How do we prevent IR optimizations from eliminating the %retpt block?
>
>    If we do not explicitly take the address of the %retpt block, a pass such as simplifycfg has no reason to preserve the block, and may merge it with its only predecessor.
If the return address is implicit, it's just the instruction immediately 
after the call; it doesn't need to be explicitly represented in the CFG.

If you really do need to represent the address explicitly somehow, you 
could use something like an "invoke", which has branch destinations 
integrated into the call.
>> The return address for the current function is fundamentally live across any non-tail call.
> I'm not quite sure what you mean by this.
>
> While there may be a return address handed to the main function (and thus passed to every other function) it's either unused or not known to LLVM.
Any function which has a "ret" IR instruction must save the address to 
return to somewhere.  I guess you could generate code without any "ret" 
instructions (tail-call with call+unreachable), but that's awkward 
because you can't return to the non-CPS code which called into your code.
>> And LLVM will hoist other operations across non-tail calls, and in the process introduce values which are live across calls.  You need to save those values somewhere.  The key question is where.  Your proposal tries to address that by explicitly saving/restoring the return address onto the GHC stack, but you're working at the wrong level; you can't get a complete list of which values are live across calls until register allocation.
> The part I had omitted from the proposal is the initialization of the rest of the GHC stack frame, which is laid out in GHC by finding all of the values that are live across the call before CPS conversion.
>
> While the full list of values preserved by LLVM are not known until register allocation, there are some properties of the LLVM IR we will initially generate that are important to consider:
>
> 1. There are no IR values live across the non-tail 'cps' IR call, since they are all passed to the callee.
> 2. All IR values used after the 'cps' IR call come from the struct returned by that call.
>
> Thus, it should not be possible to hoist instructions above this particular non-tail call, as they are all derived from the values returned in the struct. Any register spills allocated to the LLVM stack before the non-tail call are dead, as they have been copied to the GHC stack frame if they were live.
>
> If you have a particular example in mind that would be great.

If your code uses a large integer constant (like 0x123456789) multiple 
times in the same function, LLVM will load it into a register and reuse 
it across calls.  Same for floating-point constants.

If you reference a global multiple times, LLVM will load the address of 
that global into a register, and reuse it.  Same for the address of a 
function (if you're not directly calling it).  And if LLVM can reason 
about the value of a global (e.g. it's a constant), it could save that 
value into a register.

If LLVM can prove through interprocedural optimization that a function 
always returns the value of an argument, it will replace uses of the 
return value of the call with uses of the argument.

And even if you could force LLVM not to keep any values live across your 
calls, there would be no point: LLVM optimizations wouldn't be able to 
tell which values are equivalent before and after the call. The end 
result is exactly equivalent to generating multiple functions.

-----

A potential alternative to what you're proposing here is to perform the 
transform which splits a function into multiple functions on LLVM IR, as 
opposed to running it before generating IR.  You might want to look at 
the implementation of C++ coroutines 
(http://llvm.org/docs/Coroutines.html) for inspiration.

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project