[llvm-dev] [RFC] Adding CPS call support

Wed Apr 19 10:38:28 PDT 2017

> The semantics around inlining alone are problematic enough to warrant serious hesitation.

There are nicer ways to embed CPS call/return into LLVM; I just figured that there would not be much support for adding a new terminator because it would change a lot of code. Ideally we would have a block terminator like:

    cps call ghccc @bar (.. args ..) returnsto label %retpt

Where the "returnsto" is optional to cover both CPS calls and returns. Another alternative would be to emulate this terminator using an intrinsic.

So, my question is: would there be more support for a version of this terminator (whether as an intrinsic or not) instead of what I was proposing earlier?

~kavon

> On Apr 18, 2017, at 7:24 PM, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> 
> 
> On 04/17/2017 08:30 AM, Kavon Farvardin via llvm-dev wrote:
>> Summary
>> =======
>> 
>> There is a need for dedicated continuation-passing style (CPS) calls in LLVM to
>> support functional languages. Herein I describe the problem and propose a
>> solution. Feedback and/or tips are greatly appreciated, as our goal is to
>> implement these changes so they can be merged into LLVM trunk.
>> 
>> 
>> Problem
>> =======
>> 
>> Implementations of functional languages like Haskell and ML (e.g., GHC and
>> Manticore) use a continuation-passing style (CPS) transformation in order to
>> manage the call stack explicitly. This is done prior to generating LLVM IR, so
>> the implicit call stack within LLVM is not used for call and return.
>> 
>> When making a non-tail call while in CPS, we initialize a stack frame for the
>> return through our own stack pointer, and then pass that stack pointer to the
>> callee when we jump to it. It is here when we run into a problem in LLVM.
>> 
>> Consider the following CPS call to @bar and how it will return:
>> 
>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> 
>> define void @foo (i8** %sp, ...) {
>> someBlk:
>>     ; ...
>>     ; finish stack frame by writing return address
>>   %retAddr = blockaddress(@foo, %retpt)
>>   store i8* %retAddr, i8** %sp
>>     ; jump to @bar
>>   tail call void @bar(i8** %sp, ...)
>> 
>>  retpt: 				; <- how can @bar "call" %retpt?
>>    %sp2 = ???
>>    %val = ???
>>    ; ...
>> 
>>  }
>> 
>>  define void @bar (i8** %sp, ...) {
>>  	  ; perform a return
>>  	%retAddr0 = load i8*, i8** %sp
>>  	%retAddr1 = bitcast i8* %retAddr0 to void (i8**, i64)*
>>  	%val = bitcast i64 1 to i64
>>  	  ; jump back to %retpt in @foo, passing %sp and %val
>>  	tail call void %retAddr1(i8** %sp, i64 %val)
>>  }
>> 
>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> 
>> There is currently no way to jump back to %retpt from another function, as block
>> addresses have restricted usage in LLVM [1]. Our main difficulty is that we
>> cannot jump to a block address without knowing its calling convention, i.e., the
>> particular machine registers (or memory locations) that the block expects
>> incoming values to be passed in.
>> 
>> The workaround we have been using in GHC for LLVM is to break apart every
>> function, placing the code for the continuation of each call into a new
>> function. We do this only so that we can store a function pointer instead of a
>> block address to our stack. This particularly gross transformation inhibits
>> optimizations in both GHC and LLVM, and we would like to remove the need for it.
> Can you give a bit more information on what this transformation breaks optimization wise?  You're essentially explicitly representing the continuation and converting all calls into tail calls with an explicit continuation argument. I would expect the LLVM optimizer to do reasonable well with such a representation.
>> 
>> 
>> Proposal
>> ========
>> 
>> I believe the lowest-impact method of fixing this problem with LLVM is the
>> following:
>> 
>> First, we add a special 'cps' call instruction marker to be used on non-tail
>> calls. Then, we use a specialized calling convention for these non-tail calls,
>> which fix the returned values to specific locations in the machine code [2].
>> 
>> To help illustrate what's going on, let's rewrite the above example using the
>> proposed 'cps' call:
>> 
>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> 
>> define { ... } @foo (i8** %sp, ...) {
>> someBlk:
>>     ; ...
>>     ; finish stack frame by writing return address
>>   %retAddr = blockaddress(@foo, %retpt)
>>   store i8* %retAddr, i8** %sp
>>     ; jump to @bar
>>   %retVals = cps call ghccc {i8**, i64} @bar (i8** %sp, ...)
>>   br label %retpt
>> 
>>  retpt:
>>    %sp2 = extractvalue {i8**, i64} %retVals, 0
>>    %val = extractvalue {i8**, i64} %retVals, 1
>>    ; ...
>> 
>>  }
>> 
>>  define {i8**, i64} @bar (i8** %sp, ...) {
>>  	  ; perform a return
>>  	%retAddr0 = load i8*, i8** %sp
>>  	%retAddr1 = bitcast i8* %retAddr0 to {i8**, i64} (i8**, i64)*
>>  	%val = bitcast i64 1 to i64
>>  	  ; jump back to %retpt in @foo, passing %sp and %val
>>  	tail call ghccc void %retAddr1(i8** %sp, i64 %val)
>> 
>>  	unreachable   ; <- ideally this would be our terminator,
>>  	              ; but 'unreachable' breaks TCO, so we will
>>  	              ; emit a ret of the struct "returned" by the call.
>>  }
>> 
>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> 
>> The important point here is that the 'cps' marked call will lower to a jump. The
>> 'cps' call marker means that the callee knows how to return using the arguments
>> explicitly passed to it, i.e., the stack pointer %sp. The callee cannot use a
>> 'ret' instruction if it is 'cps' called.
>> 
>> Either before or during 'cps' call lowering, any instructions following the
>> 'cps' call to @bar are sunk into the the block %retpt, and the unconditional
>> branch to %retpt is deleted/ignored. We include that branch to preserve
>> control-flow information for LLVM IR optimization passes.
>> 
>> The 'extractvalue' instructions are what ensure the calling convention of
>> %retpt, since the fields of the struct %retVals are returned in physical
>> registers dictated by the (modified) ghccc convention. Those same physical
>> registers are used by the ghccc tail call in @bar when it jumps back to %retpt.
>> So, the call & return convention of ghccc ensures that everything matches up.
>> 
>> 
>> Interaction with LLVM
>> =====================
>> 
>> (1) Caller-saved Values
>> 
>> One may wonder how this would work if there are caller-saved values of the 'cps'
>> call. But, in our first example, which closely matches what CPS code looks like,
>> the call to @bar was in tail position. Thus, in the second example, there are no
>> caller-saved values for the 'cps' call to @bar, as all live values were passed
>> as arguments in the call.
>> 
>> This caller-saved part is a bit subtle. It works fine in my experience [2] when
>> @bar is a function not visible to LLVM. My impression is that even if @bar is
>> visible to LLVM, there is still no issue, but if you can think of any corner
>> cases that would be great!
>> 
>> (2) Inlining
>> 
>> My gut feeling is that we cannot inline a 'cps' marked call-site without more
>> effort. This is because we might end up with something odd like this once the
>> dust settles:
>> 
>>     %retAddr = blockaddress(@foo, %retpt)
>>     %retAddr1 = bitcast i8* %retAddr to {i8**, i64} (i8**, i64)*
>>     tail call ghccc %retAddr1 ( %sp, ... )
>> 
>> We could add a pass that turns the above sequence into just an unconditional
>> branch to %retpt, using a phi-node to replace each 'extractvalue' instruction in
>> that block.
>> 
>> I'm not sure whether inlining in LLVM is important for us yet, as we tend to
>> inline quite a lot before generating LLVM IR. I don't think this additional fix-
>> up pass would be too difficult to implement if it's desired.
>> 
>> 
>> Implementation Sketch and Conclusion
>> ====================================
>> 
>> My current plan is to add this special lowering of 'cps' calls during the
>> translation from LLVM IR to the SelectionDAG. I welcome any suggestions or tips
>> on the best way to approach this. An important goal for us is to merge this into
>> trunk since we do not want to bundle a special version of LLVM with GHC.
>> 
>> Please let me know soon if you have any objections to this feature.
>> 
>> Thanks for reading,
>> Kavon
> Honestly, this proposed design seems seriously problematic.  The semantics around inlining alone are problematic enough to warrant serious hesitation.  I think you need to take a step back, describe the original problem you're trying to solve and why the standard conversion approaches don't work out.  You'd need to strongly motivate a change of this magnitude and I really don't feel like you've done so so far.
>> 
>> 
>> References
>> ==========
>> 
>> [1] http://llvm.org/docs/LangRef.html#blockaddress
>> [2] http://kavon.farvard.in/papers/ml16-cwc-llvm.pdf
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev