[llvm-dev] [RFC] Adding CPS call support

Philip Reames via llvm-dev llvm-dev at lists.llvm.org
Tue Apr 18 11:27:31 PDT 2017



On 04/17/2017 04:04 PM, Kavon Farvardin via llvm-dev wrote:
>> Is there a reason you can't use the algorithm from the paper "A Correspondence between Continuation Passing Style and Static Single Assignment Form" to convert your IR to LLVM's SSA IR?
>
> Yes, there are a few reasons.
>
> Undoing the CPS transformation earlier in the pipeline would mean that we are using LLVM's built-in stack. The special layout and usage of the stack in GHC is achieved through CPS, so it is baked the compiler and garbage-collected runtime system.
Can you give a bit more detail here?  LLVM does provide support for 
describing GC frame maps.

p.s. You're going to have to justify the design of the runtime a bit 
here.  Extending the IR to workaround a buggy or poorly structured 
runtime is not going to be sufficient justification.  *Why* does the 
runtime need the specific runtime stack structure used?  What 
alternatives exist and why should those be rejected?
>
> ~kavon
>
>> On Apr 17, 2017, at 8:56 PM, Manuel Jacob <me at manueljacob.de> wrote:
>>
>> Hi Kavon,
>>
>> Is there a reason you can't use the algorithm from the paper "A Correspondence between Continuation Passing Style and Static Single Assignment Form" to convert your IR to LLVM's SSA IR?
>>
>> -Manuel
>>
>> On 2017-04-17 17:30, Kavon Farvardin via llvm-dev wrote:
>>> Summary
>>> =======
>>> There is a need for dedicated continuation-passing style (CPS) calls in LLVM to
>>> support functional languages. Herein I describe the problem and propose a
>>> solution. Feedback and/or tips are greatly appreciated, as our goal is to
>>> implement these changes so they can be merged into LLVM trunk.
>>> Problem
>>> =======
>>> Implementations of functional languages like Haskell and ML (e.g., GHC and
>>> Manticore) use a continuation-passing style (CPS) transformation in order to
>>> manage the call stack explicitly. This is done prior to generating LLVM IR, so
>>> the implicit call stack within LLVM is not used for call and return.
>>> When making a non-tail call while in CPS, we initialize a stack frame for the
>>> return through our own stack pointer, and then pass that stack pointer to the
>>> callee when we jump to it. It is here when we run into a problem in LLVM.
>>> Consider the following CPS call to @bar and how it will return:
>>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>>> define void @foo (i8** %sp, ...) {
>>> someBlk:
>>>     ; ...
>>>     ; finish stack frame by writing return address
>>>   %retAddr = blockaddress(@foo, %retpt)
>>>   store i8* %retAddr, i8** %sp
>>>     ; jump to @bar
>>>   tail call void @bar(i8** %sp, ...)
>>> retpt: 				; <- how can @bar "call" %retpt?
>>>    %sp2 = ???
>>>    %val = ???
>>>    ; ...
>>> }
>>> define void @bar (i8** %sp, ...) {
>>> 	  ; perform a return
>>> 	%retAddr0 = load i8*, i8** %sp
>>> 	%retAddr1 = bitcast i8* %retAddr0 to void (i8**, i64)*
>>> 	%val = bitcast i64 1 to i64
>>> 	  ; jump back to %retpt in @foo, passing %sp and %val
>>> 	tail call void %retAddr1(i8** %sp, i64 %val)
>>> }
>>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>>> There is currently no way to jump back to %retpt from another function, as block
>>> addresses have restricted usage in LLVM [1]. Our main difficulty is that we
>>> cannot jump to a block address without knowing its calling convention, i.e., the
>>> particular machine registers (or memory locations) that the block expects
>>> incoming values to be passed in.
>>> The workaround we have been using in GHC for LLVM is to break apart every
>>> function, placing the code for the continuation of each call into a new
>>> function. We do this only so that we can store a function pointer instead of a
>>> block address to our stack. This particularly gross transformation inhibits
>>> optimizations in both GHC and LLVM, and we would like to remove the need for it.
>>> Proposal
>>> ========
>>> I believe the lowest-impact method of fixing this problem with LLVM is the
>>> following:
>>> First, we add a special 'cps' call instruction marker to be used on non-tail
>>> calls. Then, we use a specialized calling convention for these non-tail calls,
>>> which fix the returned values to specific locations in the machine code [2].
>>> To help illustrate what's going on, let's rewrite the above example using the
>>> proposed 'cps' call:
>>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>>> define { ... } @foo (i8** %sp, ...) {
>>> someBlk:
>>>     ; ...
>>>     ; finish stack frame by writing return address
>>>   %retAddr = blockaddress(@foo, %retpt)
>>>   store i8* %retAddr, i8** %sp
>>>     ; jump to @bar
>>>   %retVals = cps call ghccc {i8**, i64} @bar (i8** %sp, ...)
>>>   br label %retpt
>>> retpt:
>>>    %sp2 = extractvalue {i8**, i64} %retVals, 0
>>>    %val = extractvalue {i8**, i64} %retVals, 1
>>>    ; ...
>>> }
>>> define {i8**, i64} @bar (i8** %sp, ...) {
>>> 	  ; perform a return
>>> 	%retAddr0 = load i8*, i8** %sp
>>> 	%retAddr1 = bitcast i8* %retAddr0 to {i8**, i64} (i8**, i64)*
>>> 	%val = bitcast i64 1 to i64
>>> 	  ; jump back to %retpt in @foo, passing %sp and %val
>>> 	tail call ghccc void %retAddr1(i8** %sp, i64 %val)
>>> 	unreachable   ; <- ideally this would be our terminator,
>>> 	              ; but 'unreachable' breaks TCO, so we will
>>> 	              ; emit a ret of the struct "returned" by the call.
>>> }
>>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>>> The important point here is that the 'cps' marked call will lower to a jump. The
>>> 'cps' call marker means that the callee knows how to return using the arguments
>>> explicitly passed to it, i.e., the stack pointer %sp. The callee cannot use a
>>> 'ret' instruction if it is 'cps' called.
>>> Either before or during 'cps' call lowering, any instructions following the
>>> 'cps' call to @bar are sunk into the the block %retpt, and the unconditional
>>> branch to %retpt is deleted/ignored. We include that branch to preserve
>>> control-flow information for LLVM IR optimization passes.
>>> The 'extractvalue' instructions are what ensure the calling convention of
>>> %retpt, since the fields of the struct %retVals are returned in physical
>>> registers dictated by the (modified) ghccc convention. Those same physical
>>> registers are used by the ghccc tail call in @bar when it jumps back to %retpt.
>>> So, the call & return convention of ghccc ensures that everything matches up.
>>> Interaction with LLVM
>>> =====================
>>> (1) Caller-saved Values
>>> One may wonder how this would work if there are caller-saved values of the 'cps'
>>> call. But, in our first example, which closely matches what CPS code looks like,
>>> the call to @bar was in tail position. Thus, in the second example, there are no
>>> caller-saved values for the 'cps' call to @bar, as all live values were passed
>>> as arguments in the call.
>>> This caller-saved part is a bit subtle. It works fine in my experience [2] when
>>> @bar is a function not visible to LLVM. My impression is that even if @bar is
>>> visible to LLVM, there is still no issue, but if you can think of any corner
>>> cases that would be great!
>>> (2) Inlining
>>> My gut feeling is that we cannot inline a 'cps' marked call-site without more
>>> effort. This is because we might end up with something odd like this once the
>>> dust settles:
>>>     %retAddr = blockaddress(@foo, %retpt)
>>>     %retAddr1 = bitcast i8* %retAddr to {i8**, i64} (i8**, i64)*
>>>     tail call ghccc %retAddr1 ( %sp, ... )
>>> We could add a pass that turns the above sequence into just an unconditional
>>> branch to %retpt, using a phi-node to replace each 'extractvalue' instruction in
>>> that block.
>>> I'm not sure whether inlining in LLVM is important for us yet, as we tend to
>>> inline quite a lot before generating LLVM IR. I don't think this additional fix-
>>> up pass would be too difficult to implement if it's desired.
>>> Implementation Sketch and Conclusion
>>> ====================================
>>> My current plan is to add this special lowering of 'cps' calls during the
>>> translation from LLVM IR to the SelectionDAG. I welcome any suggestions or tips
>>> on the best way to approach this. An important goal for us is to merge this into
>>> trunk since we do not want to bundle a special version of LLVM with GHC.
>>> Please let me know soon if you have any objections to this feature.
>>> Thanks for reading,
>>> Kavon
>>> References
>>> ==========
>>> [1] http://llvm.org/docs/LangRef.html#blockaddress
>>> [2] http://kavon.farvard.in/papers/ml16-cwc-llvm.pdf
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



More information about the llvm-dev mailing list