[llvm-dev] [RFC] Adding CPS call support

Mon Apr 17 12:56:07 PDT 2017

Hi Kavon,

Is there a reason you can't use the algorithm from the paper "A 
Correspondence between Continuation Passing Style and Static Single 
Assignment Form" to convert your IR to LLVM's SSA IR?

-Manuel

On 2017-04-17 17:30, Kavon Farvardin via llvm-dev wrote:
> Summary
> =======
> 
> There is a need for dedicated continuation-passing style (CPS) calls in 
> LLVM to
> support functional languages. Herein I describe the problem and propose 
> a
> solution. Feedback and/or tips are greatly appreciated, as our goal is 
> to
> implement these changes so they can be merged into LLVM trunk.
> 
> 
> Problem
> =======
> 
> Implementations of functional languages like Haskell and ML (e.g., GHC 
> and
> Manticore) use a continuation-passing style (CPS) transformation in 
> order to
> manage the call stack explicitly. This is done prior to generating LLVM 
> IR, so
> the implicit call stack within LLVM is not used for call and return.
> 
> When making a non-tail call while in CPS, we initialize a stack frame 
> for the
> return through our own stack pointer, and then pass that stack pointer 
> to the
> callee when we jump to it. It is here when we run into a problem in 
> LLVM.
> 
> Consider the following CPS call to @bar and how it will return:
> 
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> 
> define void @foo (i8** %sp, ...) {
> someBlk:
>     ; ...
>     ; finish stack frame by writing return address
>   %retAddr = blockaddress(@foo, %retpt)
>   store i8* %retAddr, i8** %sp
>     ; jump to @bar
>   tail call void @bar(i8** %sp, ...)
> 
>  retpt: 				; <- how can @bar "call" %retpt?
>    %sp2 = ???
>    %val = ???
>    ; ...
> 
>  }
> 
>  define void @bar (i8** %sp, ...) {
>  	  ; perform a return
>  	%retAddr0 = load i8*, i8** %sp
>  	%retAddr1 = bitcast i8* %retAddr0 to void (i8**, i64)*
>  	%val = bitcast i64 1 to i64
>  	  ; jump back to %retpt in @foo, passing %sp and %val
>  	tail call void %retAddr1(i8** %sp, i64 %val)
>  }
> 
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> 
> There is currently no way to jump back to %retpt from another function, 
> as block
> addresses have restricted usage in LLVM [1]. Our main difficulty is 
> that we
> cannot jump to a block address without knowing its calling convention, 
> i.e., the
> particular machine registers (or memory locations) that the block 
> expects
> incoming values to be passed in.
> 
> The workaround we have been using in GHC for LLVM is to break apart 
> every
> function, placing the code for the continuation of each call into a new
> function. We do this only so that we can store a function pointer 
> instead of a
> block address to our stack. This particularly gross transformation 
> inhibits
> optimizations in both GHC and LLVM, and we would like to remove the 
> need for it.
> 
> 
> Proposal
> ========
> 
> I believe the lowest-impact method of fixing this problem with LLVM is 
> the
> following:
> 
> First, we add a special 'cps' call instruction marker to be used on 
> non-tail
> calls. Then, we use a specialized calling convention for these non-tail 
> calls,
> which fix the returned values to specific locations in the machine code 
> [2].
> 
> To help illustrate what's going on, let's rewrite the above example 
> using the
> proposed 'cps' call:
> 
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> 
> define { ... } @foo (i8** %sp, ...) {
> someBlk:
>     ; ...
>     ; finish stack frame by writing return address
>   %retAddr = blockaddress(@foo, %retpt)
>   store i8* %retAddr, i8** %sp
>     ; jump to @bar
>   %retVals = cps call ghccc {i8**, i64} @bar (i8** %sp, ...)
>   br label %retpt
> 
>  retpt:
>    %sp2 = extractvalue {i8**, i64} %retVals, 0
>    %val = extractvalue {i8**, i64} %retVals, 1
>    ; ...
> 
>  }
> 
>  define {i8**, i64} @bar (i8** %sp, ...) {
>  	  ; perform a return
>  	%retAddr0 = load i8*, i8** %sp
>  	%retAddr1 = bitcast i8* %retAddr0 to {i8**, i64} (i8**, i64)*
>  	%val = bitcast i64 1 to i64
>  	  ; jump back to %retpt in @foo, passing %sp and %val
>  	tail call ghccc void %retAddr1(i8** %sp, i64 %val)
> 
>  	unreachable   ; <- ideally this would be our terminator,
>  	              ; but 'unreachable' breaks TCO, so we will
>  	              ; emit a ret of the struct "returned" by the call.
>  }
> 
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> 
> The important point here is that the 'cps' marked call will lower to a 
> jump. The
> 'cps' call marker means that the callee knows how to return using the 
> arguments
> explicitly passed to it, i.e., the stack pointer %sp. The callee cannot 
> use a
> 'ret' instruction if it is 'cps' called.
> 
> Either before or during 'cps' call lowering, any instructions following 
> the
> 'cps' call to @bar are sunk into the the block %retpt, and the 
> unconditional
> branch to %retpt is deleted/ignored. We include that branch to preserve
> control-flow information for LLVM IR optimization passes.
> 
> The 'extractvalue' instructions are what ensure the calling convention 
> of
> %retpt, since the fields of the struct %retVals are returned in 
> physical
> registers dictated by the (modified) ghccc convention. Those same 
> physical
> registers are used by the ghccc tail call in @bar when it jumps back to 
> %retpt.
> So, the call & return convention of ghccc ensures that everything 
> matches up.
> 
> 
> Interaction with LLVM
> =====================
> 
> (1) Caller-saved Values
> 
> One may wonder how this would work if there are caller-saved values of 
> the 'cps'
> call. But, in our first example, which closely matches what CPS code 
> looks like,
> the call to @bar was in tail position. Thus, in the second example, 
> there are no
> caller-saved values for the 'cps' call to @bar, as all live values were 
> passed
> as arguments in the call.
> 
> This caller-saved part is a bit subtle. It works fine in my experience 
> [2] when
> @bar is a function not visible to LLVM. My impression is that even if 
> @bar is
> visible to LLVM, there is still no issue, but if you can think of any 
> corner
> cases that would be great!
> 
> (2) Inlining
> 
> My gut feeling is that we cannot inline a 'cps' marked call-site 
> without more
> effort. This is because we might end up with something odd like this 
> once the
> dust settles:
> 
>     %retAddr = blockaddress(@foo, %retpt)
>     %retAddr1 = bitcast i8* %retAddr to {i8**, i64} (i8**, i64)*
>     tail call ghccc %retAddr1 ( %sp, ... )
> 
> We could add a pass that turns the above sequence into just an 
> unconditional
> branch to %retpt, using a phi-node to replace each 'extractvalue' 
> instruction in
> that block.
> 
> I'm not sure whether inlining in LLVM is important for us yet, as we 
> tend to
> inline quite a lot before generating LLVM IR. I don't think this 
> additional fix-
> up pass would be too difficult to implement if it's desired.
> 
> 
> Implementation Sketch and Conclusion
> ====================================
> 
> My current plan is to add this special lowering of 'cps' calls during 
> the
> translation from LLVM IR to the SelectionDAG. I welcome any suggestions 
> or tips
> on the best way to approach this. An important goal for us is to merge 
> this into
> trunk since we do not want to bundle a special version of LLVM with 
> GHC.
> 
> Please let me know soon if you have any objections to this feature.
> 
> Thanks for reading,
> Kavon
> 
> 
> References
> ==========
> 
> [1] http://llvm.org/docs/LangRef.html#blockaddress
> [2] http://kavon.farvard.in/papers/ml16-cwc-llvm.pdf
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev