<div dir="ltr">I don't think you need a terminator at the IR level, you need one at the MI level. You can lower a CPS-call to some call pseudo-instruction during ISel and then break the MBB in two at some later point.</div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 19, 2017 at 10:38 AM, Kavon Farvardin <span dir="ltr"><<a href="mailto:kavon@farvard.in" target="_blank">kavon@farvard.in</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">> The semantics around inlining alone are problematic enough to warrant serious hesitation.<br>

<br>

</span>There are nicer ways to embed CPS call/return into LLVM; I just figured that there would not be much support for adding a new terminator because it would change a lot of code. Ideally we would have a block terminator like:<br>

<br>

    cps call ghccc @bar (.. args ..) returnsto label %retpt<br>

<br>

Where the "returnsto" is optional to cover both CPS calls and returns. Another alternative would be to emulate this terminator using an intrinsic.<br>

<br>

So, my question is: would there be more support for a version of this terminator (whether as an intrinsic or not) instead of what I was proposing earlier?<br>

<span class="HOEnZb"><font color="#888888"><br>

~kavon<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

> On Apr 18, 2017, at 7:24 PM, Philip Reames via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br>

><br>

><br>

><br>

> On 04/17/2017 08:30 AM, Kavon Farvardin via llvm-dev wrote:<br>

>> Summary<br>

>> =======<br>

>><br>

>> There is a need for dedicated continuation-passing style (CPS) calls in LLVM to<br>

>> support functional languages. Herein I describe the problem and propose a<br>

>> solution. Feedback and/or tips are greatly appreciated, as our goal is to<br>

>> implement these changes so they can be merged into LLVM trunk.<br>

>><br>

>><br>

>> Problem<br>

>> =======<br>

>><br>

>> Implementations of functional languages like Haskell and ML (e.g., GHC and<br>

>> Manticore) use a continuation-passing style (CPS) transformation in order to<br>

>> manage the call stack explicitly. This is done prior to generating LLVM IR, so<br>

>> the implicit call stack within LLVM is not used for call and return.<br>

>><br>

>> When making a non-tail call while in CPS, we initialize a stack frame for the<br>

>> return through our own stack pointer, and then pass that stack pointer to the<br>

>> callee when we jump to it. It is here when we run into a problem in LLVM.<br>

>><br>

>> Consider the following CPS call to @bar and how it will return:<br>

>><br>

>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<wbr>;;;;;;;;;;;;;;;;;;<br>

>><br>

>> define void @foo (i8** %sp, ...) {<br>

>> someBlk:<br>

>>     ; ...<br>

>>     ; finish stack frame by writing return address<br>

>>   %retAddr = blockaddress(@foo, %retpt)<br>

>>   store i8* %retAddr, i8** %sp<br>

>>     ; jump to @bar<br>

>>   tail call void @bar(i8** %sp, ...)<br>

>><br>

>>  retpt:                              ; <- how can @bar "call" %retpt?<br>

>>    %sp2 = ???<br>

>>    %val = ???<br>

>>    ; ...<br>

>><br>

>>  }<br>

>><br>

>>  define void @bar (i8** %sp, ...) {<br>

>>        ; perform a return<br>

>>      %retAddr0 = load i8*, i8** %sp<br>

>>      %retAddr1 = bitcast i8* %retAddr0 to void (i8**, i64)*<br>

>>      %val = bitcast i64 1 to i64<br>

>>        ; jump back to %retpt in @foo, passing %sp and %val<br>

>>      tail call void %retAddr1(i8** %sp, i64 %val)<br>

>>  }<br>

>><br>

>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<wbr>;;;;;;;;;;;;;;;;;;<br>

>><br>

>> There is currently no way to jump back to %retpt from another function, as block<br>

>> addresses have restricted usage in LLVM [1]. Our main difficulty is that we<br>

>> cannot jump to a block address without knowing its calling convention, i.e., the<br>

>> particular machine registers (or memory locations) that the block expects<br>

>> incoming values to be passed in.<br>

>><br>

>> The workaround we have been using in GHC for LLVM is to break apart every<br>

>> function, placing the code for the continuation of each call into a new<br>

>> function. We do this only so that we can store a function pointer instead of a<br>

>> block address to our stack. This particularly gross transformation inhibits<br>

>> optimizations in both GHC and LLVM, and we would like to remove the need for it.<br>

> Can you give a bit more information on what this transformation breaks optimization wise?  You're essentially explicitly representing the continuation and converting all calls into tail calls with an explicit continuation argument. I would expect the LLVM optimizer to do reasonable well with such a representation.<br>

>><br>

>><br>

>> Proposal<br>

>> ========<br>

>><br>

>> I believe the lowest-impact method of fixing this problem with LLVM is the<br>

>> following:<br>

>><br>

>> First, we add a special 'cps' call instruction marker to be used on non-tail<br>

>> calls. Then, we use a specialized calling convention for these non-tail calls,<br>

>> which fix the returned values to specific locations in the machine code [2].<br>

>><br>

>> To help illustrate what's going on, let's rewrite the above example using the<br>

>> proposed 'cps' call:<br>

>><br>

>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<wbr>;;;;;;;;;;;;;;;;;;<br>

>><br>

>> define { ... } @foo (i8** %sp, ...) {<br>

>> someBlk:<br>

>>     ; ...<br>

>>     ; finish stack frame by writing return address<br>

>>   %retAddr = blockaddress(@foo, %retpt)<br>

>>   store i8* %retAddr, i8** %sp<br>

>>     ; jump to @bar<br>

>>   %retVals = cps call ghccc {i8**, i64} @bar (i8** %sp, ...)<br>

>>   br label %retpt<br>

>><br>

>>  retpt:<br>

>>    %sp2 = extractvalue {i8**, i64} %retVals, 0<br>

>>    %val = extractvalue {i8**, i64} %retVals, 1<br>

>>    ; ...<br>

>><br>

>>  }<br>

>><br>

>>  define {i8**, i64} @bar (i8** %sp, ...) {<br>

>>        ; perform a return<br>

>>      %retAddr0 = load i8*, i8** %sp<br>

>>      %retAddr1 = bitcast i8* %retAddr0 to {i8**, i64} (i8**, i64)*<br>

>>      %val = bitcast i64 1 to i64<br>

>>        ; jump back to %retpt in @foo, passing %sp and %val<br>

>>      tail call ghccc void %retAddr1(i8** %sp, i64 %val)<br>

>><br>

>>      unreachable   ; <- ideally this would be our terminator,<br>

>>                    ; but 'unreachable' breaks TCO, so we will<br>

>>                    ; emit a ret of the struct "returned" by the call.<br>

>>  }<br>

>><br>

>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<wbr>;;;;;;;;;;;;;;;;;;<br>

>><br>

>> The important point here is that the 'cps' marked call will lower to a jump. The<br>

>> 'cps' call marker means that the callee knows how to return using the arguments<br>

>> explicitly passed to it, i.e., the stack pointer %sp. The callee cannot use a<br>

>> 'ret' instruction if it is 'cps' called.<br>

>><br>

>> Either before or during 'cps' call lowering, any instructions following the<br>

>> 'cps' call to @bar are sunk into the the block %retpt, and the unconditional<br>

>> branch to %retpt is deleted/ignored. We include that branch to preserve<br>

>> control-flow information for LLVM IR optimization passes.<br>

>><br>

>> The 'extractvalue' instructions are what ensure the calling convention of<br>

>> %retpt, since the fields of the struct %retVals are returned in physical<br>

>> registers dictated by the (modified) ghccc convention. Those same physical<br>

>> registers are used by the ghccc tail call in @bar when it jumps back to %retpt.<br>

>> So, the call & return convention of ghccc ensures that everything matches up.<br>

>><br>

>><br>

>> Interaction with LLVM<br>

>> =====================<br>

>><br>

>> (1) Caller-saved Values<br>

>><br>

>> One may wonder how this would work if there are caller-saved values of the 'cps'<br>

>> call. But, in our first example, which closely matches what CPS code looks like,<br>

>> the call to @bar was in tail position. Thus, in the second example, there are no<br>

>> caller-saved values for the 'cps' call to @bar, as all live values were passed<br>

>> as arguments in the call.<br>

>><br>

>> This caller-saved part is a bit subtle. It works fine in my experience [2] when<br>

>> @bar is a function not visible to LLVM. My impression is that even if @bar is<br>

>> visible to LLVM, there is still no issue, but if you can think of any corner<br>

>> cases that would be great!<br>

>><br>

>> (2) Inlining<br>

>><br>

>> My gut feeling is that we cannot inline a 'cps' marked call-site without more<br>

>> effort. This is because we might end up with something odd like this once the<br>

>> dust settles:<br>

>><br>

>>     %retAddr = blockaddress(@foo, %retpt)<br>

>>     %retAddr1 = bitcast i8* %retAddr to {i8**, i64} (i8**, i64)*<br>

>>     tail call ghccc %retAddr1 ( %sp, ... )<br>

>><br>

>> We could add a pass that turns the above sequence into just an unconditional<br>

>> branch to %retpt, using a phi-node to replace each 'extractvalue' instruction in<br>

>> that block.<br>

>><br>

>> I'm not sure whether inlining in LLVM is important for us yet, as we tend to<br>

>> inline quite a lot before generating LLVM IR. I don't think this additional fix-<br>

>> up pass would be too difficult to implement if it's desired.<br>

>><br>

>><br>

>> Implementation Sketch and Conclusion<br>

>> ==============================<wbr>======<br>

>><br>

>> My current plan is to add this special lowering of 'cps' calls during the<br>

>> translation from LLVM IR to the SelectionDAG. I welcome any suggestions or tips<br>

>> on the best way to approach this. An important goal for us is to merge this into<br>

>> trunk since we do not want to bundle a special version of LLVM with GHC.<br>

>><br>

>> Please let me know soon if you have any objections to this feature.<br>

>><br>

>> Thanks for reading,<br>

>> Kavon<br>

> Honestly, this proposed design seems seriously problematic.  The semantics around inlining alone are problematic enough to warrant serious hesitation.  I think you need to take a step back, describe the original problem you're trying to solve and why the standard conversion approaches don't work out.  You'd need to strongly motivate a change of this magnitude and I really don't feel like you've done so so far.<br>

>><br>

>><br>

>> References<br>

>> ==========<br>

>><br>

>> [1] <a href="http://llvm.org/docs/LangRef.html#blockaddress" rel="noreferrer" target="_blank">http://llvm.org/docs/LangRef.<wbr>html#blockaddress</a><br>

>> [2] <a href="http://kavon.farvard.in/papers/ml16-cwc-llvm.pdf" rel="noreferrer" target="_blank">http://kavon.farvard.in/<wbr>papers/ml16-cwc-llvm.pdf</a><br>

>><br>

>><br>

>> ______________________________<wbr>_________________<br>

>> LLVM Developers mailing list<br>

>> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

>> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

><br>

> ______________________________<wbr>_________________<br>

> LLVM Developers mailing list<br>

> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

<br>

</div></div></blockquote></div><br></div>