[llvm-dev] [RFC] LLVM Coroutines

Sat Jun 11 19:06:53 PDT 2016

Hi Sanjoy:

>> How will you handle (potentially variably sized) alloca instructions
>> that cannot be elided by promotion to SSA?  E.g.
>>
>>   void cor() {
>>     int a = 0;
>>     escape(&a);
>>     for (;;) yield(a++);
>>   }

Any escaped allocas go to the coroutine frame. It is an important
use case with async coroutines:

  task<void> f() {
    char buf[N];
    ... fill buff with stuff
    await async_send(buf); // <suspend-point>
    ... do more
  }

Even though an alloca is not mentioned after suspend point, it is still
need to  go to the coroutine frame, as async_send will be referencing it
while coroutine is suspended (and its stack frame gone).

CoroSplit pass is doing two interesting things.
  1) Creates the coroutine frame.
  2) Chops up the coroutine into pieces

Here is a brief description:

========================
Coroutine Frame Building
========================

Coroutine frame building algorithm has two parts:
  * Deal with allocas
  * Deal with virtual registers.

Alloca part
-----------

  Step 1: Compute the blocks that will end up in resume and destroy
  Step 2: Classify alloca into three buckets
    Start Only: Only used in start
    Resume Only: Only used in resume/destroy
    Shared: Used in both (or escaped)
  Step 3: Put all shared values in the coroutine frame, RAUW shared allocas with
    GEPs from coroutine frame
  Step 4: Move all resume only allocas to the new entry block in resume/destroy

  (Future Work)

  Step X-1:
    insert dbg.values properly
  Step X:
    Utilize lifetime information for shared allocas to decide how to pack them
    tightly in coroutine frame. At the moment I throw away lifetime intrinsics.
  Step X+1:
    Deal with variable allocas. One way to deal with them is to chain them
    to the coroutine frame using the same allocation and deallocation routines
    used for coroutine frame itself). Until step X+1, no var.size alloca can
    go into a coroutine frame. Start only or Resume only allocas don't
    cause problems.

Virtual Register part
---------------------

  Step 1: Precompute block relationship X(B1,B2), when B1 dominates B2 and there
    is a path from B1 to B2 that crosses a suspend point.
  Step 2:
    Check if a path from a use of value in a resume/destroy block to a
    definition crosses a suspend point, ie X(Bd,Bu) == true, where
      Bu is a block with the use
      Bd is a block with the definition or an entry block if d is func argument
  Step 3:
    Check if we already have value 'd' in the coroutine frame, if so, replace
    the use with GEP from the coroutine frame. Otherwise, spill 'd' into the
    coroutine frame and replace the use with GEP.
     (future)
  Step X:
    Make step 3 spilling and reloading smarter.
  Step X + 1:
     Do live-ranges, coloring, etc to make coroutine frame
     really really tiny.

Cheers,
Gor

On Sat, Jun 11, 2016 at 6:09 PM, Sanjoy Das
<sanjoy at playingwithpointers.com> wrote:
> Hi Gor,
>
> How will you handle (potentially variably sized) alloca instructions
> that cannot be elided by promotion to SSA?  E.g.
>
> void cor() {
>   int a = 0;
>   escape(&a);
>   for (;;) yield(a++);
> }
>
> -- Sanjoy
>
> On Sat, Jun 11, 2016 at 5:43 PM, Gor Nishanov via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> (Dropped llvm-dev by accident. Putting it back)
>>
>>
>> HI Eli:
>>
>>>> coro.barrier() doesn't work: if the address of the alloca doesn't escape,
>>>> alias analysis will assume the barrier can't read or write the value of
>>>> the alloca, so the barrier doesn't actually block code movement.
>>
>> Got it. I am new to this and learning a lot over the course
>> of this thread. Thank you for being patient with me.
>>
>> Two questions and one clarification:
>>
>> Q1: Do we have to have a load here?
>> ===================================
>>
>>>> block1:
>>>>    %first_time = load... <--- What are we loading here?
>>>>    br i1 %first_time, label return, label suspend1
>>>>
>>>> supend1:
>>>>    %0 = coro.suspend()
>>>>    switch %0 (resume1, destroy1)
>>
>> Can we use three way coro.suspend instead?
>>
>>   Block1:
>>     %0 = call i8 coro.suspend()
>>     switch i8 %0, label suspend1 [i8 0 %return] ; or icmp + br i1
>>   Suspend1:
>>     switch i8 %0, label %resume1 [i8 1 %destroy1] ; or icmp + br i1
>>
>> One problem I can see is that someone can write a pass that might merge
>> two branches / switches into one switch and we are back where we were.
>> I guess what you meant by load, is to call some coro.is.first.time() intrinsic.
>> So it looks like:
>>
>>>> block1:
>>>>    %first_time = call i1 coro.is.first.time()
>>>>    br i1 %first_time, label return, label suspend1
>>>>
>>>> supend1:
>>>>    %0 = coro.suspend()
>>>>    switch %0 (resume1, destroy1)
>>
>> This looks fine, there may be more uses for this intrinsic in the frontend.
>> Killing two birds with one stone. Good.
>>
>> Question 2: Why the switch in the return block?
>> ===============================================
>>
>> I would think that **pre-split** return block would be simply:
>>
>> return:
>>    <run dtors for parameters, if required>
>>    <conversion ops for ret value, if required>
>>    <ret void> or <ret whatever>
>>
>> Where and why I should put a switch that you mentioned in this return block?
>>
>> BTW, I am speaking of the return block as if it is one block,
>> but, it could be a dominating block over all the blocks that together
>> run the destructors, do return value conversion, etc.
>>
>> Clarification:
>> ==============
>>
>>>> Also, if some non-C++ language wants to generate coroutines,
>>>> it might not have to generate the return block at all.
>>
>> C++ coroutines are flexible. The semantic of a coroutine is defined via
>> traits, so you may define a coroutine that returns void. It does not have
>> to return coroutine handle or some struct that wraps the coroutine handle.
>>
>> For example, a developer may have a queue of suspended coroutines that is
>> processed by a scheduler, so a coroutine can queue its handle on suspension
>> either into a global queue or thread_local queue. If there are no destructors
>> to run for parameters, the return block would be simply:
>>
>>   return:
>>     ret void
>>
>>
>> Cheers,
>> Gor
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
> --
> Sanjoy Das
> http://playingwithpointers.com