[llvm-dev] [RFC] LLVM Coroutines

Sat Jun 11 21:11:17 PDT 2016

Hi Eli:

>>   Block1:
>>     %0 = call i8 coro.suspend()
>>     switch i8 %0, label suspend1 [i8 0 %return] ; or icmp + br i1
>>   Suspend1:
>>     switch i8 %0, label %resume1 [i8 1 %destroy1] ; or icmp + br i1
>>
>> This doesn't look right: intuitively the suspend happens after the return
>> block runs.

Perhaps, but, that is not the intended semantics.

The following two things are distinct concepts:

  * Coroutine is considered suspended

  * Coroutine returns control back to its caller
    (initial caller, resumer, or destroyer)

Coroutine is considered suspended means that it is legal to use coro.done,
coro.resume, coro.destroy intrinsics (either in this thread or in a different
one). It may seem strange, but, consider this example:

  task<void> f() {
    char buf[N];
    ... fill buff with stuff
    await async_send(buf); // <suspend-point>
    ... do more
  }

IR for await expression will be (conceptually):

  %0 = coro.save()
  async_send(%buf)
  coro.suspend(%0)

coro.save is the point when coroutine is considered suspended.
This allows async_send to resume the coroutine even before async_send
returned control back to f.

To make it safe, absolutely nothing should sneak-in between coro.save
and coro.suspend beyond what frontend/user put there.

Aside:
-----
Based on what we discussed earlier, I suspect that optimizer may try
to sneak stuff in between coro.save and coro.suspend. If it is the case,
I have a solution :-), Richard Smith suggested it earlier to me.
Use a distinct function (Just like you said with coro.fork :-)).

There will be no distinct coro.save. Instead, coro.suspend will take an
optional function pointer parameter and frontend will generate a little
function for all the code that needs to go in between coro.save and
coro.suspend.

  f.await(%frame) {
    %buf = get from %frame
    async_send(%buf)
  }

  in f()

    coro.suspend([...], &f.await)

CoroEarly will do the extraction. Frontend will keep emitting
coro.save/coro.suspend pairs. After CoroEarly, coro.save's will be removed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  P A U S E
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Okay getting back to the switch in the return block.

>> The switch answers the question of where the control flow actually goes
>> after the return block runs.

Since control can reenter the coroutine at any point after coro.save
and before, I don't think there is any point where putting the switch can
help.

We already told the optimizer that we have three possible paths out of suspend:
return (first time), resume and destroy. It think it is as close to the
truth as we can get.

>> Another way to put it is that it should be possible to lower a coroutine
>> to a thread rather than performing the state machine transformation.

That is very good point! It is a great way to think about the semantics.

Unfortunately, I cannot avoid doing state machine transformation in general
case.

For the coroutines with "initial_suspend" point, i.e. it suspends before
entering user authored body, it is very easy.

  future<void> f(Params) { // <initial suspend point> at open curly
    char buf[N];
    ... fill buff with stuff
    await async_send(buf); // <suspend-point>
    ... do more
  }

Will be transformed into equivalent of:

  future<void> f(Params) {
    promise<void> prom;
    future<void> result = prom.get_future();

    std::thread([prom = std::move(prom), Params = std::move(params)] {
      char buf[N];
      ... fill buff with stuff
      BLOCK_WAIT async_send(buf); // block on event instead of suspend
      ... do more
    }).detach();

    return result;
  }

No state machine transformation whatsoever, just capture the parameters and
promise and good to go. [No need to involve llvm either :-) it is a simple
lambda like transformation that can be done easily in frontend]

What if we don't have an initial suspend point? That means that the
coroutine executes in the caller thread until it hits the suspend point.
Well, we need to build up a coroutine frame, we need to chop up the function
at coro.suspends. We need to put an index in a coro frame and a switch in an
entry block of resume and suspend (if more than one suspend point).

The only difference is what coro.suspend is replaced with:

In start: coro.suspend launches a thread and gives f.resume as a start
routine and a frame pointer as a context. So, f.resume will be executing
concurrently with f slowly finishing up and returning back to the caller.
(Btw, that is exactly the semantics we are trying to capture)

In resume, coro.suspend is replaced with a blocking call waiting on
whatever will signal that the thread can resume.

=====================
Two function approach
=====================

Even though I pushing back against it, I might prototype it and see how it
looks anyways.

I will outline in CoroEarly and during CoroSplit inline it back so that
I can build the coroutine frame properly.

But even as I am typing it, it just feels too much.
Look what we are trying to solve is this.

        %0 = coro.suspend()
        ; A thing should not be hoisted here
        %first = icmp %0, 0
        ; or here
        br %first, %return, %suspend ; <=== please don't move stuff above me

      suspend:
        %1 = icmp %0, 1
        br %1, label %resume, label %destroy

      resume:
        A Thing That Can Be Hoisted (same as in destroy)
        [...]

      destroy:
        A Thing That Can Be Hoisted (same as in resume)
        [...]

      return:
        Something Else

Or maybe

        %0 = coro.suspend()
        ; A thing should not be hoisted here
        %first = coro.is.first(%0)
        ; or here
        br %first, %return, %suspend ; <=== please don't move stuff above me

      suspend:
        br %0, label %resume, label %destroy

      resume:
        A Thing That Can Be Hoisted (same as in destroy)
        [...]

      destroy:
        A Thing That Can Be Hoisted (same as in resume)
        [...]

      return:
        Something Else

Would the optimizer really try to stick 'A thing' above br %first?

As I mentioned before, I am very new to this and probably missing a lot,
but, (apart from outlining a portion of f), is there no simple way of
preventing instructions with side effects from resume or destroy blocks
being moved above 'br %first branch'?

(and if there is no good answer, two function it is. I will stop
torturing you with this :-))

Gor