[llvm-dev] [RFC] LLVM Coroutines

Tue Jun 14 13:25:03 PDT 2016

Hi Gor,

On Sun, Jun 12, 2016 at 6:40 PM, Gor Nishanov <gornishanov at gmail.com> wrote:
> Quick answer: folding %val to 10 is OK, but, I would prefer it to be 'undef'
> or even catch it in the verifier as it is a programmer/frontend error.

Ok.

>>> The thought experiment relevant here is that **could** you implement
>>> your semantics with reasonable bodies for the llvm.coro intrinsics?
>
> I can emulate **control flow** very closely with user mode thread switching
> APIs like make_context, jump_context. Memory behavior not so much.
>
> Library emulation
> =================
>
> Let's build a fiber abstraction out of make_context, jump_context.
>
> i1 fiber_fork(i8* %mem) - clones this function into a fiber
>                           returns 0 - in a fiber
>                           returns 1 - in a thread
>                           uses %mem to store fiber context + fiber stack

I'm not familiar with fiber-type APIs, but I assume fiber_fork is like
setjmp, in that it can "return twice"?  If so, I'd urge you to try to
not rely on that kind of behavior.  LLVM has unresolved issues around
modeling setjmp like behavior, and if possible, it is preferable to
make all of the control flow explicit from the very beginning.

> i1 fiber_suspend()      - suspends current fiber, switches back to thread
>                           stores the fiber context to be able to resume later

This looks like a normal function call, except that the "return" may
be arbitrarily spaced out from the call?

> void fiber_join(i8* %mem i1 %val) - switches to fiber stack identified by %mem,
>                           fiber_suspend will return %val when resumed
>
> void fiber_end() noreturn
>                         - switches back to a thread, does not remember fiber
>                           context and does not touch %mem.
>                           (Thus you can use this after %mem is freed)
>
>
> Detail: fiber_fork(i8* %mem)
> ----------------------------
>   * Prepares fcontext in %mem utilizing the rest of the memory for stack.
>   * Saves current thread context on a current thread stack.
>     (We should not store if in %mem, otherwise we won't be
>      able to switch back after fiber memory is freed)

As I said above, these following two points can be problematic:

>   * Sets up **current thread** context, so that when resumed it will proceed
>     as if fiber_fork returned true
>   * Sets up **fiber** context so that when resumed, the fiber will proceed
>     as if fiber_fork returned false.

>
> Okay, let's map it to our intrinsics:
>
> coro.fork == fiber_fork
> coro.suspend == fiber_suspend
> coro.end(true) == fiber_end
> coro.resume = fiber_join(%mem, i1 false)
> coro.destroy = fiber_join(%mem, i1 true)
> coro.size == 1,000,000 (or some other big number)
>
> Now let's walk through this example in the fiber model:
>
>    T coro() {
>        %t = alloca
>        %mem = malloc(...);
>        (*%t) = 10
>        if (coro.fork()) { // switches to fiber at label Start
>    ReturnBB:
>          %val = *%t // will be 10, as fiber has its own copy
>          return some_code_creating_T(%val);
>        }
>    Start:
>        Body Of The Coroutine
>        With Suspends and All The Fun
>    DeleteBB:
>        (*%t) = 15 ; // fine, copy of %t on a fiber stack updated
>        free(%mem)
>        (*%t) = 20 ; // undef: as the fiber memory is freed already

Say there was no (*%t) = 20. There is nothing so far that prevents
sinking "(*%t) = 15" from before the free (where it is legal) to after
the free (where it is not).  %t is an unescaped alloca, and LLVM can
move around loads from and stores to these (as a first approximation)
as it pleases.

>        coro.end(true) // switches back to the thread at ReturnBB
>        UNREACHABLE // <== never reaches here. So we are honest with optimizer
>      }
>
>
> Back to the LLVM based implementation
> =====================================
>
> In your example, there will be a path from the block where %t is used to
> where %t is defined that crosses a suspend point. Therefore %t will be part of
> the coroutine frame. So before the split, the program will be transformed into:
>
>    coro.frame = type { int %t }
>
>    T coro() {
>        %t = alloca
>        %mem = malloc(...);
>        %s = (coro.frame*)coro.init(%mem)
>        s->%t = 10
>        if (coro.fork()) {
>    ReturnBB:
>          %val = 10 // const.prop as 10, OK, but really undef
>          return some_code_creating_T(%val);
>        }
>        Body Of The Coroutine
>        With Suspends and All The Fun
>    DeleteBB:
>        free(%mem)
>        s->%t = 20 //// BOOM writing to freed memory
>        coro.end(true)
>        UNREACHABLE
>      }
>
> Accessing anything in the coroutine frame after coro.fork "true" is undefined.
> As by the time you get to returnBB, the coroutine frame could already
> be destroyed.

So when you say "some_code_creating_T()" are there restrictions on
what kind of code there could be as part of the creation logic (I've
so far assumed that it can be arbitrary code).  If it can't
(semantically) touch anything in the stack frame then why have it as
part of @coro() ?

-- Sanjoy