[llvm-dev] [RFC] LLVM Coroutines

Gor Nishanov via llvm-dev llvm-dev at lists.llvm.org
Sun Jun 12 20:29:45 PDT 2016


Small clarification. In coro.* emulation via fibers, the comment on
coro.end(true)

       coro.end(true) // switches back to the thread at ReturnBB <-- this one
       UNREACHABLE // <== never reaches here. So we are honest with optimizer
     }

should read something like:

    coro.end(true) // switches back to the thread that resumed this fiber.

Which will be ReturnBB if we did not hit any suspends on the way to coro.end.
If we did hit a suspend, it is the suspend that will switch back to ReturnBB.

With llvm implementation, it is modeled by replacing coro.suspend and
coro.end(true) with
br ReturnBB in the initial function which represents the first
execution of the coroutine before it reached any suspends.

In resume function, coro.suspend and coro.end are replaced with `ret
void`, thus modelling returning to whomever resumed us.

To me it looks that it does match fiber behavior exactly.

-- Gor

On Sun, Jun 12, 2016 at 6:40 PM, Gor Nishanov <gornishanov at gmail.com> wrote:
> Hi Sanjoy:
>
>>> Now in the above CFG %val can be folded to 10, but in reality you need
>>> a PHI of 10 and 20
>
> Quick answer: folding %val to 10 is OK, but, I would prefer it to be 'undef'
> or even catch it in the verifier as it is a programmer/frontend error.
>
> Details are in the second half of my reply. I would like to start
> with the thought experiment you suggested, as it might help to explain
> the behavior a little better.
>
>>> The thought experiment relevant here is that **could** you implement
>>> your semantics with reasonable bodies for the llvm.coro intrinsics?
>
> I can emulate **control flow** very closely with user mode thread switching
> APIs like make_context, jump_context. Memory behavior not so much.
>
> Library emulation
> =================
>
> Let's build a fiber abstraction out of make_context, jump_context.
>
> i1 fiber_fork(i8* %mem) - clones this function into a fiber
>                           returns 0 - in a fiber
>                           returns 1 - in a thread
>                           uses %mem to store fiber context + fiber stack
>
> i1 fiber_suspend()      - suspends current fiber, switches back to thread
>                           stores the fiber context to be able to resume later
>
> void fiber_join(i8* %mem i1 %val) - switches to fiber stack identified by %mem,
>                           fiber_suspend will return %val when resumed
>
> void fiber_end() noreturn
>                         - switches back to a thread, does not remember fiber
>                           context and does not touch %mem.
>                           (Thus you can use this after %mem is freed)
>
>
> Detail: fiber_fork(i8* %mem)
> ----------------------------
>   * Prepares fcontext in %mem utilizing the rest of the memory for stack.
>   * Saves current thread context on a current thread stack.
>     (We should not store if in %mem, otherwise we won't be
>      able to switch back after fiber memory is freed)
>   * Sets up **current thread** context, so that when resumed it will proceed
>     as if fiber_fork returned true
>   * Sets up **fiber** context so that when resumed, the fiber will proceed
>     as if fiber_fork returned false.
>   * Clones the locals on the thread stack to a fiber stack
>     (This is very approximate behavior, and won't work if we took address of
>      alloca and stashed it somewhere, but let's ignore it for now as we
>      are focusing on control flow)
>   * Switches to fiber
>
> Okay, let's map it to our intrinsics:
>
> coro.fork == fiber_fork
> coro.suspend == fiber_suspend
> coro.end(true) == fiber_end
> coro.resume = fiber_join(%mem, i1 false)
> coro.destroy = fiber_join(%mem, i1 true)
> coro.size == 1,000,000 (or some other big number)
>
> Now let's walk through this example in the fiber model:
>
>    T coro() {
>        %t = alloca
>        %mem = malloc(...);
>        (*%t) = 10
>        if (coro.fork()) { // switches to fiber at label Start
>    ReturnBB:
>          %val = *%t // will be 10, as fiber has its own copy
>          return some_code_creating_T(%val);
>        }
>    Start:
>        Body Of The Coroutine
>        With Suspends and All The Fun
>    DeleteBB:
>        (*%t) = 15 ; // fine, copy of %t on a fiber stack updated
>        free(%mem)
>        (*%t) = 20 ; // undef: as the fiber memory is freed already
>        coro.end(true) // switches back to the thread at ReturnBB
>        UNREACHABLE // <== never reaches here. So we are honest with optimizer
>      }
>
>
> Back to the LLVM based implementation
> =====================================
>
> In your example, there will be a path from the block where %t is used to
> where %t is defined that crosses a suspend point. Therefore %t will be part of
> the coroutine frame. So before the split, the program will be transformed into:
>
>    coro.frame = type { int %t }
>
>    T coro() {
>        %t = alloca
>        %mem = malloc(...);
>        %s = (coro.frame*)coro.init(%mem)
>        s->%t = 10
>        if (coro.fork()) {
>    ReturnBB:
>          %val = 10 // const.prop as 10, OK, but really undef
>          return some_code_creating_T(%val);
>        }
>        Body Of The Coroutine
>        With Suspends and All The Fun
>    DeleteBB:
>        free(%mem)
>        s->%t = 20 //// BOOM writing to freed memory
>        coro.end(true)
>        UNREACHABLE
>      }
>
> Accessing anything in the coroutine frame after coro.fork "true" is undefined.
> As by the time you get to returnBB, the coroutine frame could already
> be destroyed.
>
> Both problems can be statically detected in the Verifier.
>
> Alternatively, in CoroEarly pass that runs at EP_EarlyAsPossible extension
> point, we can replace:
>
>          %val = *%t
>              with
>          %val = undef (used after coro.fork() returned true)
>
>               and
>
>        free(%mem)
>        (*%t) = 20
>             with
>        @llvm.trap() ; used after frame is destroyed
>
>
> --Gor


More information about the llvm-dev mailing list