[llvm-dev] [RFC] LLVM Coroutines

Fri Jun 10 22:32:39 PDT 2016

Hi Eli:

>> Err... I guess nothing; sorry, I got myself confused.

That is my fault :-). I confused myself in the beginning where I was trying to
treat coro.suspend + cond.branch pair as a terminator and was looking for
branches. You showed me the way... Just ignore the branches and represent
the control flow cleanly and SimplifyCFG will take care of the rest :-).

>> You might run into problems with allocas:

>>   x = alloca...
>>
>> block1:
>>   suspend
>>   ; increment could get moved here.
>>   switch (resume, destroy, return)
>>
>> resume:
>>   x += 1 ; load + inc + store
>>   [...]
>>
>> destroy:
>>   x += 1 ; load + inc + store
>>   [...]
>>
>> return:
>>   (don't touch x)
>>   [...]

Yep. You are right. Need some kind of barrier to prevent the motion.

Okay. New intrinsics:

declare void coro.barrier(i64); // will get a unique number for every insertion

>>   x = alloca...
>>
>> block1:
>>   suspend
>>   ; nothing can sneak in here???, Right?
>>   switch (resume, destroy, return)
>>
>> resume:
>>   coro.barrier(0)
>>   x += 1 ; load + inc + store
>>   [...]
>>
>> destroy:
>>   coro.barrier(1)
>>   x += 1 ; load + inc + store
>>   [...]
>>
>> return:
>>   coro.barrier(2)
>>   (don't touch x)
>>   [...]

Now we are good, right?
I am still not giving up hope for an intrinsic approach.

Thanks a lot, Eli, for your help with the design!
Gor

P.S

There is always a backup plan to go to your "two function" idea with a slight
twist.

Almost two functions (coro.fork(), coro.suspendO() same as in original proposal)
================================================================================

CoroEarly pass which runs at EP_EarlyAsPossible will extract f.start using
coro.fork as a guide (since false branch bypasses the coroutine body).

Coroutine will now travel as two functions until CoroSplit gets to f.start
(Probably will use 'coroutine' attribute on a function to indicate that it
requires CoroSplit to munge on it.)

I am still hoping we can find a solution without requiring to separate f.start
out. Seems like a huge hammer to avoid undesirable code motion / hoisting.

(I really hope that @coro.barrier(i64) approach can work)

On Fri, Jun 10, 2016 at 7:40 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Fri, Jun 10, 2016 at 5:25 PM, Gor Nishanov <gornishanov at gmail.com> wrote:
>>
>> Hi Eli:
>>
>> >> Naively, you would expect that it would be legal to hoist the store...
>> >> but that breaks your coroutine semantics because the global could be
>> >> mutated
>> >> between the first return and the resume.
>>
>> Hmmm... I don't see the problem. I think hoisting the store is perfectly
>> legal
>> transformation with all semantics preserved.
>>
>> Let's look at your example:
>>
>> >> block1:
>> >>   suspend
>> >>   switch (resume, destroy, return)
>> >>
>> >> resume:
>> >>   store zero  to global @g
>> >>   doA()
>> >>   [...]
>> >>
>> >> destroy:
>> >>   store zero  to global @g
>> >>   doB()
>> >>   [...]
>> >>
>> >> return:
>> >>   store zero to global @g
>> >>   doC
>> >>   [...]
>>
>> As written the behavior is:
>>
>>   1) when we encounter a suspend during the first pass through the
>> function,
>>      store 0 to @g and doC()
>>
>>   2) when we encounter a suspend after coroutine was resumed
>>      ret void
>>
>>   3) When coroutine is resumed:
>>      store 0 to @g and doA()
>>
>>   4) When coroutine is destroyed:
>>      store 0 to @g and doB()
>>
>> If this is a coroutine that can be resumed asynchronously from a different
>> thread there is a data race. For example, if resume happens before 'f'
>> returns,
>> doA() can write something useful to @g, and 'f' will clobber it with zero.
>> But, this race is already present in the code and is not introduced by
>> LLVM.
>>
>> Let's hoist the store and see what happens:
>>
>> >> block1:
>> >>   suspend
>> >>   store zero  to global @g
>> >>   switch (resume, destroy, return)
>> >>
>> >> resume:
>> >>   doA()
>> >>   [...]
>> >>
>> >> destroy:
>> >>   doB()
>> >>   [...]
>> >>
>> >> return:
>> >>   doC()
>> >>   [...]
>>
>> Now, let's split it up:
>>   1. RAUW coro.suspend with -1,0,1 in f, f.resume and f.destroy
>>   2. remove coro.suspend in f, replace with ret void in f.resume
>>
>> void f() {
>>   [...]
>>   store zero  to global @g
>>   doC();
>>   [...]
>> }
>>
>> void @f.resume() {
>> entry:
>>   store zero  to global @g
>>   doA();
>>   [....]
>> }
>>
>> void @f.destroy() {
>> entry:
>>   store zero  to global @g
>>   doB();
>>   [....]
>> }
>>
>> Behavior looks exactly as before. What am I missing?
>>
>
> Err... I guess nothing; sorry, I got myself confused.  Essentially executing
> a return statement, then jumping back, seems wrong, but I'm having trouble
> coming up with a good example of how it would actually break something.  I
> guess there aren't really any issues on a pure control-flow basis: you can't
> execute a global side-effect a different number of times than it would
> otherwise happen.
>
> You might run into problems with allocas: LLVM's optimizer will assume the
> lifetime of any alloca in the current function ends when you hit a "ret"
> instruction, so jumping back from the ret to the middle of the function
> could lead to wrong results.  Silly example:
>
> x = alloca...
>
> block1:
>   suspend
>   ; increment could get moved here.
>   switch (resume, destroy, return)
>
> resume:
>   x += 1
>    [...]
>
> destroy:
>   x += 1
>    [...]
>
> return:
>   (don't touch x)
>   [...]
>
> The increment is only supposed to execute once, but instead executes twice.
>
> This isn't a great example... and I'm not sure issues are limited to
> allocas... but that's the idea, anyway.
>
> -Eli
>