[LLVMdev] LLVM and coroutines/microthreads

Thu Apr 16 13:44:42 PDT 2009

On Thu, Apr 16, 2009 at 3:21 PM, OvermindDL1 <overminddl1 at gmail.com> wrote:
>
> First, I will assume that you have read
> http://www.nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt
> and if you have not, do so.

I hadn't.  That's very similar to what I tried early on, but I found it
was actually slower than managing my own stacks lazily: on continuation,
each function checks a state variable to decide whether or not to freeze
its live variables, and that variable stays warm in the cache.  I haven't
gone into this nearly as deeply as you have, though.
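Here is a minimal C sketch of the lazy approach, built on a few assumptions. `yield_requested` stands in for the state variable and `spill_area` for wherever frozen variables are kept. The check sits at a loop back-edge rather than after a call. All of these names are hypothetical. Live variables stay ordinary locals until a suspend is actually requested, and only then are they copied out.

```c
#include <assert.h>

/* Hypothetical scheduler flag (the "state variable"), checked at each
   suspension point; cheap because it is read often and stays cached. */
static int yield_requested = 0;

/* Spill area: written only when the function actually suspends. */
typedef struct {
    int resume_point; /* 0 = fresh call, 1 = resume inside the loop */
    int i;
    long sum;
} spill_area;

/* Sums 0..n-1. Returns 1 when finished (result in *out), or 0 if it
   froze its live variables into *spill and returned early. */
static int lazy_sum(int n, spill_area *spill, long *out) {
    assert(spill != 0 && out != 0);
    int i = 0;     /* ordinary locals: the compiler may keep them in registers */
    long sum = 0;
    if (spill->resume_point == 1) { /* thaw previously frozen state */
        i = spill->i;
        sum = spill->sum;
    }
    for (; i < n; i++) {
        if (yield_requested) {      /* freeze only when asked to */
            spill->i = i;
            spill->sum = sum;
            spill->resume_point = 1;
            return 0;
        }
        sum += i;
    }
    spill->resume_point = 0;
    *out = sum;
    return 1;
}
```

When no suspend is pending, the only overhead is the flag test. The copying cost is paid only on the rare path that actually yields.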

It looks like the article is suggesting that you keep a pointer and do
a bunch of dereferences through it in all cases.  That sounds
inefficient, unless I'm missing something.
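For contrast, here is one reading of the explicitly managed frame approach, sketched in C with hypothetical names. Every local lives in a heap-allocated `sum_frame` and is reached through the `f` pointer. Suspending therefore costs nothing extra, but every access is a dereference. This illustrates the trade-off described above and is not the exact design from the note.

```c
#include <assert.h>
#include <stdlib.h>

/* Frame for a resumable "sum 0..n-1" computation; every local lives here. */
typedef struct {
    int resume_point; /* 0 = not started, 1 = inside the loop */
    int n;
    int i;
    long sum;
} sum_frame;

/* The frame is heap-allocated so it can outlive any native stack. */
static sum_frame *sum_frame_new(int n) {
    sum_frame *f = calloc(1, sizeof *f);
    if (f != NULL) f->n = n;
    return f;
}

/* Runs at most `budget` iterations, then returns so a scheduler can switch.
   Returns 1 when done. Each use of f->i or f->sum goes through the pointer,
   which is the per-access dereference cost. */
static int sum_step(sum_frame *f, int budget) {
    assert(f != NULL);
    if (f->resume_point == 0) {
        f->i = 0;
        f->sum = 0;
        f->resume_point = 1;
    }
    for (; f->i < f->n; f->i++) {
        if (budget-- == 0) return 0; /* suspend: state is already in *f */
        f->sum += f->i;
    }
    return 1;
}
```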

> And yes, I have to agree, every C/C++ coroutine library sucked beyond
> all heck, either stack fiddling, or they use crap like Microsoft's

Looks like pretty much all the options are going to require
bookkeeping, hackery, or both.  Even my stack copying suggestion has
restrictions on stack size.

> timeout, I have not implemented that yet either, but I intend to, so
> when a timeout is specified then another message match is implicitly
> added that will match a timeout message, and send off a message to a
> timer scheduler that will send out a message with a timeout of a
> certain ID that it will send back after a time-delay.  If a message is

That's a handy idea.  This technique reminds me of Orc; if you haven't
seen it, you should check it out: http://orc.csres.utexas.edu/
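The quoted timeout scheme has two parts: an implicitly added match clause, and a timer scheduler that mails back a timeout message tagged with an ID. It might look roughly like this single-threaded C sketch. The mailbox layout, the simulated clock in `timer_tick`, and every function name are assumptions made for illustration. A real actor would suspend wherever `try_receive` returns `RECV_NONE`.

```c
#include <assert.h>
#include <stddef.h>

enum msg_kind { MSG_DATA, MSG_TIMEOUT };
typedef struct {
    enum msg_kind kind;
    int value; /* payload for MSG_DATA, timer id for MSG_TIMEOUT */
} message;

#define MAILBOX_CAP 16
typedef struct {
    message items[MAILBOX_CAP];
    size_t count;
} mailbox;

static int mailbox_push(mailbox *mb, message m) {
    if (mb->count == MAILBOX_CAP) return 0; /* mailbox full */
    mb->items[mb->count++] = m;
    return 1;
}

/* Remove the message at index i, keeping the rest in arrival order. */
static message mailbox_take(mailbox *mb, size_t i) {
    message m = mb->items[i];
    for (size_t j = i + 1; j < mb->count; j++) mb->items[j - 1] = mb->items[j];
    mb->count--;
    return m;
}

/* Hypothetical timer scheduler driven by a simulated clock. */
#define MAX_TIMERS 8
typedef struct {
    int id, active;
    long fire_at;
    mailbox *target;
} timer;
typedef struct {
    timer timers[MAX_TIMERS];
    long now;
    int next_id;
} timer_scheduler;

/* Request a MSG_TIMEOUT carrying a fresh id after `delay` ticks. */
static int timer_start(timer_scheduler *ts, mailbox *target, long delay) {
    assert(ts != NULL && target != NULL);
    for (int i = 0; i < MAX_TIMERS; i++) {
        if (!ts->timers[i].active) {
            int id = ++ts->next_id;
            ts->timers[i] = (timer){ id, 1, ts->now + delay, target };
            return id;
        }
    }
    return -1; /* no free timer slot */
}

static void timer_cancel(timer_scheduler *ts, int id) {
    for (int i = 0; i < MAX_TIMERS; i++)
        if (ts->timers[i].active && ts->timers[i].id == id) ts->timers[i].active = 0;
}

/* Advance the clock and mail out every timeout that has expired. */
static void timer_tick(timer_scheduler *ts, long dt) {
    ts->now += dt;
    for (int i = 0; i < MAX_TIMERS; i++) {
        timer *t = &ts->timers[i];
        if (t->active && t->fire_at <= ts->now) {
            t->active = 0;
            mailbox_push(t->target, (message){ MSG_TIMEOUT, t->id });
        }
    }
}

enum recv_result { RECV_NONE, RECV_DATA, RECV_TIMEOUT };

/* Selective receive: the caller's clause matches MSG_DATA, and the implicit
   clause matches only the timeout whose id equals timer_id. After a
   RECV_DATA result the caller should cancel the timer. */
static enum recv_result try_receive(mailbox *mb, int timer_id, int *out) {
    for (size_t i = 0; i < mb->count; i++) {
        message m = mb->items[i];
        if (m.kind == MSG_DATA) {
            *out = mailbox_take(mb, i).value;
            return RECV_DATA;
        }
        if (m.kind == MSG_TIMEOUT && m.value == timer_id) {
            mailbox_take(mb, i);
            return RECV_TIMEOUT;
        }
    }
    return RECV_NONE; /* a real actor would suspend here */
}
```

Tagging each timeout with its own ID means a stale timeout left over from an earlier receive never matches the current one.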

> I use no locking primitives for the things that are multi-threaded
> (such as the timer) with liberal use of instructions such as atomic
> compare-and-swap (my favorite, well, most used anyway, I also wrote

Funny you should mention this.  I've spent the better part of the last
few weeks trying to make a CAS solution rock-solid.  Then it turned out
I could write a lock-based solution that performed similarly to the CAS
solution by using tiny (2-3 instruction) critical sections.  If you look
through The Art of Multiprocessor Programming, you'll see that a lot of
the solutions use multiple CASes per operation.  As best I can tell, once
you CAS more than twice per task, you're getting into tiny-lock territory
performance-wise (at least on Intel platforms).
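Here is a C11 sketch of the two styles being compared, using hypothetical type names. The first is a Treiber-style stack push that needs a single CAS. The second guards the same stack with a spinlock whose critical sections are only a couple of instructions. It shows the shape of each approach and is not a benchmark. Which one wins depends on contention and hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    int value;
} node;

/* Lock-free stack: a successful push costs one CAS, plus retries under
   contention. Pop is omitted because a safe lock-free pop must also
   handle the ABA problem and memory reclamation. */
typedef struct {
    _Atomic(node *) head;
} cas_stack;

static void cas_stack_init(cas_stack *s) { atomic_init(&s->head, (node *)NULL); }

static void cas_push(cas_stack *s, node *n) {
    assert(n != NULL);
    node *old = atomic_load_explicit(&s->head, memory_order_relaxed);
    do {
        n->next = old; /* on failure, the CAS reloads `old` for the retry */
    } while (!atomic_compare_exchange_weak_explicit(
        &s->head, &old, n, memory_order_release, memory_order_relaxed));
}

/* Same structure behind a tiny spinlock: each critical section is just a
   few loads and stores, so the lock is held very briefly. */
typedef struct {
    atomic_flag lock;
    node *head;
} locked_stack;

static void locked_stack_init(locked_stack *s) {
    atomic_flag_clear(&s->lock);
    s->head = NULL;
}

static void spin_lock(atomic_flag *f) {
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ; /* busy-wait; a production lock would add pause/backoff */
}

static void spin_unlock(atomic_flag *f) {
    atomic_flag_clear_explicit(f, memory_order_release);
}

static void locked_push(locked_stack *s, node *n) {
    assert(n != NULL);
    spin_lock(&s->lock);
    n->next = s->head;
    s->head = n;
    spin_unlock(&s->lock);
}

/* Pop is straightforward under the lock: no ABA concerns. */
static node *locked_pop(locked_stack *s) {
    spin_lock(&s->lock);
    node *n = s->head;
    if (n != NULL) s->head = n->next;
    spin_unlock(&s->lock);
    return n;
}
```

The lock version stays simple even when an operation touches several fields. The lock-free version needs more CAS steps, or extra machinery, as soon as it has to do more than a single pointer swing.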

> system is received then a local proxy Actor is setup that will handle
> the network passing from the local to the remote system between those
> Actors as the remote proxy actor converts the network messages back

Sounds good.  I've thought about something like this, but shelved the
idea for the first release.

> So yes, LLVM can capture the live variable information and stack into
> a buffer, but it currently has no functionality to do so, thus you
> have to do it yourself as I do.  Although the above way is vastly
> faster than memcpy'ing entire stacks around (as we do in C++ itself to
> simulate full coroutines).

From what you're telling me, as far as LLVM is concerned there are ways
to do it manually; you just have to pick one that suits the project.
There aren't any automatic solutions, nor any hooks to attach your own.
I admit I assumed that might be the case, since there isn't any
"standard" coroutine style.

Jonathan