[LLVMdev] Proposal: stack/context switching within a thread

Sun Apr 11 14:09:35 PDT 2010

Kenneth Uildriks <kennethuil at gmail.com> wrote:
> As I see it, the context switching mechanism itself needs to know
> where to point the stack register when switching.  The C routines take
> an initial stack pointer when creating the context, and keep track of
> it from there.  If we don't actually need to interoperate with
> contexts created from the C routines, we have a lot more freedom.

I guess the reason to interoperate with contexts from the C routines
would be to support ucontext_t's passed into signal handlers? But then
the LLVM intrinsics need to specify that their context's layout is the
same as ucontext_t's, on platforms where ucontext_t exists.
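
For concreteness, here is a minimal sketch of how the C routines take
their stack from the caller, assuming a POSIX system that still ships
<ucontext.h> (glibc does). The names run_coroutine, coroutine_body and
steps are made up for this example and are not part of any proposed
LLVM interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static int steps;   /* how many times the coroutine body has made progress */

/* Hypothetical coroutine: runs, yields once, runs again, then returns. */
static void coroutine_body(void) {
    steps++;                          /* first activation */
    swapcontext(&co_ctx, &main_ctx);  /* yield back to the creator */
    steps++;                          /* resumed by the creator */
    /* Returning from here activates uc_link, i.e. main_ctx. */
}

/* Creates a context on a caller-allocated stack and runs it to completion.
   Returns the number of steps the coroutine executed, or -1 on error. */
int run_coroutine(size_t stack_size) {
    void *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;
    steps = 0;
    if (getcontext(&co_ctx) != 0) {
        free(stack);
        return -1;
    }
    /* This is the "initial stack pointer" the creator hands over. */
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = stack_size;
    co_ctx.uc_link = &main_ctx;       /* where to go when the body returns */
    makecontext(&co_ctx, coroutine_body, 0);

    if (swapcontext(&main_ctx, &co_ctx) != 0) {  /* run until the yield */
        free(stack);
        return -1;
    }
    if (swapcontext(&main_ctx, &co_ctx) != 0) {  /* resume until return */
        free(stack);
        return -1;
    }
    free(stack);                      /* the context is finished by now */
    return steps;
}
```

Calling run_coroutine(64 * 1024) should walk through both activations.
The point is just that the stack region is entirely the caller's
business; the context only records where it is.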

> Anyway, one approach would be to expose intrinsics to interrogate an
> inactive context, to get its initial stack pointer (the one it was
> created with) and its current stack pointer, and also to modify both
> before making the context active again.
>
> I don't see any reason why this scheme wouldn't also be compatible
> with segmented stacks.
> ...
> On the other hand, stack manipulation really ought to be handled by
> the target, since only the target knows the details of how the stack
> is laid out to begin with.  Also, if we have stack manipulation calls
> in the IR, optimization quickly becomes very difficult.  Unless we
> just allow optimizers to ignore the stack manipulations and assume
> they're doing the "right" thing.
>
> On the gripping hand, we don't want the target emitting memory
> allocation calls in order to grow the stack (unless a function pointer
> to malloc or its equivalent is passed in from the IR).

In gcc's split-stacks
(http://gcc.gnu.org/ml/gcc/2009-02/msg00429.html; I got the name wrong
earlier), Ian planned to call a known global name to allocate memory
(http://gcc.gnu.org/ml/gcc/2009-02/msg00479.html). I'm not sure what
he actually wound up doing on the gccgo branch. LLVM could also put
the allocation/deallocation functions into the context, although it'd
probably be better to just follow gcc.
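
To make the second option concrete, here is a rough sketch of what
"putting the allocation/deallocation functions into the context" might
look like. Everything here (context, segment, context_push_segment,
context_pop_segment) is hypothetical; it is neither gcc's split-stack
runtime nor an existing LLVM structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header placed at the start of each stack segment. */
typedef struct segment {
    struct segment *prev;  /* previously active segment, or NULL */
    size_t size;           /* usable bytes following this header */
} segment;

/* Hypothetical context that carries its own allocator, instead of the
   runtime calling a well-known global symbol. */
typedef struct context {
    void *(*alloc)(size_t);
    void (*dealloc)(void *);
    segment *top;          /* most recently pushed segment */
} context;

/* Grow the context by one segment; returns 0 on success, -1 on failure. */
int context_push_segment(context *ctx, size_t size) {
    segment *s = ctx->alloc(sizeof *s + size);
    if (s == NULL)
        return -1;
    s->prev = ctx->top;
    s->size = size;
    ctx->top = s;
    return 0;
}

/* Release the most recent segment, if there is one. */
void context_pop_segment(context *ctx) {
    segment *s = ctx->top;
    if (s == NULL)
        return;
    ctx->top = s->prev;
    ctx->dealloc(s);
}
```

A frontend would initialize this with something like
{ malloc, free, NULL }. The cost relative to a global name is two extra
pointers per context and an indirect call whenever the stack grows.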

>> The way they accomplish that now is by
>> copying the entire stack to the heap on a context switch, and having
>> all threads share the main C stack. This isn't quite as bad as it
>> sounds because it only happens to threads that call into C extension
>> modules. Pure Python threads operate entirely within heap Python
>> frames. Still, it would be nice to support this use case.
>
> This wouldn't hold in IR, since virtual registers regularly get
> spilled to the stack. Every context, regardless of the language,
> would have to have its stack saved.  Also, this method would mean that
> a context cannot be used in any native thread other than the one that
> created it, right?

Well, a frontend can generate code in continuation-passing style or do
all of its user-level "stack" frame manipulation on the heap. Then it
only uses a constant amount of C-stack space, which might not be part
of the context that needs to be switched. Only foreign calls
necessarily use a chunk of C stack. Stackless's approach does seem to
prevent one coroutine's foreign code from using pointers into another
coroutine's stack, and maybe they could/should create a new context
each time they need to enter a foreign frame instead of trying to copy
the stack...
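
As an illustration of the "frames on the heap" idea, here is a small
sketch of a trampoline that runs a deeply recursive computation while
using only a constant amount of C stack. The frame layout and the names
(frame, sum_step, heap_sum) are invented for this example; a real
frontend would generate something along these lines, not this code.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct frame frame;
typedef frame *(*step_fn)(frame *self);

/* A heap-allocated activation record for sum(n) = n + sum(n - 1). */
struct frame {
    step_fn step;      /* code to run when this frame is scheduled */
    frame *caller;     /* continuation: frame to resume on return */
    long long n;       /* argument */
    long long result;  /* slot the callee writes its return value into */
    int state;         /* 0 = not yet called the callee, 1 = callee done */
};

static frame *sum_step(frame *f);

static frame *new_frame(long long n, frame *caller) {
    frame *f = malloc(sizeof *f);
    if (f == NULL)
        abort();
    f->step = sum_step;
    f->caller = caller;
    f->n = n;
    f->result = 0;
    f->state = 0;
    return f;
}

/* Runs one step of a frame and returns the frame to run next. */
static frame *sum_step(frame *f) {
    if (f->state == 0 && f->n > 0) {
        f->state = 1;
        return new_frame(f->n - 1, f);  /* "call": the callee runs next */
    }
    /* Base case (n == 0) or the callee has already filled in result. */
    frame *caller = f->caller;
    caller->result = f->n + f->result;  /* "return" into the caller */
    free(f);
    return caller;
}

/* Trampoline: the only C-stack frames alive are this loop and one step. */
long long heap_sum(long long n) {
    frame root = { NULL, NULL, 0, 0, 0 };  /* receives the final result */
    frame *cur = new_frame(n, &root);
    while (cur != &root)
        cur = cur->step(cur);
    return root.result;
}
```

heap_sum(100000) builds a chain of 100000 heap frames but never nests
more than one C call deep, so switching between such computations would
not need to touch the C stack at all.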

> 2. We should be able to support "hard switching" in Stackless Python
> by adding a llvm.getcontextstacktop intrinsic.  If, as in Kristján's
> example, llvm.getcontext is used to create context A, and then
> execution continues until context B is created with
> llvm.swapcontext(B, A), the region of memory between
> llvm.getcontextstacktop(A) and llvm.getcontextstacktop(B) can be saved
> and later restored when B is resumed.

Wait, what stack top does swapcontext get? I'd thought that A's and
B's stack top would be the same since they're executing on the same
stack.

> Of course that usage would
> throw a monkey wrench into a segmented stack scheme... it assumes that
> context stack areas actually behave like contiguous stacks.  Not only
> that, it assumes that no pointers to a context's stack exist outside
> of the context... when the context is inactive, a pointer into a
> context's stack won't be valid!
>
> But in the case of Stackless Python, these caveats can be addressed
> with a simple "Don't do that!", since it's all tied into the language.

And users shouldn't need both stack copying and split stacks. Just one
should suffice.

> 3. I would need to run some benchmarks, but in some cases it might be
> better to use mmap to swap stacks between contexts... that way nothing
> would need to be copied.

Presumably the user would deal with that in allocating their stacks
and switching contexts, using the intrinsics LLVM provides? I don't
see a reason yet for LLVM to get into the mmap business.
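
For example, a user could allocate each stack with mmap themselves and
hand the region to whatever context-creation routine exists. The sketch
below assumes a POSIX system with MAP_ANONYMOUS and assumes the stack
grows downward, so the guard page goes at the low end. The names
mapped_stack, mapped_stack_create and mapped_stack_destroy are
hypothetical. It only covers allocation, not the remapping trick
suggested above.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical descriptor for a user-managed stack region. */
typedef struct mapped_stack {
    void *base;          /* start of the whole mapping (guard page) */
    size_t total;        /* size of the whole mapping in bytes */
    void *usable;        /* first usable byte, just above the guard page */
    size_t usable_size;  /* usable bytes, rounded up to a page multiple */
} mapped_stack;

/* Maps a stack of at least usable_size bytes with one inaccessible guard
   page below it. Returns 0 on success, -1 on failure. */
int mapped_stack_create(mapped_stack *s, size_t usable_size) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || usable_size == 0)
        return -1;
    size_t pg = (size_t)page;
    size_t rounded = (usable_size + pg - 1) / pg * pg;
    size_t total = rounded + pg;

    void *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    /* Overflowing the stack faults here instead of corrupting memory. */
    if (mprotect(base, pg, PROT_NONE) != 0) {
        munmap(base, total);
        return -1;
    }
    s->base = base;
    s->total = total;
    s->usable = (char *)base + pg;
    s->usable_size = rounded;
    return 0;
}

/* Unmaps the whole region; returns 0 on success, -1 on failure. */
int mapped_stack_destroy(mapped_stack *s) {
    return munmap(s->base, s->total);
}
```

The usable/usable_size pair is what would be passed along as the
context's stack, so none of this needs to live inside LLVM.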

> 4. I'm hoping that LLVM ends up growing optimization passes that
> minimize the actual physical use of contexts in many use cases.

That sounds very tricky...

> Also,
> we might be able to guarantee small stack usage with a pass that
> forces recursive calls to spawn a new context and turns large alloca's
> into malloc's, making it safer to have a bunch of little stacks
> without any needed juggling.

This sounds like a stopgap until real split stacks can be implemented.
http://gcc.gnu.org/wiki/SplitStacks#Backward_compatibility describes
some of the other difficulties in getting even this much to work
(foreign calls and function pointers, at least).