On Mon, Apr 11, 2011 at 12:38 PM, Reid Kleckner <span dir="ltr"><<a href="mailto:reid.kleckner@gmail.com">reid.kleckner@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


This reminds me of Kenneth's "context" proposal from some time back:<br>

<br>

<a href="http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-April/030787.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-April/030787.html</a><br>

<br>

I haven't compared the two too closely, but I thought I should throw<br>

it out there as food for thought.<br></blockquote><div><br></div><div>That's very interesting, I had not seen that before. It would pretty much solve all of my needs, although it only supports fixed-sized stacks. If this were combined with what Sanjoy is proposing that would be terrific.</div>


<div><br></div><div>There's one point there that I don't quite understand - the doc says that the context buffer that you allocate would be large enough to store a snapshot of all the machine's registers. I would think that the machine registers would be more conveniently stored on the stack itself, and the context buffer would merely contain a pointer to the end of the stack - that means that a call to setcontext() would simply push all the current registers, change the stack pointer, and then pop each register value off of the stack before returning. However, that's a minor implementation detail and not something I really care about one way or the other.</div>


<div><br></div><div>There's one other feature that I can think of, not strictly necessary, but might help make context switching more efficient. Let's assume that I have a yield() function in my language which wraps around llvm.setcontext() and performs some additional language-specific housekeeping. So for example:</div>


<div><br></div><div><font class="Apple-style-span" face="'courier new', monospace">    void yield() {</font></div><div><font class="Apple-style-span" face="'courier new', monospace">       llvm.swap_context(next_context, etc);</font></div>


<div><font class="Apple-style-span" face="'courier new', monospace">       // If the other context returned with an exception,</font></div><div><font class="Apple-style-span" face="'courier new', monospace">       // propagate the exception to the current context.</font></div>


<div><font class="Apple-style-span" face="'courier new', monospace">       if (pending_throwable) {</font></div><div><font class="Apple-style-span" face="'courier new', monospace">         throw pending_throwable;</font></div>


<div><font class="Apple-style-span" face="'courier new', monospace">       }</font></div><div><font class="Apple-style-span" face="'courier new', monospace">    }</font></div><div><br></div><div>So the problem here is that I have to perform a check for a throwable, when 99.9% of the time it will be NULL. One could get around this by playing games with the return address:</div>


<div><br></div><div><meta charset="utf-8"><div><font class="Apple-style-span" face="'courier new', monospace">    void yield() {</font></div><div><font class="Apple-style-span" face="'courier new', monospace">       llvm.swap_context(next_context, etc);</font></div>


<div><font class="Apple-style-span" face="'courier new', monospace">    normal_exit:</font></div><div><font class="Apple-style-span" face="'courier new', monospace">       return;</font></div><div><font class="Apple-style-span" face="'courier new', monospace">    error_exit:</font></div>


<div><span class="Apple-style-span" style="font-family: 'courier new', monospace; ">       throw pending_throwable;</span></div><div><span class="Apple-style-span" style="font-family: 'courier new', monospace; ">    }</span></div>


</div><div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div><div>The idea is that when you want to pass an exception to the other thread, you would modify the return address on the stack to point to error_exit instead of normal_exit. Unfortunately to get this to work you'd also have to convince LLVM's optimizer not to complain about unreachable code, so maybe this is more trouble than it's worth.</div>


<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><font class="Apple-style-span" color="#888888">

Reid</font><br>

<div><div></div><div class="h5"><br>

On Mon, Apr 11, 2011 at 3:26 PM, Talin <<a href="mailto:viridia@gmail.com">viridia@gmail.com</a>> wrote:<br>

> On Mon, Apr 11, 2011 at 7:34 AM, Justin Holewinski<br>

> <<a href="mailto:justin.holewinski@gmail.com">justin.holewinski@gmail.com</a>> wrote:<br>

>><br>

>> On Mon, Apr 11, 2011 at 9:07 AM, Sanjoy Das<br>

>> <<a href="mailto:sanjoy@playingwithpointers.com">sanjoy@playingwithpointers.com</a>> wrote:<br>

>>><br>

>>> Hi!<br>

>>><br>

>>> Thanks for the feedback. For context, my implementation plan is here:<br>

>>> <a href="http://pastebin.com/e9JMZNCE" target="_blank">http://pastebin.com/e9JMZNCE</a><br>

>>><br>

>>> First, about unwinding:<br>

>>><br>

>>> In architectures like x86-64, where unwinding based on DWARF info, there<br>

>>> shouldn't be any problems; since the DWARF info will be emitted<br>

>>> correctly. Otherwise, if the unwinding is done by following BP, it<br>

>>> should still be possible to have BP de-reference correctly (ref. "Frame<br>

>>> Pointers" section in the implementation plan). SP will not always have a<br>

>>> correct value - I don't know if this is problem.<br>

>>><br>

>>> About co-routines:<br>

>>><br>

>>> Here is a sketch of how I think co-routines can be implemented (I'll<br>

>>> merge this with the main implementation plan after I get some feedback):<br>

>>><br>

>>> Have a new instruction, called "yield" to return a value from a<br>

>>> co-routine, preserving the state. Thus, we immediately know which<br>

>>> functions are co-routines. Each co-routine will have a new stack.<br>

>>> Associate each co-routine with a thread-local global variable (called<br>

>>> saved_stack here, will have to be mangled with the name of the<br>

>>> co-routine) which points to the start stack block for that co-routine.<br>

>>> This will be the first block in the chain of blocks to follow.<br>

>>><br>

>>> The structure of the block will be similar to the structure of a regular<br>

>>> stack block, except that it will also have space to store two registers<br>

>>> - this_ip and this_sp.<br>

>>><br>

>>> The prologue of a co-routine will jump to a function similar to<br>

>>> setup_new_block (setup_new_block_coroutine) which will work like<br>

>>> setup_new_block, except:<br>

>>><br>

>>> 1. It will first check if saved_stack is NULL. If it is NULL, it will<br>

>>> allocate a new block and save it to saved_stack. It if isn't, it'll<br>

>>> simply restore saved_sp, saved_ip.<br>

>>><br>

>>> 2. In case a new block was allocated, it will pretty much do what<br>

>>> setup_block does, after which it will adjust the SP to make space for<br>

>>> the saved registers.<br>

>>><br>

>>> The destroy_block procedure will also have to be a little different<br>

>>> (mentioned below).<br>

>>><br>

>>> There are four things (relevant to this discussion) a co-routine can do:<br>

>>><br>

>>> Yield<br>

>>><br>

>>> This returns control to the calling function, without forgetting the<br>

>>> current state of the function. To do this, we save saved_ip and<br>

>>> saved_sp. Every yield be padded with instructions pushing and popping<br>

>>> the registers live at that point. Then we set the return value (register<br>

>>> or memory), and restore saved_sp and saved_ip from the current block. We<br>

>>> can't simply return because the actual return value has been hijacked to<br>

>>> provide for block cleanup.<br>

>>><br>

>>> Call to regular function<br>

>>><br>

>>> Just a simple call - the caller's prologue will handle setting up a it's<br>

>>> own stack space etc.<br>

>>><br>

>>> Call to Co-routine<br>

>>><br>

>>> This too should "just work", since all the heavy-lifting is done in the<br>

>>> co-routine's prologue. However, the above approach will not work for<br>

>>> nested co-routines (i.e. calling the same co-routine body with one call<br>

>>> is still active, recursively). I'm not sure if having support for nested<br>

>>> co-routines will add any value.<br>

>>><br>

>>> Return<br>

>>><br>

>>> This will be a regular return. Since the return value has been hijacked<br>

>>> to point to a another block of code (destroy_block_coroutine), control<br>

>>> will jump there instead.<br>

>>><br>

>>> destroy_block_coroutine will free the linked-list of stack blocks (we<br>

>>> have to free this, since we will won't have a reference to this list<br>

>>> anymore), set saved_stack for this co-routine to NULL, and restore<br>

>>> saved_sp and saved_ip.<br>

>><br>

>> I'm wondering how much of this should be implemented as new LLVM<br>

>> functionality, and how much should be left to the front-end compiler.  With<br>

>> some additional LLVM intrinsics (e.g. llvm.stack.new.block,<br>

>> llvm.stack.delete.block, etc.), a front-end could take care of the details<br>

>> of how co-routines are actually implemented.  This would also give the<br>

>> front-end freedom to implement whatever semantics/conventions are<br>

>> necessary/required for the source language.  I'm just not sure having LLVM<br>

>> dictate how co-routines are implemented is the best way to approach this.<br>

><br>

> I'm in agreement with Justin.<br>

> 1) It's much easier to add new intrinsics to LLVM than it is to add new<br>

> instructions. Library functions are even easier - in general you want to use<br>

> intrinsics in cases where the presence of the intrinsic will affect the way<br>

> the code is generated in the calling function.<br>

> 2) The API that I was envisioning for creating stacks was something fairly<br>

> simple (maybe too simple):<br>

>    // Create a new stack, where 'stack_handle' is some opaque object, and<br>

> 'func'<br>

>    // is an LLVM function.<br>

>    stack_handle = llvm.createstack(func, data)<br>

>    // Switch to a different stack (i.e. yield)<br>

>    llvm.switchstack(stack_handle)<br>

>    // deallocate a stack<br>

>    llvm.destroystack(stack_handle)<br>

> The frontend would be responsible for managing the lifetime of the<br>

> stack_handle object. For something like Python generators, the stack_handle<br>

> would be stored in the iterator object that is returned from the generator,<br>

> and deallocated when the generator gets garbage-collected. For cooperative<br>

> threads, the handle would be stored as a private member of a Thread or<br>

> Coroutine class. Different languages would support different ways of using<br>

> coroutines.<br>

> Similarly, the frontend would be responsible for passing data between the<br>

> different coroutines, using either thread-local variables or some other<br>

> method.<br>

> Also I'm thinking the frontend would be responsible for propagating<br>

> exceptions between stacks - so for example, if my coroutine throws an<br>

> exception that is not caught, but instead propagates all the way to the top<br>

> of it's stack, then that coroutine gets terminated, and depending on the<br>

> language semantics, the same exception or another one gets thrown upon<br>

> return from 'yield' in the other context. I would have no objection to<br>

> implementing all of that logic in my frontend.<br>

> Because the internals of the stack_handle may be different on different<br>

> platforms, I'm thinking that the frontend should only get a void* pointer -<br>

> it doesn't need to know what's inside it. I think the frontend only needs a<br>

> small number of operations: create, yield, destroy, and iterate through call<br>

> frames (for garbage collection).<br>

><br>

> The 'yield' instruction seems somewhat inspired from Python's 'yield'<br>

> statement, which unfortunately has some substantial drawbacks - such as the<br>

> fact that the yield must be in the top-most call frame in order for the<br>

> compiler to know that the function is a generator / coroutine. In other<br>

> words, you can't in Python have a generator which calls a subroutine that<br>

> does the yield. I'd like to avoid that limitation. Scheme and Lua, to name<br>

> just two examples, have much more powerful models for doing continuations<br>

> and coroutines, and I'd like to be able to support those. I think that can<br>

> be done if we supply only the most minimal support on the LLVM side, and<br>

> leave most of the details to the frontend.<br>

>><br>

>><br>

>>><br>

>>> --<br>

>>> Sanjoy Das<br>

>>> <a href="http://playingwithpointers.com" target="_blank">http://playingwithpointers.com</a><br>

>>> _______________________________________________<br>

>>> LLVM Developers mailing list<br>

>>> <a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

>>> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

>><br>

>><br>

>><br>

>> --<br>

>><br>

>> Thanks,<br>

>> Justin Holewinski<br>

><br>

><br>

><br>

> --<br>

> -- Talin<br>

><br>

> _______________________________________________<br>

> LLVM Developers mailing list<br>

> <a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

><br>

><br>

</div></div></blockquote></div><br><br clear="all"><br>-- <br>-- Talin<br>