<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div>Hi Sanjoy,</div><div><br></div><div><blockquote type="cite"><blockquote type="cite"><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">(1) Replacing function bodies at the same address is impossible if the<br></span></font></blockquote><blockquote type="cite"><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">function is already on the stack: You'd be replacing a definition that<br></span></font></blockquote><blockquote type="cite"><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">you're later going to return through.<br></span></font></blockquote><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);"><br>If the function you wish to replace is active on the stack, you can<br>replace the return PC that was going to return into that active frame<br>with a PC pointing into a stub that knows how to replace the active<br>stack frame with something that would let the new code continue<br>executing. The stub will then have to branch into a suitable position<br>in the new generated code. Once you have done this for all "pending<br>returns" into the old bit of generated code, you can throw the old<br>code away, since nothing will ever return into it.<br><br>This can be tricky to get right but if you have built OSR support<br>already for some other reason then this is a viable option. This<br>scheme is very similar to throwing an exception, and the semantics of<br>"catching" an exception is to branch to a newly generated block of<br>code.<br></span></font></blockquote><div><span style="background-color: rgba(255, 255, 255, 0);"><br></span></div><span style="background-color: rgba(255, 255, 255, 0);">That all makes sense. What are your thoughts on the trade-offs of this vs the patchpoint approach though? 
If you can modify previously executable memory it seems like the patchpoint approach would have lower overhead, unless you have a truly huge number of callsites to update?</span><div><span style="background-color: rgba(255, 255, 255, 0);"> <br></span><blockquote type="cite"><blockquote type="cite"><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">So, if you want to replace functions<br></span></font></blockquote><blockquote type="cite"><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">at the same address you'll have to have some sort of safe-point concept<br></span></font></blockquote><blockquote type="cite"><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">where you know the function you want to replace isn't on the stack.<br></span></font></blockquote><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);"><br>That will work, but can be very hard to make happen. For instance,<br>the method you want to replace may have called a function that has an<br>infinite loop in it.<br></span></font></blockquote><div><span style="background-color: rgba(255, 255, 255, 0);"><br></span></div><div><span style="background-color: rgba(255, 255, 255, 0);">Agreed. This *might* find a home in simple REPLs where calling an infinite-loop would be undesirable/unexpected behavior, but that's also an environment where you are unlikely to want reoptimization.</span></div><div><span style="background-color: rgba(255, 255, 255, 0);"> </span></div><blockquote type="cite"><blockquote type="cite"><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">(2) Replacing function bodies at the same address isn't the only way to<br></span></font></blockquote><blockquote type="cite"><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">avoid the overhead of a trampoline. 
I haven't implemented this yet, but I<br></span></font></blockquote><blockquote type="cite"><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">really want to add llvm.patchpoint support to Orc. In that case you can lay<br></span></font></blockquote><blockquote type="cite"><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">down your replacement definition at a different address, update all your<br></span></font></blockquote><blockquote type="cite"><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">callsites, then delete your old definition after you're done executing it.<br></span></font></blockquote><blockquote type="cite"><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">Relative to using trampolines this lowers your execution cost (calls are<br></span></font></blockquote><blockquote type="cite"><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">direct rather than indirect), but increases your update cost (you have to<br></span></font></blockquote><blockquote type="cite"><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">update many callsites, rather than a single trampoline).</span></font></blockquote></blockquote><div><span style="background-color: rgba(255, 255, 255, 0);"><br></span></div><div><span style="background-color: rgba(255, 255, 255, 0);">FWIW, Pete Cooper and I have tossed around ideas about adding utilities to Orc for injecting frame-residence counting and automatic cleanup into functions to facilitate this 2nd approach. The rough idea was that each function would increment a counter on entry and decrement it on exit. Every time the counter hits zero it would check whether it has been "deleted" (presumably due to being replaced), and if so it would free its memory. 
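<br><br>A minimal sketch of that bookkeeping follows. It is written in Python only to model the counting logic; in a real JIT the increment and decrement would be emitted into the generated code itself. The class, its methods and the <code>free_memory</code> callback are hypothetical and not part of Orc. The sketch makes one deliberate change: instead of a separate "deleted" flag that is checked only when the counter hits zero, the "not yet deleted" state is held as one extra reference. A flag checked only at zero can miss a deletion that happens while no frame is active, or be acted on twice if both sides see zero. With the extra reference, exactly one caller (either <code>delete()</code> or the last <code>exit()</code>) brings the count to zero and frees the body.<br><pre>
```python
import threading


class ReplaceableCode:
    """Hypothetical model of one JIT'd function body (not an Orc API).

    The count starts at 1: that extra reference stands for "not yet
    deleted". Each active frame adds one more. Whichever call drops the
    count to zero frees the body, so the free happens exactly once.
    """

    def __init__(self, name, free_memory):
        self.name = name
        self._free_memory = free_memory  # hypothetical: releases the code pages
        self._count = 1                  # the "not yet deleted" reference
        self._deleted = False
        self._lock = threading.Lock()

    def enter(self):
        """Would be emitted at function entry."""
        with self._lock:
            if self._count == 0:
                raise RuntimeError(self.name + " entered after it was freed")
            self._count += 1

    def exit(self):
        """Would be emitted before every return and on unwind paths."""
        self._release()

    def delete(self):
        """Called by the JIT once all callsites point at the replacement."""
        with self._lock:
            if self._deleted:
                return  # already deleted; drop the live reference only once
            self._deleted = True
        self._release()

    def _release(self):
        with self._lock:
            self._count -= 1
            now_unused = self._count == 0
        if now_unused:
            self._free_memory(self.name)


# Usage: one frame of foo_v1 is still active when its replacement lands.
freed = []
foo_v1 = ReplaceableCode("foo_v1", freed.append)
foo_v1.enter()   # a frame of foo_v1 is on the stack
foo_v1.delete()  # callsites now use the new body; foo_v1 stays alive
print(freed)     # prints []
foo_v1.exit()    # the last frame returns, so foo_v1 is freed
print(freed)     # prints ['foo_v1']
```
</pre>The sketch assumes every callsite has already been redirected before <code>delete()</code> runs. It does not cover a thread that loaded the old address but has not yet executed the entry increment; that case would need separate handling.<br><br>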
This scheme should be easy to implement, but hasn't gone past speculation on our part.</span></div><div><span style="background-color: rgba(255, 255, 255, 0);"><br></span></div><div><span style="background-color: rgba(255, 255, 255, 0);">- Lang.</span></div></div></div><div><br>On Mar 19, 2015, at 3:00 PM, Sanjoy Das <<a href="mailto:sanjoy@playingwithpointers.com">sanjoy@playingwithpointers.com</a>> wrote:<br><br></div><blockquote type="cite"><div><span>On Wed, Mar 18, 2015 at 6:39 PM, Lang Hames <<a href="mailto:lhames@gmail.com">lhames@gmail.com</a>> wrote:</span><br><blockquote type="cite"><span>Hi Hayden,</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Dave's answer covers this pretty well. Neither Orc nor MCJIT currently</span><br></blockquote><blockquote type="cite"><span>reason about replacing function bodies. They may let you add duplicate</span><br></blockquote><blockquote type="cite"><span>definitions, but how they'll behave if you do that isn't specified in their</span><br></blockquote><blockquote type="cite"><span>contracts. They definitely won't replace old definitions unless you provide</span><br></blockquote><blockquote type="cite"><span>a custom memory manager that's rigged to lay new definitions down on top of</span><br></blockquote><blockquote type="cite"><span>old ones.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>I suspect that existing clients of MCJIT have tackled this by adding thread</span><br></blockquote><blockquote type="cite"><span>safety into their wrappers around MCJIT, or into the JIT'd code itself, but</span><br></blockquote><blockquote type="cite"><span>I'm just guessing. (CC'ing Keno and Philip, in case they have insights).</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>I think this would be cool to build into Orc though. 
Two quick thoughts:</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>(1) Replacing function bodies at the same address is impossible if the</span><br></blockquote><blockquote type="cite"><span>function is already on the stack: You'd be replacing a definition that</span><br></blockquote><blockquote type="cite"><span>you're later going to return through.</span><br></blockquote><span></span><br><span>If the function you wish to replace is active on the stack, you can</span><br><span>replace the return PC that was going to return into that active frame</span><br><span>with a PC pointing into a stub that knows how to replace the active</span><br><span>stack frame with something that would let the new code continue</span><br><span>executing. The stub will then have to branch into a suitable position</span><br><span>in the new generated code. Once you have done this for all "pending</span><br><span>returns" into the old bit of generated code, you can throw the old</span><br><span>code away, since nothing will ever return into it.</span><br><span></span><br><span>This can be tricky to get right but if you have built OSR support</span><br><span>already for some other reason then this is a viable option. This</span><br><span>scheme is very similar to throwing an exception, and the semantics of</span><br><span>"catching" an exception is to branch to a newly generated block of</span><br><span>code.</span><br><span></span><br><blockquote type="cite"><span>So, if you want to replace functions</span><br></blockquote><blockquote type="cite"><span>at the same address you'll have to have some sort of safe-point concept</span><br></blockquote><blockquote type="cite"><span>where you know the function you want to replace isn't on the stack.</span><br></blockquote><span></span><br><span>That will work, but can be very hard to make happen. 
For instance,</span><br><span>the method you want to replace may have called a function that has an</span><br><span>infinite loop in it.</span><br><span></span><br><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>(2) Replacing function bodies at the same address isn't the only way to</span><br></blockquote><blockquote type="cite"><span>avoid the overhead of a trampoline. I haven't implemented this yet, but I</span><br></blockquote><blockquote type="cite"><span>really want to add llvm.patchpoint support to Orc. In that case you can lay</span><br></blockquote><blockquote type="cite"><span>down your replacement definition at a different address, update all your</span><br></blockquote><blockquote type="cite"><span>callsites, then delete your old definition after you're done executing it.</span><br></blockquote><blockquote type="cite"><span>Relative to using trampolines this lowers your execution cost (calls are</span><br></blockquote><blockquote type="cite"><span>direct rather than indirect), but increases your update cost (you have to</span><br></blockquote><blockquote type="cite"><span>update many callsites, rather than a single trampoline).</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Out of interest, why the desire to avoid trampolines? They do make life a</span><br></blockquote><blockquote type="cite"><span>lot easier here. 
:)</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Cheers,</span><br></blockquote><blockquote type="cite"><span>Lang.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>On Wed, Mar 18, 2015 at 3:13 AM, David Blaikie <<a href="mailto:dblaikie@gmail.com">dblaikie@gmail.com</a>> wrote:</span><br></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>[+Lang, keeper of JITs, designer of ORCs]</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>On Tue, Mar 17, 2015 at 1:27 AM, Hayden Livingston</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span><<a href="mailto:halivingston@gmail.com">halivingston@gmail.com</a>> wrote:</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>I've been playing with OrcJIT a bit, and from the looks of it I can (like</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>in the previous JIT I suppose?) 
ask for a function to be regenerated.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>If I've given the address of the function that LLVM gave me to an</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>external party, do "I" need to ensure thread-safety?</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>Or is it safe to ask OrcJIT to regenerate code at that address and</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>everything will work magically?</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>As I understand it, Orc won't regenerate the function at the same location</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>unless your memory manager returns the same memory twice - so if you know</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>you've successfully migrated all callers off a certain chunk of allocated</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>memory, you might be able to recycle it back into Orc (but I think on MacOS,</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>the way page permissions work, 
this would be impossible - once a memory page</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>is marked executable, it's no longer writable and can't be set back - you</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>need a new page).</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>I'm thinking it won't because it's quite possible some thread might be</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>executing code, and we'll be asking LLVM to write bytes there.</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>How does one generally go about doing such updates? I'm looking for some guidance</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>without adding a trampoline in front of it. 
Do runtimes that support</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>re-generation of code have an if check or something before entering the</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>method?</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Without a trampoline you're probably going to have to be constrained in</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>some other ways - possibly (& I'm really out of my depth at this point) the</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>kind of safe/pause points used for GC - but perhaps more constrained than</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>that, such that you have safe places where your JIT'd code (or at least the</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>replaceable functions) isn't running.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>But again, still depends on platform - writing to executable memory isn't</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>possible on MacOS so far as I know (as mentioned above) so there would be no</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>way to replace a function there without a trampoline or at least a global</span><br></blockquote></blockquote><blockquote type="cite"><blockquote 
type="cite"><span>variable to load/jump to.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>- David</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>_______________________________________________</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>LLVM Developers mailing list</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span><a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><span></span><br></blockquote></div></blockquote></body></html>