[LLVMdev] GSOC Adaptive Compilation Framework for LLVM JIT Compiler

Mon Apr 4 12:47:56 PDT 2011

On Apr 4, 2011, at 10:49 AM, Owen Anderson wrote:

> 
> On Apr 3, 2011, at 12:11 PM, Eric Christopher wrote:
> 
>> <snip conversation about call patching>
> 
> It seems to me that there's a general feature here that LLVM is lacking, that would be useful in a number of JIT-compilation contexts, namely the ability to mark certain instructions (direct calls, perhaps branches too) as back-patchable.  
> 
> The thing that stands out to me is that back-patching a call or branch in a JIT'd program is very much like relocation resolution that a dynamic linker does at launch time for a statically compiled program.  It seems like it might be possible to build on top of the runtime-dyld work that Jim has been doing for the MC-JIT to facilitate this.   Here's the idea:
> 
> Suppose we had a means of tagging certain calls (and maybe branches) as explicitly requiring relocations.  Any back-patchable call would have a relocation in the generated code, and the MC-JIT would be aware of the location and type of the relocations, and rt-dyld would handle the upfront resolution.  Backpatching, then, is just a matter of updating the resolution for a given symbol, and asking rt-dyld to re-link the executable code.
> 

You nailed it. This is basically the direction MCJIT is heading for address resolution (code coming soon to a tree near you). You're right that the other half of that problem is making sure that all relevant instructions have explicit relocations. That's part of the motivation for using MachO as a container, at least at first, for the MCJIT, as it effectively has that constraint by virtue of its treatment of Atoms.

This will allow things like compiling a function into one address space, then copying it into another for execution (think a debugger or compile server process needing to inject JITed code into a client process address space). Likewise, moving things around (e.g., recompiling a hot function with more optimization) will result in needing to re-patch the addresses for that symbol.
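To make the "update the symbol, then re-link" idea concrete, here's a rough toy model. It is only a sketch: ToyLinker, Relocation, and the slot-based "code buffer" are names made up for illustration and are not the MCJIT or runtime-dyld interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model for illustration only; not the MCJIT or runtime-dyld API.
struct Relocation {
  std::size_t Offset;  // index of the slot in the code buffer to patch
  std::string Symbol;  // symbol whose address belongs in that slot
};

class ToyLinker {
  std::vector<std::uint64_t> Code;  // stand-in for emitted machine code
  std::vector<Relocation> Relocs;   // every patchable call/branch site
  std::map<std::string, std::uint64_t> Symbols;

public:
  explicit ToyLinker(std::size_t Slots) : Code(Slots, 0) {}

  // Record that the slot at Offset must hold the address of Symbol.
  void addRelocation(std::size_t Offset, const std::string &Symbol) {
    assert(Offset < Code.size() && "relocation outside the code buffer");
    Relocs.push_back({Offset, Symbol});
  }

  // Set (or change) the resolved address of a symbol.
  void define(const std::string &Symbol, std::uint64_t Address) {
    Symbols[Symbol] = Address;
  }

  // Apply every relocation whose symbol is currently defined.
  // Returns the number of slots that were patched.
  std::size_t relink() {
    std::size_t Patched = 0;
    for (const Relocation &R : Relocs) {
      auto It = Symbols.find(R.Symbol);
      if (It == Symbols.end())
        continue;  // unresolved symbol: leave the slot untouched
      Code[R.Offset] = It->second;
      ++Patched;
    }
    return Patched;
  }

  std::uint64_t slot(std::size_t Offset) const { return Code.at(Offset); }
};
```

Recompiling a hot function then amounts to calling define() with the new address and relink() again; every recorded call site picks up the new destination without the caller's code being regenerated.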

One catch is that addresses can escape the relocations, so back-patching can't handle everything directly. Specifically, function pointers make things "fun" since two pointers to the same function must always compare equal. We can deal with that by detecting when an address escapes like that, which shouldn't be too hard (hand wave), and allocating a stub containing just a direct branch instruction. Indirect references to the symbol always get the stub, and when the function is re-compiled, the stub is back-patched to branch to the new destination. So things work, at the cost of an additional unconditional branch on the execution path. In the context of a JIT, that should be a relatively small cost to pay.
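A rough sketch of the stub scheme, modeled in portable C++ rather than real machine code. Stub, StubTable, squareSlow, and squareFast are all hypothetical names; a real stub would be a branch instruction in executable memory, whereas here it is just a struct with a stable address holding the current target.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

using Fn = int (*)(int);

// Stand-in for a stub holding a single direct branch. Its address never
// changes, so every escaped "function pointer" compares equal.
struct Stub {
  Fn Target;  // current destination of the branch
  int call(int X) const { return Target(X); }
};

class StubTable {
  std::map<std::string, std::unique_ptr<Stub>> Stubs;

public:
  // Called when the address of Symbol escapes. The stub is created on
  // first use; later requests return the same stub.
  Stub *getOrCreate(const std::string &Symbol, Fn Initial) {
    std::unique_ptr<Stub> &Slot = Stubs[Symbol];
    if (!Slot)
      Slot.reset(new Stub{Initial});
    return Slot.get();
  }

  // Called after recompilation: redirect the stub, leave its address alone.
  void repatch(const std::string &Symbol, Fn NewTarget) {
    auto It = Stubs.find(Symbol);
    assert(It != Stubs.end() && "no stub was allocated for this symbol");
    It->second->Target = NewTarget;
  }
};

// Two "compilations" of the same function (defined for non-negative X).
inline int squareSlow(int X) {
  int R = 0;
  for (int I = 0; I < X; ++I)
    R += X;
  return R;
}
inline int squareFast(int X) { return X * X; }
```

Anyone holding the Stub pointer keeps a valid, comparable address across recompiles; the extra hop through Target is the additional unconditional branch mentioned above.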

Also of interest is freeing up the allocated memory when it's no longer needed. For functions, this isn't too hard, as we just backpatch references back to a compilation callback (or interpreter or whatever else we want). For global values, including the above function pointer indirection stubs, things are a bit trickier. The most likely answer will be to have those objects be reference counted such that they're automatically deallocated when no more objects remain that reference them. An explicit call to free them becomes a no-op. To be completely thorough, we'll need cycle detection, but in practice, I suspect that happens rarely enough that it won't be a big issue if we don't get to it right off the bat.
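As a sketch of the reference-counting idea, assuming std::shared_ptr as the counting mechanism (an assumption for illustration, not a statement about how the JIT will do it). JITObject and freeObject are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a JIT-allocated global or indirection stub.
struct JITObject {
  std::string Name;
  // Objects this one references; each entry holds one reference count.
  std::vector<std::shared_ptr<JITObject>> Uses;
  explicit JITObject(std::string N) : Name(std::move(N)) {}
};

// Under this scheme an explicit free does nothing: the storage goes away
// only when the last reference to the object is dropped.
inline void freeObject(const std::shared_ptr<JITObject> &) {}
```

An object stays alive while anything in some Uses list still points at it, and is released as soon as the last such holder is destroyed. Two objects that list each other in Uses would never be released under this model, which is the cycle problem noted above.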

-Jim