[LLVMdev] New JIT APIs

Wed Jan 14 00:05:06 PST 2015

Hi All,

The attached patch (against r225842) contains some new JIT APIs that I've
been working on. I'm going to start breaking it up, tidying it up, and
submitting patches to llvm-commits soon, but while I'm working on that I
thought I'd put the whole patch out for the curious to start playing around
with and/or commenting on.

The aim of these new APIs is to cleanly support a wider range of JIT use
cases in LLVM, and to recover some of the functionality lost when the
legacy JIT was removed. In particular, I wanted to see if I could re-enable
lazy compilation while following MCJIT's design philosophy of relying on
the MC layer and module-at-a-time compilation. The attached patch goes some
way to addressing these aims, though there's a lot still to do.

The 20,000 ft overview, for those who want to get straight to the code:

The new APIs are not built on top of the MCJIT class, as I didn't want a
single class trying to be all things to all people. Instead, the new APIs
consist of a set of software components for building JITs. The idea is that
you should be able to take these off the shelf and compose them reasonably
easily to get the behavior that you want. In the future I hope that people
who are working on LLVM-based JITs, if they find this approach useful, will
contribute back components that they've built locally and that they think
would be useful for a wider audience. As a demonstration of the
practicality of this approach the attached patch contains a class,
MCJITReplacement, that composes some of the components to re-create the
behavior of MCJIT. This works well enough to pass all MCJIT regression and
unit tests on Darwin, and all but four regression tests on Linux. The patch
also contains the desired "new" feature: Function-at-a-time lazy jitting in
roughly the style of the legacy JIT. The attached lazydemo.tgz file
contains a program which composes the new JIT components (including the
lazy-jitting component) to lazily execute bitcode. I've tested this program
on Darwin and it can run non-trivial benchmark programs, e.g. 401.bzip2
from SPEC2006.

These new APIs are named after the motivating feature: On Request
Compilation, or ORC. I believe the logo potential is outstanding. I'm
picturing an Orc riding a Dragon. If I'm honest this was at least 45% of my
motivation for doing this project*.

You'll find the new headers in
llvm/include/llvm/ExecutionEngine/OrcJIT/*.h, and the implementation files
in lib/ExecutionEngine/OrcJIT/*.

I imagine there will be a number of questions about the design and
implementation. I've tried to preempt a few below, but please fire away
with anything I've left out.

Also, thanks to Jim Grosbach, Michael Illseman, David Blaikie, Pete Cooper,
Eric Christopher, and Louis Gerbarg for taking time out to review, discuss
and test this thing as I've worked on it.

Cheers,
Lang.

Possible questions:

(1)
Q. Are you trying to kill off MCJIT?
A. There are no plans to remove MCJIT. The new APIs are designed to live
alongside it.

(2)
Q. What do "JIT components" look like, and how do you compose them?
A. The classes and functions you'll find in OrcJIT/*.h fall into two rough
categories: Layers and Utilities. Layers are classes that implement a small
common interface that makes them easy to compose:

class SomeLayer {
private:
  // Implementation details
public:
  // Implementation details

  typedef ??? Handle;

  template <typename ModuleSet>
  Handle addModuleSet(ModuleSet&& Ms);

  void removeModuleSet(Handle H);

  uint64_t getSymbolAddress(StringRef Name, bool ExportedSymbolsOnly);

  uint64_t lookupSymbolAddressIn(Handle H, StringRef Name,
                                 bool ExportedSymbolsOnly);
};

Layers are usually designed to sit one-on-top-of-another, with each doing
some sort of useful work before handing off to the layer below it. The
layers that are currently included in the patch are the
CompileOnDemandLayer, which breaks up modules and redirects calls to
not-yet-compiled functions back into the JIT; the LazyEmitLayer, which
defers adding modules to the layer below until a symbol in the module is
actually requested; the IRCompilingLayer, which compiles bitcode to
objects; and the ObjectLinkingLayer, which links sets of objects in memory
using RuntimeDyld.

Utilities are everything that's not a layer. Ideally the heavy lifting is
done by the utilities. Layers just wrap certain use cases to make them
easy to compose.

Clients are free to use utilities directly, or compose layers, or implement
new utilities or layers.
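
To illustrate how layers with this interface stack, here is a minimal,
self-contained sketch. It does not use the Orc headers: ToyModule,
ToyBaseLayer and ToyCountingLayer are hypothetical names invented for this
example, a "module" is just a table of symbol names and addresses, and the
ExportedSymbolsOnly flag is accepted but ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a module: symbol name -> address.
using ToyModule = std::map<std::string, uint64_t>;
using ToyModuleSet = std::vector<ToyModule>;

// Bottom of the stack: stores module sets and answers symbol queries.
class ToyBaseLayer {
public:
  using Handle = unsigned;

  Handle addModuleSet(ToyModuleSet Ms) {
    Sets[NextHandle] = std::move(Ms);
    return NextHandle++;
  }

  void removeModuleSet(Handle H) { Sets.erase(H); }

  // Returns 0 when no remaining module set defines Name.
  uint64_t getSymbolAddress(const std::string &Name,
                            bool /*ExportedSymbolsOnly*/) const {
    for (const auto &KV : Sets)
      for (const auto &M : KV.second) {
        auto I = M.find(Name);
        if (I != M.end())
          return I->second;
      }
    return 0;
  }

private:
  std::map<Handle, ToyModuleSet> Sets;
  Handle NextHandle = 1;
};

// A layer that sits on top of any other layer: it does a little work of its
// own (counting symbols) and then hands off to the layer below.
template <typename BaseLayerT>
class ToyCountingLayer {
public:
  using Handle = typename BaseLayerT::Handle;

  explicit ToyCountingLayer(BaseLayerT &B) : Base(B) {}

  Handle addModuleSet(ToyModuleSet Ms) {
    for (const auto &M : Ms)
      SymbolsSeen += M.size();
    return Base.addModuleSet(std::move(Ms));
  }

  void removeModuleSet(Handle H) { Base.removeModuleSet(H); }

  uint64_t getSymbolAddress(const std::string &Name,
                            bool ExportedSymbolsOnly) const {
    return Base.getSymbolAddress(Name, ExportedSymbolsOnly);
  }

  std::size_t symbolsSeen() const { return SymbolsSeen; }

private:
  BaseLayerT &Base;
  std::size_t SymbolsSeen = 0;
};
```

Because the upper layer is a template over the layer beneath it, any class
exposing the same small interface can be slotted in underneath.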

(3)
Q. Why "addModuleSet" rather than "addModule"?
A. Allowing multiple modules to be passed around together allows layers
lower in the stack to perform interesting optimizations, e.g. direct calls
between objects that are allocated sufficiently close in memory. To add a
single Module you just add a single-element set.
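
As a rough sketch of the single-element-set idiom: the helper below wraps
one module in a one-element vector before handing it to a layer. Both
RecordingLayer and addSingleModule are hypothetical names made up for this
example and are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layer that only records how many modules each set contained.
class RecordingLayer {
public:
  using Handle = std::size_t;

  template <typename ModuleSet>
  Handle addModuleSet(ModuleSet &&Ms) {
    SetSizes.push_back(Ms.size());
    return SetSizes.size() - 1;
  }

  std::size_t sizeOfSet(Handle H) const { return SetSizes.at(H); }

private:
  std::vector<std::size_t> SetSizes;
};

// Hypothetical convenience wrapper: add one module as a single-element set.
template <typename LayerT, typename ModuleT>
typename LayerT::Handle addSingleModule(LayerT &Layer, ModuleT M) {
  std::vector<ModuleT> Ms;
  Ms.push_back(std::move(M));
  return Layer.addModuleSet(std::move(Ms));
}
```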

(4)
Q. What happened to "finalize"?
A. In the Orc APIs, getSymbolAddress automatically finalizes as necessary
before returning addresses to the client. When you get an address back from
getSymbolAddress, that address is ready to call.

(5)
Q. What does "removeModuleSet" do?
A. It removes the modules represented by the handle from the JIT. The
meaning of this is specific to each layer, but generally speaking it means
that any memory allocated for those modules (and their corresponding
objects, linked sections, etc.) has been freed, and the symbols those
modules provided are now undefined. Calling getSymbolAddress for a symbol
that was defined in a module that has been removed is expected to return
'0'.

(5a)
Q. How are the linked sections freed? RTDyldMemoryManager doesn't have any
"free.*Section" methods.
A. Each ModuleSet gets its own RTDyldMemoryManager, and that is destroyed
when the module set is freed. The choice of RTDyldMemoryManager is up to
the client, but the standard memory managers will free the memory allocated
for the linked sections when they're destroyed.
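
The ownership scheme described here can be sketched with plain RAII. This
is an illustration only: ToyMemoryManager and ToyLinkingLayer are
hypothetical stand-ins (not RTDyldMemoryManager or the ObjectLinkingLayer),
and the LiveBytes counter exists purely so the effect of removal can be
observed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical memory manager: owns its section buffers and releases them
// when it is destroyed. It has no per-section "free" method.
class ToyMemoryManager {
public:
  explicit ToyMemoryManager(std::size_t &Counter) : LiveBytes(Counter) {}

  ~ToyMemoryManager() {
    for (const auto &S : Sections)
      LiveBytes -= S.size();
  }

  unsigned char *allocateSection(std::size_t Size) {
    Sections.emplace_back(Size);
    LiveBytes += Size;
    return Sections.back().data();
  }

private:
  std::size_t &LiveBytes;
  std::vector<std::vector<unsigned char>> Sections;
};

// Hypothetical layer: one memory manager per module set, destroyed on removal.
class ToyLinkingLayer {
public:
  using Handle = unsigned;

  Handle addModuleSet(std::unique_ptr<ToyMemoryManager> MM) {
    Managers[Next] = std::move(MM);
    return Next++;
  }

  ToyMemoryManager &memoryManagerFor(Handle H) { return *Managers.at(H); }

  // Erasing the entry destroys the manager, which frees its sections.
  void removeModuleSet(Handle H) { Managers.erase(H); }

private:
  std::map<Handle, std::unique_ptr<ToyMemoryManager>> Managers;
  Handle Next = 1;
};
```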

(6)
Q. How does the CompileOnDemand layer redirect calls to the JIT?
A. It currently uses double-indirection: Function bodies are extracted into
new modules, and the body of the original function is replaced with an
indirect call to the extracted body. The pointer for the indirect call is
initialized by the JIT to point at some inline assembly which is injected
into the module, and this calls back in to the JIT to trigger compilation
of the extracted body. In the future I plan to make the redirection
strategy a parameter of the CompileOnDemand layer. Double-indirection is
the safest: It preserves function-pointer equality and works with
non-writable executable memory. However, there's no reason we couldn't use
single indirection (for extra speed where pointer-equality isn't required),
or patchpoints (for clients who can allocate writable/executable memory),
or any combination of the three. My intent is that this should be up to the
client.
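
The double-indirection scheme can be mimicked in ordinary C++ with a
function pointer. This is a sketch of the idea only, not of the code in the
patch: all names below are hypothetical, the "JIT" is a function that merely
updates a pointer, and a plain C++ function stands in for the injected
inline assembly.

```cpp
#include <cassert>

using BodyFn = int (*)(int);

int compileCallback(int X); // defined below

// The pointer used by the indirect call. It initially targets the callback.
BodyFn DoubleBodyPtr = &compileCallback;
int CompileCount = 0;

// Stands in for the extracted function body that is compiled on demand.
int doubleBody(int X) { return 2 * X; }

// Stands in for the injected code that calls back into the JIT: it
// "compiles" the body, repoints the indirect call, and runs the body.
int compileCallback(int X) {
  ++CompileCount;
  DoubleBodyPtr = &doubleBody;
  return DoubleBodyPtr(X);
}

// The original function, with its body replaced by an indirect call. Its
// own address never changes, so function-pointer equality is preserved,
// and only the data pointer is written, never the code.
int doubleIt(int X) { return DoubleBodyPtr(X); }
```

The first call to doubleIt goes through compileCallback; later calls go
straight to doubleBody via the updated pointer.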

Note that the CompileOnDemand layer doesn't handle lazy compilation
itself, just lazy symbol resolution (i.e. symbols are resolved on first
call, not when compiling). If you've put the CompileOnDemand layer on top
of the LazyEmitLayer then deferring symbol lookup automatically defers
compilation. (E.g. you can remove the LazyEmitLayer in main.cpp of the
lazydemo and you'll get indirection and callbacks, but no lazy
compilation.)
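
A toy model of the deferral that a lazy-emitting layer provides, under
heavy simplification: SymbolTable, CountingBaseLayer and ToyLazyEmitLayer
are hypothetical names for this sketch, a module is just a symbol table,
and handing a set to the base layer stands in for compiling it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using SymbolTable = std::map<std::string, uint64_t>;

// Hypothetical lower layer: counts how many module sets it has received.
class CountingBaseLayer {
public:
  void addModuleSet(std::vector<SymbolTable> Ms) {
    ++SetsAdded;
    for (const auto &M : Ms)
      Emitted.insert(M.begin(), M.end());
  }

  uint64_t getSymbolAddress(const std::string &Name) const {
    auto I = Emitted.find(Name);
    return I == Emitted.end() ? 0 : I->second;
  }

  unsigned setsAdded() const { return SetsAdded; }

private:
  SymbolTable Emitted;
  unsigned SetsAdded = 0;
};

// Hypothetical lazy layer: holds module sets back until one of their
// symbols is requested, then passes the whole set to the layer below.
class ToyLazyEmitLayer {
public:
  explicit ToyLazyEmitLayer(CountingBaseLayer &B) : Base(B) {}

  void addModuleSet(std::vector<SymbolTable> Ms) {
    Pending.push_back(std::move(Ms));
  }

  uint64_t getSymbolAddress(const std::string &Name) {
    for (auto I = Pending.begin(); I != Pending.end(); ++I) {
      if (defines(*I, Name)) {
        std::vector<SymbolTable> Ms = std::move(*I);
        Pending.erase(I);
        Base.addModuleSet(std::move(Ms));
        break;
      }
    }
    return Base.getSymbolAddress(Name);
  }

private:
  static bool defines(const std::vector<SymbolTable> &Ms,
                      const std::string &Name) {
    for (const auto &M : Ms)
      if (M.count(Name))
        return true;
    return false;
  }

  CountingBaseLayer &Base;
  std::vector<std::vector<SymbolTable>> Pending;
};
```

In this model, any layer stacked above that postpones its symbol lookups
also postpones the hand-off to the base layer.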

(7)
Q. Do the new APIs support cross-target JITing like MCJIT does?
A. Yes.

(7a)
Q. Do the new APIs support cross-target (or cross process) lazy-jitting?
A. Not yet, but all that is required is for us to add a small amount of
runtime to the JIT'd process to call back in to the JIT via some RPC
mechanism. There are no significant barriers to implementing this that I'm
aware of.

(8)
Q. Do any of the components implement the ExecutionEngine interface?
A. None of the components do, but the MCJITReplacement class does.

(9)
Q. Does this address any of the long-standing issues with MCJIT - Stackmap
parsing? Debugging? Thread-local-storage?
A. No, but it doesn't get in the way either. These features are still on
the road-map (such as it exists) and I'm hoping that the modular nature of
Orc will let us play around with new features like this without any risk of
disturbing existing clients, and so allow us to make faster progress.

(10)
Q. Why is part X of the patch (ugly | buggy | in the wrong place) ?
A. I'm still tidying the patch up - please save patch-specific feedback for
llvm-commits, otherwise we'll get cross-talk between the threads. The
patches should be coming soon.

---

As mentioned above, I'm happy to answer further general questions about
what these APIs can do, or where I see them going. Feedback on the patch
itself should be directed to the llvm-commits list when I start posting
patches there for discussion.

* Marketing slogans abound: "Very MachO". "Some warts". "Surprisingly
friendly with ELF". "Not yet on speaking terms with DWARF".
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lazydemo.tgz
Type: application/x-gzip
Size: 2700 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150114/73ce857a/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Orc_2015_01_13.patch
Type: application/octet-stream
Size: 133101 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150114/73ce857a/attachment.obj>