Hi Daniel,<br><br>First of all, thanks for taking the time. :)<br><br>I like your idea of a FOOJIT object file format.<br><br>How do you expect to handle mappings (addGlobalMapping - GlobalValue* foo is at native address 0xB4F...)?<br>
<br>Olivier.<br><br><br><br><div class="gmail_quote">On Mon, Nov 15, 2010 at 7:15 PM, Daniel Dunbar <span dir="ltr"><<a href="mailto:daniel@zuster.org">daniel@zuster.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
Hi all,<br>
<br>
As promised, here is the rough design of the upcoming MC-JIT*.<br>
Feedback appreciated!<br>
<br>
(*) To be clear, we are only calling it the MC-JIT until we have<br>
finished killing the old one. When I say JIT below, I mean the MC-JIT.<br>
I am basically ignoring the existing JIT completely. I will keep<br>
things API compatible whenever possible, of course.<br>
<br>
I see two main design directions for the JIT:<br>
<br>
--<br>
<br>
#1 (aka MCJIT) - We make a new MCJITStreamer which communicates with<br>
the JIT engine to arrange to plop code in the right place and update<br>
various state information.<br>
<br>
This is the most obvious approach, is roughly similar to the way the<br>
existing JIT works, and this is the way the proposed MC-JIT patches<br>
work (see MCJITState object).<br>
<br>
It also happens to not be the approach I want to take. :)<br>
<br>
<br>
#2 (aka FOOJIT) - MC grows a new "pure" backend, which is designed<br>
around representing everything that "can be run" on a target platform.<br>
This is very connected to the inherent capabilities of the hardware /<br>
OS, and is usually a superset** of what the native object format<br>
(Mach-O, ELF, COFF) can represent.<br>
<br>
The "pure" backend defines a hard (but non-stable) object file format<br>
which is more or less a direct encoding of the native MC APIs (it is<br>
not stable, so it can directly encode things like FixupKind enum<br>
values).<br>
<br>
I don't have a name for this format, so for now I will call it FOO.<br>
<br>
The "MC-JIT" then becomes something more like a "FOO-JIT". It is<br>
architected as a consumer of "FOO" object files over time. The basic<br>
architecture is quite simple:<br>
(a) Load a module, emit it as a "FOO" object.<br>
(b) Load the object into a worklist, scan for undefined symbols,<br>
dynamically emit more "FOO" modules.<br>
(c) Iterate until no undefined symbols remain.<br>
(d) Execute code -- if we hit a lazy compilation callback, go back to (a).<br>
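<br>
As a rough illustration of steps (a) through (c), here is a minimal C++<br>
sketch of the worklist loop. It is not the proposed implementation:<br>
FooObject, EmitFn and linkUntilResolved are hypothetical names, and a<br>
"FOO" object is reduced to the symbols it defines and references.<br>

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a "FOO" object: the symbols it defines and the
// symbols it references without defining them.
struct FooObject {
    std::vector<std::string> defined;
    std::vector<std::string> undefined;
};

// Hypothetical callback for step (a): compile whatever provides `symbol`
// and return the resulting FOO object.
using EmitFn = std::function<FooObject(const std::string &symbol)>;

// Steps (b) and (c): pull objects off a worklist, scan them for undefined
// symbols, and emit more objects until nothing new is requested.
// Returns the set of all symbols that ended up defined.
std::set<std::string> linkUntilResolved(const FooObject &root, const EmitFn &emit) {
    assert(emit && "an emit callback is required");

    std::set<std::string> resolved;
    std::set<std::string> requested; // symbols already passed to emit()
    std::deque<FooObject> worklist{root};

    while (!worklist.empty()) {
        FooObject object = worklist.front();
        worklist.pop_front();

        for (const std::string &name : object.defined)
            resolved.insert(name);

        for (const std::string &name : object.undefined) {
            // Skip symbols that are already defined or already requested,
            // so each symbol triggers at most one emit() call.
            if (resolved.count(name) || !requested.insert(name).second)
                continue;
            worklist.push_back(emit(name));
        }
    }
    return resolved;
}
```

The `requested` set is what keeps mutually recursive symbols from<br>
looping forever in this sketch; step (d), lazy compilation, would<br>
simply call linkUntilResolved again with a new root object.<br>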
<br>
(**) It more or less *must* be a superset, since object formats<br>
usually don't bother to represent things which can't be run. Features<br>
which require OS emulation are an obvious exception. As a concrete<br>
example, consider the implementation of thread local storage. Each<br>
platform will typically choose an implementation approach and limit its<br>
format to supporting that, but the hardware itself supports many more<br>
implementation approaches.<br>
<br>
--<br>
<br>
I apologize if my description is a bit terse, but I hope the basic<br>
infrastructure comes through. I will make some pretty diagrams for it<br>
at some point (hopefully before the next dev mtg, hahaha hmmm....).<br>
<br>
Here are the reasons I want to follow approach #2:<br>
<br>
1. It makes the JIT process look much more like the standard<br>
compilation process. In fact, from the FOOJIT's perspective, it could<br>
even run the compiler out of process to produce "FOO" object files,<br>
with no real change in behavior.<br>
<br>
This has two main implications:<br>
a. We are leveraging much more of the existing infrastructure.<br>
b. We can use more of the existing tools to test and debug the JIT.<br>
<br>
2. It forces us to treat the JIT as a separate "subtarget".<br>
a. In reality, this is already true. The compiler needs to know it is<br>
targeting a JIT in terms of what features are available (indirect<br>
stubs? exception tables? thread local storage?), but the current<br>
design papers over this. This design forces us to acknowledge that<br>
fact up front, and should make the architecture more understandable.<br>
<br>
3. It eases testing and debugging.<br>
a. We can build new tools to test the FOOJIT, for example, a tool<br>
that just loads a couple FOO object files and runs them, but without<br>
needing to do codegen. Since we can already use the existing tools to<br>
work with the FOO objects, this basically gives us a new testing entry<br>
point into the JIT.<br>
<br>
--<br>
<br>
Some caveats of this design:<br>
<br>
1. The initial implementation will probably work very much as<br>
described: it will actually write "FOO" object files to memory and<br>
load them.<br>
<br>
In practice, we would like to avoid the performance overhead of this<br>
copy. My plan here is that eventually we would have multiple<br>
implementations of the FOO object writer, one of which would write to<br>
the serialized form, but another would splat directly into the process<br>
memory.<br>
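<br>
To sketch what "multiple implementations of the FOO object writer"<br>
could look like, here is a small C++ example. All class and function<br>
names are hypothetical, and a plain byte array stands in for the<br>
process memory a real JIT would allocate as executable.<br>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical writer interface: the emitter hands over bytes and does not
// care where they end up.
class FooObjectWriter {
public:
    virtual ~FooObjectWriter() = default;
    virtual void writeBytes(const std::uint8_t *data, std::size_t size) = 0;
};

// Writes the serialized form into a growable buffer.
class SerializingWriter : public FooObjectWriter {
public:
    void writeBytes(const std::uint8_t *data, std::size_t size) override {
        buffer.insert(buffer.end(), data, data + size);
    }
    std::vector<std::uint8_t> buffer;
};

// Copies bytes straight to their final location, skipping the
// intermediate serialized object.
class InPlaceWriter : public FooObjectWriter {
public:
    InPlaceWriter(std::uint8_t *destination, std::size_t capacity)
        : cursor(destination), remaining(capacity) {}

    void writeBytes(const std::uint8_t *data, std::size_t size) override {
        assert(size <= remaining && "destination region is too small");
        std::memcpy(cursor, data, size);
        cursor += size;
        remaining -= size;
    }

private:
    std::uint8_t *cursor;
    std::size_t remaining;
};

// Stand-in for the emitter: it only sees the abstract interface.
void emitReturn(FooObjectWriter &writer) {
    const std::uint8_t ret[] = {0xC3}; // x86 "ret"
    writer.writeBytes(ret, sizeof(ret));
}
```

Because emitReturn only depends on the interface, the same emission<br>
code can feed either writer, which is the property the plan relies on.<br>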
<br>
We would allow other fancy things following the same approach, for<br>
example allow the JIT to pin symbols to their actual addresses, so<br>
that the assembler can do the optimal relaxation for where the code is<br>
actually landing in memory.<br>
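<br>
As an example of why pinned addresses help relaxation, consider an<br>
x86 jump: the short form (opcode EB, 8-bit displacement) takes 2 bytes<br>
and the near form (opcode E9, 32-bit displacement) takes 5. The sketch<br>
below picks the form once both addresses are known. chooseJumpForm is<br>
a hypothetical helper, not an MC API, and it does not check that the<br>
target is within 32-bit range.<br>

```cpp
#include <cstdint>

// Encoded size in bytes of the two x86 unconditional jump forms.
enum class JumpForm { Short = 2, Near = 5 };

// Picks the smallest jump encoding that reaches targetAddress.
// The displacement is measured from the end of the jump instruction, so
// the short form is tried with its own 2-byte length.
JumpForm chooseJumpForm(std::uint64_t jumpAddress, std::uint64_t targetAddress) {
    const std::int64_t displacement =
        static_cast<std::int64_t>(targetAddress) -
        static_cast<std::int64_t>(jumpAddress + 2);
    if (displacement >= -128 && displacement <= 127)
        return JumpForm::Short;
    return JumpForm::Near;
}
```

Without a known address for the target, an assembler would have to<br>
assume the larger form for any reference it cannot resolve itself.<br>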
<br>
2. It requires some more up front work, in that there is more stuff to<br>
build. However, I feel it is a much stronger design, so I expect this<br>
to pay off relatively quickly.<br>
<br>
3. Some JIT-tricks become a bit less obvious. For example, in a JIT,<br>
it is natural, when seeing an undefined symbol "bar", to go ahead and see<br>
if you can find "bar" and immediately generate code for it. You can't<br>
do that in the FOOJIT model, because you won't know "bar" is undefined<br>
until you read the object back.<br>
<br>
However, in practice one needs to be careful about recursion and<br>
reentrancy, so you have to take care when trying to do things like<br>
this. The FOOJIT forces such tricks to go through a proper API (the<br>
FOO object) which I end up seeing as a feature, not a bug.<br>
<br>
--<br>
<br>
And, a final word on API compatibility:<br>
<br>
As mentioned before, I have no *plan* to break any existing public<br>
interface to the JIT. The goal is that we eventually have a strict<br>
superset of the current functionality.<br>
<br>
The actual plan will be to roll out the FOOJIT in tree with some<br>
option to allow clients to easily pick the implementation. A tentative<br>
goal would be to have the FOOJIT working well enough in 2.9 so that<br>
clients can test it against the released LLVM, and that for 2.10<br>
(*grin*) we can make it the default.<br>
<br>
Thoughts?<br>
<br>
- Daniel<br>
</blockquote></div><br>