[LLVMdev] MC-JIT Design

Mon Nov 15 13:34:47 PST 2010

On Mon, Nov 15, 2010 at 12:53 PM, Jan Sjodin <jan_sjodin at yahoo.com> wrote:
> What kind of restrictions will the existing object file formats impose
> on the JIT? I don't know enough about the JIT and object file format
> interaction to know if this will be an issue. It seems clear that it would
> be worse to try to encode "extra things" in some obscure way than to create
> the FOO format initially. If FOO is truly a superset of everything this
> could even be the generic object file format that Michael Spencer was thinking
> about creating. Perhaps this is going the wrong direction since you wanted
> something less stable and directly tied to the MC infrastructure. How would
> introducing the FOO format later on work?

The idea here is that our JIT (which is essentially a runtime linker)
is perfectly capable of being written so that it can link object files
from distinct formats.

Linking Mach-O and COFF or ELF files might be hard, but we could
probably make it easy to link any native format and a FOO format file
without too much trouble.

The idea is that then we could have the JIT machinery use the native
format when it needs to interface with external runtime interfaces on
the platform, and it could use the FOO format when targeting code
which is purely target specific.
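
As a rough illustration only (nothing below comes from an actual patch),
here is a minimal sketch of a runtime linker that does not care which
format an object came from. It assumes each format has its own reader
that produces a common in-memory view. The names LinkableObject and
RuntimeLinker, the "elf" and "foo" format tags, and the addresses are
all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LinkableObject:
    """Format-neutral view of one loaded object file (hypothetical)."""
    name: str
    fmt: str                 # e.g. "elf", "macho", "foo"; informational only
    defined: dict            # symbol name -> offset within this object
    undefined: set = field(default_factory=set)
    size: int = 0            # bytes of memory the object occupies


class RuntimeLinker:
    """Toy linker: places objects in memory and tracks unresolved symbols."""

    def __init__(self, base_address=0x1000):
        self.next_address = base_address
        self.symbols = {}     # symbol name -> absolute address
        self.pending = set()  # symbols referenced but not yet defined

    def add_object(self, obj):
        """Place obj at the next free address and record its symbols."""
        base = self.next_address
        self.next_address += obj.size
        for sym, offset in obj.defined.items():
            if sym in self.symbols:
                raise ValueError(f"duplicate definition of {sym!r}")
            self.symbols[sym] = base + offset
        # The source format plays no role here: only symbols matter.
        self.pending.update(obj.undefined)
        self.pending.difference_update(self.symbols)
        return base

    def unresolved(self):
        """Return the names that still have no definition, sorted."""
        return sorted(self.pending)


linker = RuntimeLinker(base_address=0x1000)
native = LinkableObject("a.o", "elf", {"main": 0x0}, {"helper"}, size=0x40)
foo = LinkableObject("b.foo", "foo", {"helper": 0x10}, set(), size=0x20)

linker.add_object(native)
still_missing = linker.unresolved()   # "helper" is not defined yet
linker.add_object(foo)                # the FOO object supplies it
```

The point of the sketch is that once both kinds of object are reduced to
the same symbol-level view, mixing a native object with a FOO object is
just two calls to add_object; the hard part lives in the per-format
readers, which are left out here.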

We sort of end up in the same place as what I originally proposed, but
this way probably lets us get something which (a) works and (b) has
nice features (like debugging support) sooner. Introducing the
FOO infrastructure then becomes more of a quality-of-implementation
issue: how fast we JIT and how many fancy JIT tricks we support.

 - Daniel

>
>
> - Jan
>
>
>
>
> ----- Original Message ----
>> From: Daniel Dunbar <daniel at zuster.org>
>> To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
>> Sent: Mon, November 15, 2010 2:34:22 PM
>> Subject: Re: [LLVMdev] MC-JIT Design
>>
>> Quick follow up here:
>>
>> I talked to Eric for a bit about this proposal, and he convinced me
>> that we should take a slightly different tack. I'll write a bit more
>> about it later.
>>
>> The idea Eric convinced me was better is to not invent a FOO format,
>> but just use the native platform object format and focus on writing
>> essentially a runtime linker for that platform's object files.
>>
>> This has various pros and cons, but I hadn't given it enough  weight before.
>>
>> The pros and cons we discussed:
>>
>> Pro:
>>  - We reuse  all existing MC output functionality.
>>  - We have a shorter path to working  well with the external system
>> tools (the real runtime linker, the debugger,  the unwinder).
>>
>> Cons:
>>  - Requires developing good object file libraries for LLVM. This is
>> also a pro, as it coalesces work other people (Michael Spencer, Nick
>> Kledzik) are already interested in doing.
>>  - Means JIT is slightly more platform dependent, as the runtime
>> linker could have ELF or Mach-O specific bugs that wouldn't show up on
>> another platform.
>>  - Doesn't acknowledge that the JIT is a separate target. Constrains
>> the code generator to only doing what is actually supported on the
>> platform.
>>
>> The last con  is the main thing I wanted to not preclude in a new JIT
>> design, but Eric  convinced me that if we start by using the native
>> format, we can always  introduce my new FOO format transparently if we
>> realize there is a concrete  need for it.
>>
>> I'll try and sketch up some more of what this design would  look like
>> this evening...
>>
>>  - Daniel
>>
>> On Mon, Nov 15, 2010 at  10:15 AM, Daniel Dunbar <daniel at zuster.org> wrote:
>> > Hi  all,
>> >
>> > As promised, here is the rough design of the upcoming  MC-JIT*.
>> > Feedback appreciated!
>> >
>> > (*) To be clear, we are only calling it the MC-JIT until we have
>> > finished killing the old one. When I say JIT below, I mean the MC-JIT.
>> > I am basically ignoring the existing JIT completely. I will keep
>> > things API compatible whenever possible, of course.
>> >
>> > I see two main design directions for the  JIT:
>> >
>> > --
>> >
>> > #1 (aka MCJIT) - We make a new  MCJITStreamer which communicates with
>> > the JIT engine to arrange to plop  code in the right place and update
>> > various state  information.
>> >
>> > This is the most obvious approach, is roughly  similar to the way the
>> > existing JIT works, and this is the way the  proposed MC-JIT patches
>> > work (see MCJITState object).
>> >
>> > It  also happens to not be the approach I want to take. :)
>> >
>> >
>> >  #2 (aka FOOJIT) - MC grows a new "pure" backend, which is designed
>> >  around representing everything that "can be run" on a target platform.
>> >  This is very connected to the inherent capabilities of the hardware /
>> >  OS, and is usually a superset** of what the native object format
>> >  (Mach-O, ELF, COFF) can represent.
>> >
>> > The "pure" backend defines a  hard (but non-stable) object file format
>> > which is more or less a direct  encoding of the native MC APIs (it is
>> > not stable, so it can directly  encode things like FixupKind enum
>> > values).
>> >
>> > I don't have  a name for this format, so for now I will call it FOO.
>> >
>> > The  "MC-JIT" then becomes something more like a "FOO-JIT". It is
>> > architected  as a consumer of "FOO" object files over time. The basic
>> > architecture is  quite simple:
>> >  (a) Load a module, emit it as a "FOO" object.
>> >  (b) Load the object into a worklist, scan for undefined symbols,
>> >  dynamically emit more "FOO" modules.
>> >  (c) Iterate until no undefined symbols remain.
>> >  (d) Execute code -- if we hit a lazy compilation callback, go back to (a).
>> >
>> > (**) It more or less *must* be a superset, since object formats
>> > usually don't bother to represent things which can't be run. Features
>> > which require OS emulation are an obvious exception. As a concrete
>> > example, consider the implementation of thread local storage. Each
>> > platform typically will choose an implementation approach and limit its
>> > format to supporting that, but the hardware itself supports many more
>> > implementation approaches.
>> >
>> >  --
>> >
>> > I apologize if my description is a bit terse, but I hope the  basic
>> > infrastructure comes through. I will make some pretty diagrams for  it
>> > at some point (hopefully before the next dev mtg, hahaha  hmmm....).
>> >
>> > Here are the reasons I want to follow approach  #2:
>> >
>> > 1. It makes the JIT process look much more like the  standard
>> > compilation process. In fact, from the FOOJIT's perspective, it  could
>> > even run the compiler out of process to produce "FOO" object  files,
>> > with no real change in behavior.
>> >
>> > This has two  main implications:
>> >  a. We are leveraging much more of the existing  infrastructure.
>> >  b. We can use more of the existing tools to test and  debug the JIT.
>> >
>> > 2. It forces us to treat the JIT as a separate  "subtarget".
>> >  a. In reality, this is already true. The compiler needs to  know it is
>> > targeting a JIT in terms of what features are available  (indirect
>> > stubs? exception tables? thread local storage?), but the  current
>> > design papers over this. This design forces us to acknowledge  that
>> > fact up front, and should make the architecture more  understandable.
>> >
>> > 3. It eases testing and debugging.
>> >  a.  We can build new tools to test the FOOJIT, for example, a tool
>> > that just  loads a couple FOO object files and runs them, but without
>> > needing to do  codegen. Since we can already use the existing tools to
>> > work with the  FOO objects, this basically gives us a new testing entry
>> > point into the  JIT.
>> >
>> > --
>> >
>> > Some caveats of this  design:
>> >
>> > 1. The initial implementation will probably work very much as
>> > described: it will actually write "FOO" object files to memory and
>> > load them.
>> >
>> > In practice, we would like to avoid the  performance overhead of this
>> > copy. My plan here is that eventually we  would have multiple
>> > implementations of the FOO object writer, one of  which would write to
>> > the serialized form, but another would splat  directly into the process
>> > memory.
>> >
>> > We would allow other  fancy things following the same approach, for
>> > example allow the JIT to  pin symbols to their actual addresses, so
>> > that the assembler can do the  optimal relaxation for where the code is
>> > actually landing in  memory.
>> >
>> > 2. It requires some more up front work, in that there is  more stuff to
>> > build. However, I feel it is a much stronger design, so I  expect this
>> > to pay off relatively quickly.
>> >
>> > 3. Some JIT tricks become a bit less obvious. For example, in a JIT,
>> > it is natural when seeing an undefined symbol "bar" to go ahead and see
>> > if you can find "bar" and immediately generate code for it. You can't
>> > do that in the FOOJIT model, because you won't know "bar" is undefined
>> > until you read the object back.
>> >
>> > However, in practice one needs to be careful about recursion and
>> > reentrancy, so you have to take care when trying to do things like
>> > this. The FOOJIT forces such tricks to go through a proper API (the
>> > FOO object), which I end up seeing as a feature, not a bug.
>> >
>> > --
>> >
>> > And, a final word on API  compatibility:
>> >
>> > As mentioned before, I have no *plan* to break  any existing public
>> > interface to the JIT. The goal is that we eventually  have a strict
>> > superset of the current functionality.
>> >
>> > The  actual plan will be to roll out the FOOJIT in tree with some
>> > option to  allow clients to easily pick the implementation. A tentative
>> > goal would  be to have the FOOJIT working well enough in 2.9 so that
>> > clients can  test it against the released LLVM, and that for 2.10
>> > (*grin*) we can  make it the default.
>> >
>> > Thoughts?
>> >
>> >  -  Daniel
>> >
>>
>