[PATCH][llvm-c] Expose MC JIT

Filip Pizlo fpizlo at apple.com
Sun Apr 28 15:49:04 PDT 2013


On Apr 22, 2013, at 4:36 PM, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:

> Help is always welcome!
>  
> You’re definitely right that it needs to be more fault-tolerant.  We’ve got a few places where we throw fatal errors when things don’t go as we’d like them to.  That makes sense for a compiler, but not so much for a JIT engine.
>  
> We’ve tried to keep the separation of responsibilities between RuntimeDyld, MCJIT and the memory manager clean, and I think that we’ve been successful.  We can talk about the specifics of what you want to manage, but I’d like to think that the memory manager is the only part where you’d need or want to provide a custom implementation.

I suspect this is true.

>  
> I’m not sure what you mean when you refer to “cache management” but I’m working on a patch right now that provides a mechanism for caching generated object images so that future invocations can bypass the code generation stage.

I was referring to the magic needed to flush icaches.
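
For concreteness, a minimal sketch of that step on a POSIX system with GCC or Clang.  This is an illustration under those assumptions, not MCJIT's code: publish_code is a hypothetical helper, and __builtin___clear_cache is the compiler builtin, which has no effect on targets such as x86 that need no explicit flush but matters on ARM.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy `len` bytes of generated code into fresh
 * memory, make it executable, and flush the instruction cache.
 * Returns the mapping, or NULL on failure so the client can recover
 * instead of hitting a fatal error. */
void *publish_code(const unsigned char *code, size_t len)
{
    assert(code != NULL && len > 0);

    /* Writable, not yet executable. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, code, len);

    /* Apply permissions: drop write, add execute. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }

    /* Flush the instruction cache for the range just written. */
    __builtin___clear_cache((char *)mem, (char *)mem + len);
    return mem;
}
```

A client that owns the memory manager would do the equivalent of the last two steps itself, at a time of its choosing.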

>  
> The big items that I see needing to be done in MCJIT are:
>  
> · Support for multiple modules

I'd be happy to help with this, since I'll probably want to use it.

> · Some kind of support for lazy compilation

Interestingly, I won't benefit from this at all.  I'm curious what scenarios this is intended for.

> · Cleanup of module state management
> · Separation of JIT interfaces from MCJIT interfaces
>  
> The lazy compilation solution will almost certainly need to be coupled with multiple module support.  Because MCJIT emits object images rather than free-standing functions, we need a scheme that will let clients create a large number of small modules and have MCJIT figure out when each module needs to be compiled.  Andrew Sorensen did some exploratory work on this, but more work is needed.

On the other hand, it's also important to make it easy for clients to get around the laziness entirely.

>  
> The module state management is also closely related to multiple module support.  In fact, I think better module state management is probably a prerequisite to proper multiple module support.  I haven’t thought this all the way through in any systematic way, but in general I think MCJIT needs to be more systematic about tracking what state its module is in and enforcing what actions can and should be applied to it in any given state.  For instance, a module starts in an un-compiled state.  From there nothing can happen until we have an object image for the module (which can happen either by code generation or loading from cache).  Once we have an object image, relocations can be applied one or more times, sections can be re-mapped and page permissions can be applied.  Once page permissions have been applied we can’t change things anymore, unless we provide some mechanism to get back out of that state.  The biggest effort here is probably the mental work of mapping out how things should work.

Do we have a clear-cut set of usage scenarios for the MCJIT?  That might help me understand why caching and laziness are a good thing.  They are not a good thing for what I'm doing, and probably wouldn't be a good thing for most other multi-tier VMs.

>  
> The separation of JIT and MCJIT interfaces is mostly independent of everything else, but as with any detangling project the sooner it happens the better.
>  
> A less general piece of work that probably interests you is that the RuntimeDyldMachO implementation is a little weak right now.  In particular, it doesn’t handle the sort of relocations that pop up if generated code tries to call an external function.  LLDB handles this by resolving all function calls to absolute addresses before it invokes the code generation.  I’m hoping to fix that soon, but I don’t know what else might be lurking behind it.  In general, the RuntimeDyld implementations have grown on an as-needed basis.  If we were handling all the relocation types that were actually being encountered, we considered that good enough.  Unfortunately, that means it’s pretty sensitive to code model and relocation model changes.

It also appears that either RuntimeDyldMachO in particular, or the whole MCJIT infrastructure in general, gets into a confused state for constants that the instruction selector will put into a constant pool, partly because X86MCTargetDesc.cpp puts us into CodeModel::Large and then things sort of go downhill from there.  I haven't worked out where the bug is, other than that forcing CodeModel::Small "fixes" the problem.  It's interesting that CodeModel::Small is quite appropriate for WebKit: our JIT memory manager will ensure that all code is within a pre-reserved 1GB slab of virtual memory on X86.
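
One way to see why a slab of that size is friendly to 32-bit-displacement addressing: an x86-64 `call rel32` encodes a signed 32-bit offset from the end of the 5-byte instruction, so any call site inside a 1GB reservation can reach any target inside the same reservation.  A minimal sketch, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can a 5-byte `call rel32` placed at `site`
 * reach `target`?  The displacement is measured from the end of the
 * instruction and must fit in a signed 32-bit field. */
bool call_rel32_reaches(uint64_t site, uint64_t target)
{
    assert(site <= UINT64_MAX - 5);  /* site + 5 must not wrap */
    int64_t disp = (int64_t)(target - (site + 5));
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

An external function that lives outside the slab may fail this check, which is the case the large code model is guarding against.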

Do you have thoughts on whether the CodeModel enum is the right kind of abstraction for the JIT?  It appears that on 64-bit, our use of CodeModel::Large forces *all* of the backend's reasoning about addresses to assume the worst case even though it's just for external functions that we need this, as per the comment in X86MCTargetDesc.

More broadly, I'm of the mind that LLDB's way of handling this is the correct and canonical way to resolve functions in a JIT.  You benefit from telling the compiler about the addresses of things early.  This is a unique property that JITs can, and should, take advantage of.  But probably there is some usage scenario of the MCJIT that I'm missing - so it would be great if you could explain the motivation for having things like external function resolution baked into the JIT.

>  
> Let me know if that answers your questions or if you’d like to know any more about the motivation for the above.
>  
> -Andy
>  
> From: Filip Pizlo [mailto:fpizlo at apple.com] 
> Sent: Monday, April 22, 2013 2:03 PM
> To: Kaylor, Andrew
> Cc: David Tweed; Eric Christopher; llvm-commits at cs.uiuc.edu
> Subject: Re: [PATCH][llvm-c] Expose MC JIT
>  
> OK - I think that calling finalizeObject() within the C API is a reasonable kludge for now.  I'll have a patch shortly.
>  
> More broadly, getting MCJIT into a better shape is on my critical path, as well.  See: https://bugs.webkit.org/show_bug.cgi?id=112840.  I'd like to help out with the de-sloppiness-ifying of it, so I'd like to understand better what the end goal is.  I already know that I want to control memory management, cache management, and page permissions myself as a client, and preferably I want to accomplish this by supplying my own RuntimeDyld (or JITMemoryManager).  I also know that I'll want to improve error tolerance - for example in case of memory allocation failure (honestly, I haven't investigated this much, so this may already be supported - I just don't see how through the API).  So, I'm particularly interested in seeing how to get the MCJIT into a great shape while supporting both the currently-envisioned use cases in LLVM and its other clients, and also the way that WebKit will use it!  This should be fun. :-)
>  
> -Filip
>  
>  
> On Apr 22, 2013, at 11:51 AM, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:
> 
> 
> I wasn’t sure the C API needed to support the case of JITing into another address space.  I guess it makes sense that we’d want to make that available eventually.
>  
> I do think it would be better to have MCJIT finalize the object from getPointerToFunction(), but as I said we can’t do that until there’s another way to trigger code generation.  This is part of the sloppiness I referred to earlier.  There is a web of interactions going on within MCJIT and there are some constraints on the order in which things can be done.  Right now MCJIT has a patchwork way of handling this, but it really needs to be cleaned up to manage things in a more intentional way.
>  
> Anyway, my immediate goal here is to figure out the simplest reasonable thing we can do to get your patch working in a sustainable way.
>  
> I don’t know if anything other than lli is relying on the current side-effects of MCJIT::getPointerToFunction().  I’d prefer to hold off on putting new things in the MCJIT interface until the aforementioned clean-up is done.  I was suggesting putting the call to finalizeObject in the C API because I’d rather rush something into implementation than rush something into an interface, but depending on your timeline for making the C API I could try to bump the priority of the MCJIT state handling clean-up on my end.
>  
> -Andy
>  
> From: Filip Pizlo [mailto:fpizlo at apple.com] 
> Sent: Monday, April 22, 2013 11:25 AM
> To: Kaylor, Andrew
> Cc: David Tweed; Eric Christopher; llvm-commits at cs.uiuc.edu
> Subject: Re: [PATCH][llvm-c] Expose MC JIT
>  
> Wouldn't it be better to just have MCJIT call finalizeObject in getPointerToFunction and friends, and eradicate ExecutionEngine::finalizeObject() completely?
>  
> OTOH, if I understand right and the goal of the MCJIT is to support the notion of JITing into someone else's address space, then finalizeObject() makes loads of sense: you want the client to be able to say when it happens. If that's true then it should be exposed to the C API. 
> 
> -Filip
> 
> On Apr 22, 2013, at 10:57 AM, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:
> 
> It definitely won’t be necessary to expose more things through the C API.  At most, things will need to be added to the implementation.
>  
> I’m wondering if we can even avoid having to put the ‘finalize’ concept into the API.
>  
> As I understand it, the normal work flow would be something like this:
>  
> 1. Create a module and populate it
> 2. Create an execution engine for the module
> 3. Get a pointer to a function in the module
> 4. Execute the function
>  
> If that’s right, I guess MCJIT trips because while its implementation of getPointerTo[Named]Function triggers compilation it doesn’t cause permissions to be applied or invalidate the code cache.  This happens because MCJIT needs to handle the case where the generated code is going to be executed in another process (and possibly on another system), so it doesn’t make assumptions about when everything is in its final place.  The C API, however, could arguably make such assumptions.
>  
> What this boils down to is that somewhere between step 2 above and step 4 above we need to:
>  
> 1. Generate the code
> 2. Apply relocations
> 3. Apply memory permissions
> 4. Invalidate the code cache
>  
> MCJIT does 1 and 2, if necessary, in response to a getPointerToFunction() call.  Arguably it could also do 3 and 4 there, since in the remote case the client code isn’t going to want pointers to functions.  The trouble is that there are places (such as lli) where we are using that call to trigger code emission even though we may still want to move things around before 3 and 4 happen.  That wouldn’t be a problem if we exposed a function to trigger code emission directly, but we don’t right now.
>  
> However, it may be reasonable for the C API implementation to call MCJIT::finalizeObject when its getPointerToFunction equivalent is called.
>  
> That leaves us with invalidating the code cache.  I don’t see any reason that the memory manager shouldn’t do that automatically when the applyPermissions function is called.  I notice that currently invalidateInstructionCache is part of the SectionMemoryManager interface but not the RuntimeDyldMemoryManager, so that’s a problem already if we don’t just make it part of applyPermissions().
>  
> So what I’m thinking is that if you can add a call to MCJIT::finalizeObject in the appropriate places in the C API implementation then the FinalizeAllObjects method can be removed completely.  I’ll add an invalidateInstructionCache() call in the SectionMemoryManager::applyPermissions() implementation, and that should take care of the ARM issue.
>  
> Sounds good. 
> 
> 
> 
>  
> Does that sound reasonable?
>  
> -Andy
>  
>  
> From: Filip Pizlo [mailto:fpizlo at apple.com] 
> Sent: Monday, April 22, 2013 10:19 AM
> To: Kaylor, Andrew
> Cc: David Tweed; Eric Christopher; llvm-commits at cs.uiuc.edu
> Subject: Re: [PATCH][llvm-c] Expose MC JIT
>  
>  
> On Apr 22, 2013, at 10:06 AM, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:
> 
> The state management in MCJIT is quite sloppy right now.  I agree that invalidating the code cache is an issue that needs to be considered.  It seems to me that MCJIT itself ought to be able to do that when it needs to be done if it were paying attention.  At the very least, it could happen in the 'finalizeObject' method.  That's somewhat tangential to the patch at hand, but it is a consideration.  If there's something quick I can do to MCJIT to make this work, that's probably preferable to pushing something into the C-interface implementation.  I'll give that some thought today, but if anyone else is interested I'd be happy to make it a discussion rather than just a private rumination.
>  
> Hopefully we can do this without exposing more stuff via the C API.  I think that finalizeObject() should do this, but I will think about it some more.
> 
> Otherwise, my main question about the patch has to do with the nature of the C-interface API.  Is that API treated as a contract that needs to be respected from release to release or are we free to tinker with it as needed?  The thing that worries me is how this interface will survive the transition to multiple module support.
>  
> I believe that this is the goal of the C API, yes.  It is also a goal of this patch to be forward-compatible in this way.
>  
> My patch defends against this in two ways:
>  
> 1) When creating the MCJIT via the C API, I just follow the same convention as other ExecutionEngines do: you specify a module, but you can add one later.  Right now calling AddModule on an MCJIT instance will crash, and the documentation tells you this.  Once the MCJIT supports multiple modules, I believe that this should Just Work - you will then be able to call AddModule.
>  
> 2) I don't expose finalizeObject() directly.  Instead I created a new API called FinalizeAllObjects(), which requires that all modules associated with the execution engine get finalized at the time of call.  My understanding is that it is safe to finalizeObject() if you've already done it before - it's currently idempotent.  So if MCJIT goes multi-module, then this API will still have a well-defined behavior, and this behavior will not be different from what my patch does.
>  
> -Filip
>  
> 
> 
> 
> 
> 
> -Andy
> 
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of David Tweed
> Sent: Monday, April 22, 2013 7:33 AM
> To: 'Eric Christopher'; Filip Pizlo
> Cc: llvm-commits at cs.uiuc.edu
> Subject: RE: [PATCH][llvm-c] Expose MC JIT
> 
> Hi,
> 
> Not a comment on the general idea (which seems like a good one), but a question of detail: some architectures (such as ARM) need to invalidate cache entries at more points than, e.g., x86. (IIRC ARM needs to invalidate the cache at the transition between memory being "data" and being "instructions".) It looks to me like this can be done in LLVMFinalizeAllObjects(), but I'm cc:ing someone much, much more knowledgeable about this than me... It's certainly desirable that the API provides enough points at which the MC JIT is in command that it can call all the (possibly platform specific) permissions/cache invalidation operations needed on that memory without user code needing to do it.
> 
> Other than that, the patch looks like a good patch to me (but again Andrew is the main authority).
> 
> Cheers,
> Dave
> 
> 
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu
> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Eric Christopher
> Sent: 22 April 2013 14:22
> To: Filip Pizlo
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [PATCH][llvm-c] Expose MC JIT
> 
> The API and rationale seem totally reasonable to me. Up to Andy to approve though.
> 
> -eric
> 
> On Sun, Apr 21, 2013 at 7:14 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> 
> 
> 
> OK - I have a new patch for review, which incorporates Andrew's feedback.
> 
> The patch exposes the MCJIT via the C API.
> 
> The current API uses the SectionMemoryManager by default.  I plan to
> expose the ability to have the C API call back into the client's code,
> and use the client's own memory manager, in the future.  But even then,
> the default will be SectionMemoryManager.  Because this requires calling
> applyPermissions(), I also expose the ExecutionEngine::finalizeObject()
> method.  This was tricky - I take it that in the future, this method
> will take a Module *M parameter to specify which module to finalize.
> In order to not create future confusion in the C API, I expose this as
> LLVMFinalizeAllObjects() and specify the API's semantics as being that
> all objects associated with the execution engine should be finalized.
> 
> The patch also exposes the NoFramePointerElim option.  The manner in
> which options are exposed is designed for forward compatibility; you
> supply an options struct along with a size which you zero-fill prior to
> manipulating.  This is similar to the idiom I've seen used in other C
> APIs like BerkeleyDB.  I considered having separate C function calls for
> each option, in the style of the ExecutionEngineBuilder API - but while
> that idiom feels right to me in C++, it feels less C-like.  As well, the
> current options approach exposes not just parts of the Builder but also
> part of TargetOptions (namely, NoFramePointerElim).  It's also more
> concise in practice.
> 
> I plan to expose more innards through the LLVMMCJITCompilerOptions in
> the future.  I'd be happy to do more of that in one go if that was
> preferred; but I thought that a baby step would be the best thing for now.
> 
> 
> 
> -Filip
> 
> 
> On Apr 21, 2013, at 6:26 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> 
> 
> On Apr 15, 2013, at 10:42 AM, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:
> 
> OK, let me start by saying that MCJIT does take ownership of the
> memory manager.  It doesn't use an OwningPtr, which would make this
> clear, but it does delete the pointer in its destructor.  I think this
> is happening because we needed some finer control over when the MM got
> deleted.  I should probably revisit this and at least add some comments
> to make it clear what's happening and why.  It might not even be an
> issue anymore, because I did some work a while ago to try to clean up
> object ownership issues.
> 
> 
> Actually, I was just confused.  MCJIT deletes MemMgr, which is always
> aliased to Dyld, as far as I can tell.  So I was just wrong. :-)
> 
> 
> That said, I have been meaning for some time to break apart the JIT
> and MCJIT interfaces.  The fact that they are both abstracted by
> ExecutionEngine and EngineBuilder complicates that, but it really needs
> to be done (as you are seeing).
> 
> For now, would it be possible to have the C-interface provide a
> wrapper that supplies empty implementations of the irrelevant functions
> when creating a memory manager for MCJIT?
> 
> 
> I think this is sensible.  I will proceed in this way.
> 
> Thanks!
> 
> 
> -Andy
> 
> From: Filip Pizlo [mailto:fpizlo at apple.com]
> Sent: Saturday, April 13, 2013 1:18 AM
> To: Kaylor, Andrew
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [PATCH][llvm-c] Expose MC JIT
> 
> Ah - good thing you pointed this out.  I just realized that my patch
> is wrong.  Perhaps I can get some feedback on the best way to architect
> this.
> 
> Here's the problem:
> 
> - MCJIT does not take ownership of the memory manager.  Hence
> allocating one in the constructor is wrong; it'll leak when MCJIT dies.
> But deleting the memory manager passed to MCJIT would be a change in
> behavior, and I'm not sure if it's in line with either what existing
> users expect or what was intended.  Insofar as the JIT instance
> corresponds to ownership of modules, it feels like it shouldn't also
> take ownership of the memory manager; for example you might imagine
> wanting to throw away the MCJIT but keep the code it generated and
> continue to use the memory manager to track it - and eventually free it.
> But EngineBuilder currently claims that the ExecutionEngine takes
> ownership of the JMM - I'm assuming that this is just wrong
> documentation, and that EngineBuilder's use of the same JMM option for
> both JIT and MCJIT is just not right.
> 
> - I'd like to expose SectionMemoryManager and, eventually in a
> separate patch, the ability to create custom RTDyldMemoryManagers via
> the C API.  I'd prefer this to be an RTDyldMemoryManager and not a
> JITMemoryManager, since the latter has a load of methods that are not
> relevant to MCJIT.  But EngineBuilder wants a JITMemoryManager.  This
> would mean that the C API would have to pass its RTDyldMemoryManager via
> a cast to JITMemoryManager just so MCJIT could then use it as an
> RTDyldMemoryManager again.  Seems wrong.  I'm assuming that the correct
> long-term thing is to fix the EngineBuilder to not pass the JMM to the
> MCJIT, since it's good to expose the fact that the MCJIT actually just
> wants an RTDyldMemoryManager instead.
> 
> In short, I'd like to have a separate EngineBuilder setting for the
> RTDyldMemoryManager.  If this is specified and you end up using the JIT
> and not MCJIT, you get an error.  If you use the MCJIT, then the
> RTDyldMemoryManager option overrides the JMM option.  Or something
> similar.
> 
> Does that make sense?
> 
> -Filip
> 
> 
> On Apr 12, 2013, at 5:38 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> 
> 
> Thanks for the feedback!  I will try this change and see what happens.
> 
> -Filip
> 
> 
> On Apr 12, 2013, at 5:35 PM, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:
> 
> Hi Filip,
> 
> I'll take a closer look at your patches on Monday, but my initial 
> input is that the default memory manager used should be 
> SectionMemoryManager rather than the DefaultJITMemoryManager.
> 
> Thanks,
> Andy
> 
> From: llvm-commits-bounces at cs.uiuc.edu 
> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Filip Pizlo
> Sent: Friday, April 12, 2013 4:49 PM
> To: llvm-commits at cs.uiuc.edu
> Subject: Re: [PATCH][llvm-c] Expose MC JIT
> 
> Revised patches included.
> 
> I added additional ruggedizing to the LLVMCreateMCJITCompilerForModule 
> function, so that if it detects that the passed struct is larger than 
> expected, it reports an error instead of continuing.
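
As an illustration of the size-checked options idiom discussed in this thread, here is a minimal sketch.  All names are hypothetical and are not the patch's API; the point is only that the callee refuses a struct larger than the one it was built against and treats fields the caller did not supply as zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical options struct; new fields are only ever appended. */
struct JitOptions {
    unsigned opt_level;
    int no_frame_pointer_elim;
};

/* Hypothetical entry point.  `size` is sizeof the struct as the
 * caller was compiled.  Returns 0 on success, or -1 with *error set
 * when the caller's struct is larger than this library knows about. */
int jit_create(const void *passed, size_t size,
               struct JitOptions *effective, const char **error)
{
    assert(effective != NULL && error != NULL);

    if (size > sizeof(struct JitOptions)) {
        *error = "options struct is larger than expected";
        return -1;
    }

    /* Fields the caller did not know about keep their zero default. */
    memset(effective, 0, sizeof *effective);
    if (passed != NULL && size > 0)
        memcpy(effective, passed, size);

    *error = NULL;
    return 0;
}
```

On the caller's side the pattern is to zero-fill the struct, set only the fields of interest, and pass `sizeof` the struct, so an older client keeps working when the library appends fields.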
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
