[LLVMdev] MCJIT versus getLazyBitcodeModule?

Sat Jan 25 17:39:12 PST 2014

Hi Gael, I tried converting to your approach but I had some issues making
sure that all symbols accessed by the jit modules have entries in the
dynamic symbol table.

To be specific, my current approach is to use MCJIT (with an ObjectCache)
to JIT the runtime module and then let MCJIT handle linking any references
from the jit'd modules. I just experimented with what I think you're doing:
compiling my runtime, linking it directly into the rest of the compiler,
and then tying references in the jit modules to entities in the compiler.

I got it working for the case of "standard" functions and globals, but had
some trouble with other types of symbols.  I don't know the right
terminology for these things, but I couldn't get methods defined in headers
(ex: a no-op virtual destructor) to work properly.  I guess that's not too
hard to work around by either putting it into a cpp file or maybe with some
objcopy magic, but then I ran into the issue of string constants.  Again,
my knowledge of the terminology isn't great, but it looks like those don't
get symbols in the object file but they get their own sections, and since I
have multiple source files that I llvm-link together, the constants get
renamed in the LLVM IR and have no relation to the section names.  Maybe
there's a workaround by compiling all my runtime sources as a single file
so no renaming happens, and then some hackery to get the section names
exported, but I guess I'm feeling a little doubtful about it.

Have you tried using an ObjectCache and pre-jitting [I still have a hard
time using that term with a straight face] the runtime module?  My runtime
isn't that large (about 4kloc), but the numbers I'm getting are that it
takes about 2ms for the getLazyBitcodeModule call, and about 4ms to load
the stdlib through the ObjectCache.  I'm not sure how these numbers scale
with the size of the runtime, but it feels like if the ObjectCache loading
is too expensive then loading the bitcode might be as well?  Another idea
is that you could load+jit the bitcode the first time that you want to
inline something, since the inlining+subsequent optimizations you probably
want to do are themselves expensive and could mask the jit'ing time.
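
A minimal sketch of that load-on-first-inline idea, with no LLVM headers
involved. LazyRuntime and RuntimeModule are hypothetical names, and the
loader callback stands in for whatever bitcode loading and jit'ing the
compiler really does. It is not thread-safe as written.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a loaded (and possibly jit'd) runtime module.
// A real one would wrap an llvm::Module and whatever owns it.
struct RuntimeModule {
  std::string name;
};

// Defers an expensive load until the first caller actually needs it.
class LazyRuntime {
public:
  using Loader = std::function<std::unique_ptr<RuntimeModule>()>;

  explicit LazyRuntime(Loader loader) : loader_(std::move(loader)) {
    assert(loader_ && "a loader callback is required");
  }

  // Returns the runtime, running the loader on the first call only.
  RuntimeModule &get() {
    if (!module_) {
      module_ = loader_();
      assert(module_ && "the loader must return a module");
      ++loadCount_;
    }
    return *module_;
  }

  bool isLoaded() const { return module_ != nullptr; }
  int loadCount() const { return loadCount_; }

private:
  Loader loader_;
  std::unique_ptr<RuntimeModule> module_;
  int loadCount_ = 0;
};
```

The inliner would call get() only when it first meets a call it wants to
inline, so code that never inlines anything never pays for the load.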

Anyway, my current plan is to stick with jit'ing the runtime module but cut
down the amount of stuff included in it, since I'm finding that most of my
runtime methods end up dispatching on type, and patchpoint-ing at runtime
seems to be more effective than inlining ahead of time.

Kevin

On Tue, Jan 21, 2014 at 1:17 PM, Gaël Thomas <gael.thomas at lip6.fr> wrote:

> Hi Larry,
>
> Inlining from remote modules with MCJIT is not so easy, but possible
> (at least it works for me). I have been working on this problem for two
> days (from an end-user perspective, I'm not an llvm developer :)). As it
> may help you (and other people), I'll explain what I have done (my mail
> is maybe too long for the mailing list, sorry!).
>
> So, basically, inlining from other modules (runtime module included)
> is possible in MCJIT. The solution is maybe a little bit ugly... Just
> to explain what I do and my problems, I'm involved in the development
> of vmkit (a library to build virtual machines). I have to inline
> runtime functions defined in c++ to achieve good performance (for
> example the type checker for j3, the Java virtual machine developed
> with vmkit). I think that your problem is not so far from mine (I also
> reload my own bitcode when I start vmkit).
>
> So, here is the picture (I can also send you my llvm pass or other
> relevant code if you need them). It can help as a starting point. I
> wrote the inlining pass today, so it may still be buggy :).
>
> Basically, I have two kinds of modules: a module that contains the
> runtime functions (defined in c++) and the other modules that contain
> functions that I have to jit compile. To simplify, let's say that I have
> only one module to jit. In the jit-module, I want to call functions
> defined in the runtime-module. I thus have three problems to solve:
> * The verifier does not like when you call a function defined in the
> runtime module directly from the jit module (it prevents external
> references to other modules). So, I have to avoid this as much as
> possible.
> * The jited module has to find the llvm code of the runtime functions
> for inlining
> * When a function is not inlined, you have to provide the address of
> the function to MCJIT (I use dlsym for that purpose).
>
> What I do:
> - MCJIT only manages the jit-module (the runtime-module is not
> associated to MCJIT through addModule)
> - When I have to call a runtime function from the jit-module, I define
> an external reference to the function in the jit-module. Something
> like:
>
> llvm::Function* orig = runtimeModule->getFunction("my-function");
> llvm::Function* copy = (llvm::Function*)jitModule->getOrInsertFunction(
>     orig->getName(), orig->getFunctionType());
>
> This step is not mandatory as you will see after (but I have not
> tested a direct use of remote references).
>
> - Then I use an llvm pass (a FunctionPass). For each function, I
> explore each call site. If the call site goes to a function that
> does not have a definition (i.e., a runtime function), I find the
> original llvm::Function*. I use something like this:
>
>   bool FunctionInliner::runOnFunction(llvm::Function& function) {
>     bool Changed = false;
>
>     for (llvm::Function::iterator bit = function.begin();
>          bit != function.end(); bit++) {
>       llvm::BasicBlock* bb = bit;
>
>       for(llvm::BasicBlock::iterator it=bb->begin(); it!=bb->end();) {
>         llvm::Instruction *insn = it++;
>
>         if (insn->getOpcode() != llvm::Instruction::Call &&
>             insn->getOpcode() != llvm::Instruction::Invoke) {
>           continue;
>         }
>
>         llvm::CallSite  call(insn);
>         llvm::Function* callee = call.getCalledFunction();
>
>         if(!callee)
>           continue;
>
>         if(callee->isDeclaration()) { /* maybe a foreign function? */
>           llvm::Function* original =
>             runtimeModule->getFunction(callee->getName());
>           if(original) {
>             /* if you use lazybitcode..., don't forget to materialize
>                the original here with */
>             original->Materialize();
>
> At this step, you can directly inline your code if you want to
> systematically inline code:
>            llvm::InlineFunctionInfo ifi(0);
>            bool isInlined = llvm::InlineFunction(call, ifi, false);
>            Changed |= isInlined;
>
> Or, if you don't want to always inline the code, you can guard the
> inlining after having used the inline analysis pass:
>    llvm::InlineCostAnalysis  costAnalysis;
>    /* 42 is the threshold */
>    llvm::InlineCost cost = costAnalysis.getInlineCost(call, 42);
>    if(cost.isAlways() || (!cost.isNever() && cost)) {
>      /* inlining goes here */
>    }
>
> After this step, you have a problem. The inlined function can itself
> contain calls to the runtime functions. So, at this step, it's ugly
> because I have a function that potentially contains external
> references... What I do is simply re-explore the code with
>     if(isInlined) {
>        it = bb->begin();
>        continue;
>     }
>
> and for each function, if its defining module is not the jitModule, I
> replace the call with a local call. Something like this:
>
>         if(callee->getParent() != function.getParent()) {
>           llvm::Function* local =
>             (llvm::Function*)function.getParent()->getOrInsertFunction(
>               callee->getName(), callee->getFunctionType());
>           callee->replaceAllUsesWith(local);
>           Changed = true;
>         }
>
> After this step, you will have a module that only contains local
> references and that contains your preferred runtime code inlined.
>
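
The re-exploration loop described above can be modelled without LLVM at
all. The sketch below is a toy, not the vmkit pass: a function body is a
list of strings, "call <name>" marks a call, and a name that is missing
from the runtime map plays the role of a declaration with no body. The
names calleeOf and inlineRuntimeCalls and the budget parameter are
assumptions made for the illustration. The budget exists because
restarting the scan after every splice never terminates if a runtime
function calls itself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Body = std::vector<std::string>;
// Name -> body. A name that is absent models a declaration with no body.
using Runtime = std::map<std::string, Body>;

// Returns the callee name if `insn` has the form "call <name>", else "".
std::string calleeOf(const std::string &insn) {
  const std::string prefix = "call ";
  if (insn.compare(0, prefix.size(), prefix) == 0)
    return insn.substr(prefix.size());
  return "";
}

// Inlines every call whose callee has a body in `runtime`. After each
// splice the scan restarts at the top, because the inlined body may itself
// contain calls. `budget` caps the number of splices so that recursive
// runtime functions cannot keep the loop running forever.
// Returns the number of calls that were inlined.
int inlineRuntimeCalls(Body &body, const Runtime &runtime, int budget = 64) {
  assert(budget >= 0);
  int inlined = 0;
  for (std::size_t i = 0; i < body.size() && inlined < budget;) {
    const std::string name = calleeOf(body[i]);
    Runtime::const_iterator it =
        name.empty() ? runtime.end() : runtime.find(name);
    if (it == runtime.end()) {
      ++i;  // not a call, or a call to a declaration: leave it alone
      continue;
    }
    const std::ptrdiff_t pos = static_cast<std::ptrdiff_t>(i);
    body.erase(body.begin() + pos);
    body.insert(body.begin() + pos, it->second.begin(), it->second.end());
    ++inlined;
    i = 0;  // re-explore from the start
  }
  return inlined;
}
```

Calls that survive the loop are exactly the ones that still need a local
declaration and, later, an address from the memory manager.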
> - Now, you have to solve the last problem, finding symbols from the
> runtimeModule when they are not inlined (global values or functions).
> In my case, I have defined my own SectionMemoryManager:
>
>   class CompilationUnit : public llvm::SectionMemoryManager {
>   public:
>     uint64_t getSymbolAddress(const std::string &Name) {
>       /* + 1 with MacOS (skips the leading '_'), + 0 with Linux */
>       return (uint64_t)dlsym(SELF_HANDLE, Name.c_str() + 1);
>     }
>   };
>
> which is called by MCJIT to resolve external symbols when the jited
> module is loaded in memory (you have to use
> EngineBuilder.setMCJITMemoryManager).
>
> If, like me, you want to also inline functions from jited modules,
> it's a little bit more tricky because the llvm::Function* original =
> runtimeModule->getFunction(callee->getName()); is not enough. I have
> defined my own symbol table (a hash map) that associates function
> identifiers with a structure that contains both the original llvm
> function of the callee and its address in memory (also used in the
> SectionMemoryManager).
>
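
One possible shape for such a symbol table, sketched with the standard
library only. SymbolEntry, SymbolTable and their members are hypothetical
names, and the llvm::Function* is kept as an opaque pointer so that the
example builds without LLVM. The stripUnderscore flag models the "+ 1 with
MacOS, + 0 with Linux" remark, but unlike the dlsym snippet it drops the
first character only when it really is an underscore.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// What the table remembers about one function. `ir` stands in for the
// llvm::Function* and is opaque so the sketch needs no LLVM headers.
struct SymbolEntry {
  const void *ir;
  std::uint64_t address;  // 0 until the function has been emitted
};

class SymbolTable {
public:
  // Mach-O prefixes C symbol names with '_', ELF does not.
  explicit SymbolTable(bool stripUnderscore) : strip_(stripUnderscore) {}

  void define(const std::string &name, const void *ir,
              std::uint64_t address) {
    assert(!name.empty());
    table_[name] = SymbolEntry{ir, address};
  }

  // Looks up a name as the linker spells it. Returns nullptr when the
  // name is unknown; a real resolver could fall back to dlsym there.
  const SymbolEntry *find(const std::string &linkerName) const {
    auto it = table_.find(normalize(linkerName));
    return it == table_.end() ? nullptr : &it->second;
  }

  // Address to hand back to the JIT, or 0 if the symbol is unknown.
  std::uint64_t addressOf(const std::string &linkerName) const {
    const SymbolEntry *entry = find(linkerName);
    return entry ? entry->address : 0;
  }

private:
  std::string normalize(const std::string &name) const {
    if (strip_ && !name.empty() && name[0] == '_')
      return name.substr(1);
    return name;
  }

  bool strip_;
  std::unordered_map<std::string, SymbolEntry> table_;
};
```

The inlining pass would read the `ir` field and getSymbolAddress would read
`address`, so both sides agree on a single source of truth per function.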
> Good luck :)
> Gaël
>
>
>
>
> 2014/1/21 Larry Gritz <lg at larrygritz.com>:
> > Thanks for the pointers.
> >
> > Am I correct in assuming that putting the precompiled bitcode into a
> > second module and linking (or using the object caches) would result in
> > ordinary function calls, but would not be able to inline the functions?
> >
> >         -- lg
> >
> >
> > On Jan 21, 2014, at 11:55 AM, Kaylor, Andrew <andrew.kaylor at intel.com> wrote:
> >
> >> I would say that the incompatibility is by design.  Not that anyone
> >> specifically wanted the incompatibility, but rather it's a known artifact
> >> of the MCJIT design.
> >>
> >> You can find an example of MCJIT's object caching here:
> >> http://blog.llvm.org/2013/08/object-caching-with-kaleidoscope.html
> >>
> >> The two blog entries before that may also be of use to you:
> >> http://blog.llvm.org/2013_07_01_archive.html
> >>
> >> I don't know where you can find an example of the Module linking I
> >> described, but I think llvm::Linker is the class to look at.
> >>
> >> -Andy
> >>
> >
> > --
> > Larry Gritz
> > lg at larrygritz.com
> >
> >
> >
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
> --
> -------------------------------------------------------------------
> Gaël Thomas, Associate Professor, UPMC
> http://pagesperso-systeme.lip6.fr/Gael.Thomas/
> -------------------------------------------------------------------
>