[LLVMdev] Some MCJIT benchmark numbers

Yaron Keren yaron.keren at gmail.com
Tue Nov 19 03:53:52 PST 2013


The pass manager is re-created in emitObject on every call.

Andy, is that needed or can we create the PM in MCJIT constructor and keep
it around?

Yaron




2013/11/19 Kevin Modzelewski <kmod at dropbox.com>

> So I finally took the plunge and switched to MCJIT (wasn't too bad, as
> long as you remember to call InitializeNativeTargetDisassembler if you want
> disassembly...), and I got the functionality to a point I was happy with so
> I wanted to test perf of the system.  I created a simple benchmark and I'd
> thought I'd share the results, both because I know I personally had no idea
> what the results would be, and because it seems like there's some
> low-hanging fruit to improve performance.
>
> My JIT is currently structured as creating a new module per function it
> wants to jit; I had experimented with using an approach where I had an
> "incubator module" where all IR starts, and then on-demand extract it to
> "compilation modules" when I want to send it to MCJIT, but my experience
> was that this wasn't very helpful.  (My goal was to enable cross-function
> optimizations such as inlining, but there's no easy way [and might not even
> make sense] to run module-level optimizations on a single function.)
>
> The benchmark I set up is a simple REPL loop, where the input is a
> pre-parsed no-op statement.  I put this in a loop and measured the amount
> of time it took, and tested it at 1k iterations and 10k iterations.  This
> includes my IR-generation, but my expectation is that it should be
> negligible compared to the MCJIT time (confirmed through profiling).  The
> absolute numbers are from a Release build with asserts turned off (this
> made a big difference), and the percentages are from a Release+Profiling
> build.
>
> For 1k iterations, the test took about 640ms on my desktop machine, i.e.
> 0.64ms per module.  Looking at the profiling results, it looks like about
> 47% of the time is spent in PassManagerImpl::run, and another 47% is spent
> in addPassesToEmitMC, which feels like it could be avoided by doing that
> just once.  Of the time spent in PassManagerImpl::run, about 35% is spent
> in PassManager overhead such as initializeAnalysisImpl() /
> removeNotPreservedAnalysis() / removeDeadPasses().
>
> For 10k iterations, the test took about 12.6s, or 1.26ms per module, so
> there's definitely some slowdown happening.  Looking at the profiling
> output, it looks like the main difference is the appearance of
> MCJIT::finalizeLoadedModules(), which ultimately calls
> RuntimeDyldImpl::resolveRelocations() and
> SectionMemoryManager::applyMemoryGroupPermissions(), both of which iterate
> over all memory sections leading to quadratic overhead.  I'm not sure how
> easy it would be, but it seems like there could be single-module variants
> of these APIs that could cut down on the overhead, since it looks like
> MCJIT knows what modules need to be finalized but doesn't pass this
> information to the dyld / memory manager.
>
>
> My overall takeaway from these numbers is pretty good: they're good enough
> for where my JIT is right now, and it seems like there's some
> relatively-straightforward work that can be done to make them better.  I'm
> curious what other people think.
>
> Kevin
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>