[LLVMdev] Performance degradation when repeatedly exchanging JITted functions
Clemens Hammacher
hammacher at cs.uni-saarland.de
Tue Mar 6 07:29:28 PST 2012
Hi all,
for a research project we need to repeatedly exchange functions in a
program running in the JIT compiler.
We currently do this by calling recompileAndRelinkFunction(), after
changing the body of the function. Of course we synchronize enough to
ensure that the JIT doesn't concurrently compile the function (which
should only happen if lazy compilation is enabled).
Now recompileAndRelinkFunction() saves the old function pointer, then runs
the JIT, and overwrites the start of the old function's code with a jump
to the new function.
The problem with this implementation is (and I verified that this really
happens) that it builds chains of jumps that are traversed each time
the function is called. This is because the call sites are never updated.
There is actually a FIXME in the JITEmitter saying "FIXME: We could
rewrite all references to this stub if we knew them.", but of course it
would be hard to catch them all, given the variety of call instructions.
Another drawback is that the memory of the old function can never be
freed, since it is still used in the jump chain.
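To make the mechanism concrete, here is a minimal sketch that models the jump chain described above. It is not LLVM code: the `Version` and `JumpChainModel` names are hypothetical, and a "jump" is modelled as a plain pointer to the next version rather than a patched machine instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// One compiled version of a function. Once superseded, its code is
// overwritten with a jump to the next version (modelled by `forward`).
struct Version {
    Version* forward = nullptr;  // nullptr: this version holds the live body
};

// Hypothetical model of a JIT that never updates its call sites.
class JumpChainModel {
public:
    JumpChainModel() { versions_.push_back(std::make_unique<Version>()); }

    // Models one recompile-and-relink step: the newest version is patched
    // to jump to a freshly compiled one. No old version can be freed,
    // because each of them is still a link in the chain.
    void replace() {
        assert(!versions_.empty());
        Version* old = versions_.back().get();
        versions_.push_back(std::make_unique<Version>());
        old->forward = versions_.back().get();
    }

    // Number of jumps traversed by a call through the original address.
    std::size_t jumpsPerCall() const {
        std::size_t jumps = 0;
        for (const Version* v = versions_.front().get(); v->forward; v = v->forward)
            ++jumps;
        return jumps;
    }

    // Number of code allocations that must stay alive.
    std::size_t liveAllocations() const { return versions_.size(); }

private:
    std::vector<std::unique_ptr<Version>> versions_;
};
```

In this model, both the number of jumps per call and the number of allocations that must be kept grow by one with every replacement.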
To measure the performance impact of this, I wrote a small example
program in which the function is recompiled once per second and the
number of calls made during that second is printed (Mcalls = million
calls). The performance degradation is substantial:
After 0 replacements: 335.724 Mcalls/sec
After 1 replacements: 274.735 Mcalls/sec ( 82.010% of initial)
After 2 replacements: 232.640 Mcalls/sec ( 69.445% of initial)
After 3 replacements: 201.898 Mcalls/sec ( 60.268% of initial)
After 4 replacements: 177.727 Mcalls/sec ( 53.053% of initial)
After 5 replacements: 158.765 Mcalls/sec ( 47.393% of initial)
After 10 replacements: 102.098 Mcalls/sec ( 30.477% of initial)
After 20 replacements: 60.197 Mcalls/sec ( 17.969% of initial)
After 50 replacements: 27.049 Mcalls/sec ( 8.074% of initial)
After 200 replacements: 7.438 Mcalls/sec ( 2.220% of initial)
After 460 replacements: 3.273 Mcalls/sec ( 0.977% of initial)
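As a back-of-the-envelope reading of these figures, the throughput can be converted into an average time per call, and the difference to the baseline divided by the number of replacements. The helper names below are hypothetical, and the sketch assumes the 335.724 Mcalls/sec row as the baseline, which also contains the benchmark's own loop overhead.

```cpp
#include <cassert>

// Average time per call in nanoseconds, given throughput in million calls/sec.
inline double nsPerCall(double mcallsPerSec) {
    assert(mcallsPerSec > 0.0);
    return 1000.0 / mcallsPerSec;
}

// Extra nanoseconds per call added by each replacement, relative to the
// baseline throughput measured with zero replacements.
inline double extraNsPerReplacement(double baselineMcalls, double mcalls,
                                    int replacements) {
    assert(replacements > 0);
    return (nsPerCall(mcalls) - nsPerCall(baselineMcalls)) / replacements;
}
```

For example, `extraNsPerReplacement(335.724, 3.273, 460)` and `extraNsPerReplacement(335.724, 274.735, 1)` both come out at roughly 0.66 ns, and the other rows fall between about 0.66 and 0.68 ns. This is consistent with each additional jump in the chain adding a roughly constant cost to every call.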
I think a solution would be to always call a function through its stub,
so that there is a single location to update when the function is
exchanged. This would mean that there is always exactly one level of
indirection, which is worse for programs that don't exchange functions
at runtime, but is much better in our scenario.
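The "always go through the stub" idea can be sketched outside of LLVM as a single patchable slot per function. This is only an illustration under stated assumptions: `Stub`, `versionA` and `versionB` are hypothetical names, and ordinary C++ functions stand in for JIT-compiled code.

```cpp
#include <atomic>
#include <cassert>

using Fn = int (*)(int);

// Hypothetical stand-ins for two JIT-compiled versions of one function.
inline int versionA(int x) { return x + 1; }
inline int versionB(int x) { return x * 2; }

// The "stub": the one location that every call goes through.
class Stub {
public:
    explicit Stub(Fn initial) : target_(initial) { assert(initial != nullptr); }

    // Exchanging the function only rewrites this one slot. No chain is
    // built, so the previous version is no longer reachable through it.
    void exchange(Fn replacement) {
        assert(replacement != nullptr);
        target_.store(replacement, std::memory_order_release);
    }

    // Every call pays exactly one level of indirection.
    int call(int x) const {
        return target_.load(std::memory_order_acquire)(x);
    }

private:
    std::atomic<Fn> target_;
};
```

Usage would look like `Stub s(versionA); s.call(3); s.exchange(versionB); s.call(3);`, where the second call already reaches the new version. Freeing the old code safely would still require knowing that no thread is currently executing it, which this sketch does not address.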
I tried to add a flag to the JIT to implement that (always return the
address of the stub and never update the global mapping), but I gave up
since there are too many classes relying on the update of the global map
(including the JIT itself).
An alternative approach that won't require patching llvm would be to
manage an array of all function pointers in the "VM" we are
implementing, and then to replace in the bitcode each direct function
call by a load from that array, and a call to that address. Then the VM
could just update the array after recompiling a function, and all call
sites will use the new pointer.
The overhead should be comparable to the "always go through stub" method.
Some more logic would be required to handle indirect calls, but this
could be handled by callbacks into the VM.
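The function-pointer array could look roughly like the following sketch. All names (`FunctionTable`, `squareV1`, `squareV2`) are hypothetical, and plain C++ stands in for what would really be a load and an indirect call emitted into the bitcode at each call site.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Fn = int (*)(int);

// Hypothetical stand-ins for two compiled versions of the same function.
inline int squareV1(int x) { return x * x; }
inline int squareV2(int x) { return x * x + 0 * x; }  // same result, "new" code

// Table owned by the VM: one slot per function, indexed by a function id.
class FunctionTable {
public:
    // Registers a function and returns the id its call sites will use.
    std::size_t add(Fn f) {
        slots_.push_back(f);
        return slots_.size() - 1;
    }

    // Called by the VM after recompiling: one store updates all call sites.
    // Not synchronized; a multi-threaded VM would need atomic slots.
    void update(std::size_t id, Fn f) {
        assert(id < slots_.size());
        slots_[id] = f;
    }

    // What a rewritten call site does: load the pointer, then call it.
    int call(std::size_t id, int x) const {
        assert(id < slots_.size());
        return slots_[id](x);
    }

    Fn lookup(std::size_t id) const {
        assert(id < slots_.size());
        return slots_[id];
    }

private:
    std::vector<Fn> slots_;
};
```

Each call costs one load plus an indirect call, independent of how often the function has been exchanged.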
But before implementing that, I wanted to ask whether anybody already has
a working solution for the problem, or whether the problem is important
enough to be addressed directly in LLVM.
Cheers,
Clemens
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: RepeatedMethodExchange.cpp
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120306/5bae9bf9/attachment.ksh>