[LLVMdev] Performance degradation when repeatedly exchanging JITted functions

Tue Mar 6 07:29:28 PST 2012

Hi all,

For a research project, we need to repeatedly exchange functions in a
program running in the JIT compiler.
We currently do this by calling recompileAndRelinkFunction() after
changing the body of the function. Of course, we synchronize enough to
ensure that the JIT doesn't compile the function concurrently (which
should only happen if lazy compilation is enabled).

Internally, recompileAndRelinkFunction() saves the old function pointer,
runs the JIT, and then writes a jump to the new code at the start of the
old function's memory.
The problem with this implementation (and I verified that this really
happens) is that it builds chains of jumps that are traversed on every
call to the function, because the call sites are never updated.
There is actually a FIXME in the JITEmitter saying "FIXME: We could
rewrite all references to this stub if we knew them.", but of course it
would be hard to catch them all, given the variety of call instructions.
Another drawback is that the memory of the old function bodies can never
be freed, since it is still part of the jump chain.

To measure the performance impact, I wrote a small example program that
recompiles the function once per second and prints the number of calls
executed in that second (Mcalls = million calls). The performance
degradation is dramatic:
After     0 replacements:  335.724 Mcalls/sec
After     1 replacements:  274.735 Mcalls/sec ( 82.010% of initial)
After     2 replacements:  232.640 Mcalls/sec ( 69.445% of initial)
After     3 replacements:  201.898 Mcalls/sec ( 60.268% of initial)
After     4 replacements:  177.727 Mcalls/sec ( 53.053% of initial)
After     5 replacements:  158.765 Mcalls/sec ( 47.393% of initial)
After    10 replacements:  102.098 Mcalls/sec ( 30.477% of initial)
After    20 replacements:   60.197 Mcalls/sec ( 17.969% of initial)
After    50 replacements:   27.049 Mcalls/sec (  8.074% of initial)
After   200 replacements:    7.438 Mcalls/sec (  2.220% of initial)
After   460 replacements:    3.273 Mcalls/sec (  0.977% of initial)

I think a solution would be to always call a function through its stub,
so that there is a single location to update when the function is
exchanged. This would mean exactly one level of indirection on every
call, which is worse for programs that don't exchange functions at
runtime, but much better in our scenario.
I tried to add a flag to the JIT to implement this (always return the
address of the stub and never update the global mapping), but gave up
because too many classes rely on the global mapping being updated
(including the JIT itself).

An alternative approach that wouldn't require patching LLVM would be to
manage an array of all function pointers in the "VM" we are
implementing, and to replace each direct function call in the bitcode
with a load from that array followed by a call to the loaded address.
The VM could then simply update the array after recompiling a function,
and all call sites would use the new pointer.
The overhead should be comparable to the "always go through the stub"
method, since both add one level of indirection per call.
Indirect calls would need some additional logic, but this could be
handled by callbacks into the VM.

But before implementing that, I wanted to ask whether anybody already
has a working solution to this problem, or whether the problem is
important enough to address directly in LLVM.

Cheers,
Clemens

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: RepeatedMethodExchange.cpp
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120306/5bae9bf9/attachment.ksh>