[LLVMdev] Modifying machine code after it's generated with LLVM

Fri Apr 12 01:07:25 PDT 2013

Hi all,

I've been thinking about something that I haven't found a way to do with LLVM, so I'm wondering whether I'm not looking in the right place or whether it really can't be done currently.

What I'm trying to do is modify generated machine code after LLVM has emitted it. At that point the original LLVM IR isn't available anymore. Basically, I want to use this for things like back patching the code after, for example, type information has changed.

To make the example concrete, this is for the Rubinius JIT, so it creates machine code for Ruby code. It needs to insert type guards, which consist of a class_id identifying the class and a serial number that is incremented when, for example, a class is reopened and methods are added or redefined.

In a lot of cases this serial number is updated, but for the JIT'ted code the end result is the same: it still dispatches to the same method, no code in the call path has changed, and so on. So what I'd like to do, in case the serial number doesn't match, is fall back to a back patching function that first checks whether the receiver has changed. If the receiver has changed, it can do an immediate deoptimization. If the receiver didn't change, I'd like to back patch the native code with the new serial number, so that the next time it will match the guard and we don't have to check the serial again.
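
A minimal sketch in C of how such a fallback could look, assuming the guard compares against a 32-bit serial that is embedded as an immediate in the machine code and that the address of that immediate is somehow known. The `guard_site` struct, the function names and the way "the receiver has changed" is tested (same class_id, same resolved method) are all hypothetical and only illustrate the idea. Obtaining `serial_imm` from LLVM is exactly the missing piece.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one guard site in the generated code. */
typedef struct {
    uint8_t    *serial_imm; /* address of the 32-bit serial immediate in the code */
    uint32_t    class_id;   /* class the guard was compiled for */
    const void *method;     /* method the compiled code dispatches to */
} guard_site;

typedef enum { GUARD_PATCHED, GUARD_DEOPTIMIZE } guard_result;

/* Read the serial currently embedded in the code.
 * memcpy is used because the immediate may not be 4-byte aligned. */
uint32_t guard_embedded_serial(const guard_site *site) {
    uint32_t serial;
    memcpy(&serial, site->serial_imm, sizeof serial);
    return serial;
}

/* Fallback taken when the serial comparison in the guard fails.
 * Assumption: "receiver changed" means a different class_id or a
 * different resolved method than the one the code was compiled for. */
guard_result guard_fallback(guard_site *site, uint32_t receiver_class_id,
                            uint32_t current_serial, const void *current_method) {
    if (receiver_class_id != site->class_id || current_method != site->method)
        return GUARD_DEOPTIMIZE;

    /* Same class and same method: only the serial moved, so overwrite the
     * immediate and let the guard match directly next time. */
    memcpy(site->serial_imm, &current_serial, sizeof current_serial);
    return GUARD_PATCHED;
}
```

In a real JIT the code page would also have to be writable at the time of the patch, and other threads that may be executing the guard concurrently would have to be taken into account.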

For this I'd need to pass the memory address of the generated native code to this fallback function, but I haven't found any way of doing this. I've tried using globals, but that would create a huge number of globals and an additional dereference every time we run the type guard (which happens quite often).

The same reasoning applies to back patching function calls. If we JIT a method that calls another method that isn't called often, we don't inline it. But after some time, the method that we didn't inline might itself be jitted, so at that point we'd like to replace the call with a direct call to the jitted code, instead of first having to jump into a generic mechanism that then sees the code is jitted. For this, being able to do the same thing as in the serial scenario would be great, so we can update the call site. Right now on x86_64 this is tricky, because the call target is loaded from a register rather than encoded directly in the call instruction (as on x86).
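
A rough sketch in C of what patching a call site could look like, again assuming the address of the call instruction is known. The first function handles the direct x86 form, `call rel32` (opcode 0xE8 followed by a 32-bit displacement relative to the next instruction). The second handles one possible x86_64 form, where the target is first loaded with `movabs rax, imm64` (bytes 0x48 0xB8) and then called through the register. Which register and which instruction sequence LLVM actually emits is an assumption here, and both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

enum { CALL_REL32_OPCODE = 0xE8, CALL_REL32_SIZE = 5 };

/* Retarget a "call rel32" instruction. Returns 0 on success, -1 if the
 * bytes are not a call rel32 or the target is out of 32-bit range. */
int patch_call_rel32(uint8_t *call_insn, uintptr_t target) {
    if (call_insn[0] != CALL_REL32_OPCODE)
        return -1;

    /* The displacement is relative to the instruction after the call. */
    uintptr_t next_insn = (uintptr_t)call_insn + CALL_REL32_SIZE;
    intptr_t disp = (intptr_t)(target - next_insn);
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    int32_t rel = (int32_t)disp;
    memcpy(call_insn + 1, &rel, sizeof rel);
    return 0;
}

/* Retarget a "movabs rax, imm64" that feeds an indirect call.
 * Assumption: the target is loaded into rax; other registers use
 * different prefix/opcode bytes and are rejected here. */
int patch_movabs_rax(uint8_t *mov_insn, uint64_t target) {
    if (mov_insn[0] != 0x48 || mov_insn[1] != 0xB8)
        return -1;
    memcpy(mov_insn + 2, &target, sizeof target);
    return 0;
}
```

Both functions only rewrite the operand bytes and leave the surrounding instructions untouched. Making such a write safe while other threads may be executing the call site is a separate problem that this sketch does not address.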

So is there any way to tackle this using mechanisms in LLVM? Hopefully someone is able to help out with this problem.

-- 
Dirkjan