[LLVMdev] Memory clean for applications using LLVM for JIT compilation

Mon Jan 14 01:56:13 PST 2013

Hello all,

I've already bothered people on IRC with this question, and it was recommended that I ask it here.

First of all, some context. In Rubinius (http://rubini.us/, http://github.com/rubinius/rubinius) we use LLVM for our JIT. We create LLVM IR using the C++ API and turn that into machine code using ExecutionEngine::runJITOnFunction. The resulting native code is then installed as the executor for a method.

Right now we use a single LLVMContext (the global context), since we reuse a lot of structures, such as the LLVM types that map onto the types in the virtual machine. This does have a downside, though: constant expressions are stored in the LLVMContext when they are created (for example, when we call ConstantInt::get). We have control over some of these allocations, but others happen inside the built-in LLVM optimization passes.

This means that over time the LLVMContext keeps growing as we JIT more and more code. This is especially a problem with applications that don't have a stable type profile at run time and keep triggering methods to be jitted. In a very dynamic language such as Ruby this is not uncommon, and there are some libraries out there that behave badly in this regard.
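
To illustrate the growth pattern described above, here is a minimal toy model. It is a sketch only and does not use LLVM: ToyContext, ToyConstantInt and compileToyMethod are hypothetical names that stand in for a context that uniques constants and keeps them until it is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>

// Toy stand-in for a uniqued integer constant. This is NOT an LLVM type.
struct ToyConstantInt {
    std::int64_t value;
};

// Toy stand-in for a context that owns every constant ever requested.
// Nothing is released until the context itself is destroyed.
class ToyContext {
public:
    // Returns the unique constant for `value`, creating it on first use.
    const ToyConstantInt* getConstant(std::int64_t value) {
        auto it = constants_.find(value);
        if (it == constants_.end()) {
            std::unique_ptr<ToyConstantInt> c(new ToyConstantInt{value});
            it = constants_.emplace(value, std::move(c)).first;
        }
        return it->second.get();
    }

    // Number of constants the context is currently keeping alive.
    std::size_t retainedConstants() const { return constants_.size(); }

private:
    std::map<std::int64_t, std::unique_ptr<ToyConstantInt>> constants_;
};

// Simulates compiling one method that embeds `count` distinct constants,
// starting at `first` (assumption: each compile request brings new values).
void compileToyMethod(ToyContext& ctx, std::int64_t first, int count) {
    for (int i = 0; i < count; ++i) {
        ctx.getConstant(first + i);
    }
}
```

In this model, requesting the same value twice returns the same pointer and costs nothing extra, but every compile request that introduces new values makes the table larger, and there is no call that shrinks it short of destroying the context.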

So what we're seeing now when running these applications with the JIT enabled is that memory consumption grows slowly over time. We know it isn't technically a memory leak, since all that data is still reachable and will be cleaned up on shutdown, but what we'd like is a way to control this memory while the app runs.

We've explored a few ideas and each of them has significant downsides. The initial idea would be to set up an LLVMContext per compilation request for a method. The problem with this approach is that we'd have to keep this context alive for the lifetime of the jitted method. This would significantly increase memory usage, since each context would carry around all kinds of additional data, such as the type information for our internal VM types.

A second idea was to copy out the generated native code, but that causes all kinds of problems because of CALL semantics etc. that can call into other jitted code. The last idea we had is more of a hack, since I think it uses LLVM in an invalid way (that perhaps might work, perhaps not). This approach was to compile each request in a new context, but only keep the resulting llvm::Function alive outside this context. The only reason we need this llvm::Function is to be able to clean up the native code (with ExecutionEngine::freeMachineCodeForFunction).

So the question is what would be a recommended way to handle this problem? Is there a way to clean up / free native code like ExecutionEngine::freeMachineCodeForFunction without needing the llvm::Function? Is it safe to use the llvm::Function outside the LLVMContext in the way described here? Is there a way to clean up the constants allocated in the LLVMContext manually?

Or would it be possible for us to provide a custom allocator for the memory that holds the native code? With this last option we would be responsible for the cleanup ourselves and would just provide memory space to LLVM where it can store the results.
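
As a rough illustration of that last option, the VM side of such an allocator could look like the sketch below. It does not use LLVM: CodeRegionPool and its methods are hypothetical names, and it uses plain heap memory, whereas a real implementation would need executable pages. The point is only that regions are identified by an opaque handle, so they can be freed without keeping an llvm::Function around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical VM-owned pool for JIT output. Not part of LLVM.
// Assumption: the JIT would be told to write native code into the
// region returned by data(); that hookup is not shown here.
class CodeRegionPool {
public:
    using Handle = std::uint64_t;

    // Reserves `size` bytes for one compiled method and returns an
    // opaque handle that the VM can store alongside the method.
    Handle allocate(std::size_t size) {
        Handle h = next_++;
        regions_.emplace(h, std::vector<std::uint8_t>(size));
        bytesInUse_ += size;
        return h;
    }

    // Start of the region, or nullptr if the handle is unknown or freed.
    std::uint8_t* data(Handle h) {
        auto it = regions_.find(h);
        return it == regions_.end() ? nullptr : it->second.data();
    }

    // Frees the region. Returns false if the handle is not live,
    // which also makes a double release harmless.
    bool release(Handle h) {
        auto it = regions_.find(h);
        if (it == regions_.end()) return false;
        bytesInUse_ -= it->second.size();
        regions_.erase(it);
        return true;
    }

    // Total bytes currently reserved across all live regions.
    std::size_t bytesInUse() const { return bytesInUse_; }

private:
    std::unordered_map<Handle, std::vector<std::uint8_t>> regions_;
    std::size_t bytesInUse_ = 0;
    Handle next_ = 1;
};
```

With something along these lines, the method object in the VM would only need to hold the handle, and discarding a jitted method would come down to a single release call on the pool.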

We're open to different approaches, but we would like to know the recommendations from the LLVM community here.

-- 
Regards,

Dirkjan Bussink