[LLVMdev] Using LLVM to serialize object state -- and performance

Fri Oct 26 16:16:49 PDT 2012

I have a legacy C++ application that constructs a tree of C++ objects (an iterator tree to implement a query language).  I am trying to use LLVM to "serialize" the state of this tree to disk for later loading and execution (or "compile" it to disk, if you prefer).

Each of the C++ iterator objects now has a codegen() member function that adds to the LLVM code of an llvm::Function.  The LLVM code generated is a sequence of instructions to set up the arguments for and call the constructor of each C++ object.  (I am using C "thunks" that give LLVM a C API for calling the C++ class constructors.)  Hence, the LLVM code, taken together as a single "reconstitute" function, is mostly a sequence of "call" instructions with a few "store" and "getelementptr" instructions here and there -- fairly straightforward LLVM code.
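
To make the thunk idea concrete, here is a minimal sketch in plain C++.  The names ScanIterator, thunk_ScanIterator_new, and thunk_ScanIterator_delete are hypothetical and not taken from the real application, and reconstitute() is written by hand here to stand in for what the generated LLVM code would do (a flat sequence of calls to the thunks).

```cpp
#include <cassert>
#include <string>

// Hypothetical iterator class standing in for one node of the iterator tree.
class ScanIterator {
public:
    ScanIterator(const char* name, ScanIterator* child)
        : name_(name), child_(child) {}
    ~ScanIterator() { delete child_; }  // a node owns its child in this sketch

    const std::string& name() const { return name_; }
    const ScanIterator* child() const { return child_; }

private:
    std::string name_;
    ScanIterator* child_;
};

// C "thunks": unmangled entry points that generated code can call by name.
// Objects are passed as void* so no C++ types appear in the signatures.
extern "C" void* thunk_ScanIterator_new(const char* name, void* child) {
    assert(name != nullptr);
    return new ScanIterator(name, static_cast<ScanIterator*>(child));
}

extern "C" void thunk_ScanIterator_delete(void* it) {
    delete static_cast<ScanIterator*>(it);
}

// Hand-written stand-in for the generated "reconstitute" function:
// set up arguments, call a constructor thunk, repeat, return the root.
void* reconstitute() {
    void* leaf = thunk_ScanIterator_new("leaf", nullptr);
    void* root = thunk_ScanIterator_new("root", leaf);
    return root;
}
```

A caller would cast the returned void* back to the root iterator type and later release the tree through the delete thunk.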

I then write the LLVM IR out to disk and, at some later time, read it back in with ParseIR(), call getPointerToFunction(), and execute the resulting function, at which point the C++ iterator tree has been reconstituted.

This all works, but the JIT compile step is *slow*.  For a sequence of about 8000 LLVM instructions (most of which are "call"), that step takes several seconds.

It occurred to me that I don't really want JIT compiling.  I really want to compile the LLVM code to machine code and write that to disk so that when I read it back, I can just run it.  The "reconstitute" function is only ever run once per query invocation, so there's no benefit from JIT compiling it: it will never be run a second time.

Questions:

* Is what I'm doing with LLVM a "reasonable" thing to do with LLVM?
* If so, how can I speed it up?  By generating machine code?  If so, how?

I've looked at the source for llc, but that apparently only generates assembly source code, not object code.

- Paul