[LLVMdev] -O0 compile time speed

Óscar Fuentes ofv at wanadoo.es
Sun Nov 22 13:58:26 PST 2009


Eric Christopher <echristo at apple.com> writes:

>>> Sort of. Why you think more speed than LLVM currently provides is a 
>>> significant benefit.
>> 
>> My compiler supports LLVM as a backend. The language heavily relies on
>> compile-time environment-dependent code generation, so it needs the
>> JIT. One of the things that is holding back LLVM on production systems
>> is that it needs minutes to JIT a medium-sized application. That's
>> barely tolerable for long-running server applications, and a big no-no
>> for client apps.
>
> Not that I'm against faster jitting but have you tried an
> interpreter/jit solution, where you're jitting only hot functions in
> your application? Or using an O0 jit and then recompiling hot
> functions?

My compiler produces pseudo-instructions for a virtual stack-based
machine. This code can be executed or translated to some other backend
(LLVM, for instance). If the LLVM backend is used, all the code needs to
be jitted, as the code produced by LLVM can't mix with the
pseudo-instructions.
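For illustration, a minimal sketch of what that translation looks
like, assuming the LLVM C++ API: the virtual operand stack is modelled
at compile time as a vector of llvm::Value*, so the generated IR keeps
no trace of the stack machine. The opcode names are invented for this
example, and the IRBuilder header path varies across LLVM versions:

  #include "llvm/IR/IRBuilder.h"  // "llvm/Support/IRBuilder.h" in old releases
  #include <vector>

  enum Opcode { PUSH_INT, ADD, MUL };         // invented pseudo-ops
  struct PseudoInst { Opcode op; int imm; };

  // Lower a straight-line pseudo-instruction sequence. The virtual stack
  // exists only at compile time, as a vector of SSA values.
  llvm::Value *lower(llvm::IRBuilder<> &B,
                     const std::vector<PseudoInst> &code) {
    std::vector<llvm::Value *> stack;
    for (const PseudoInst &I : code) {
      switch (I.op) {
      case PUSH_INT:
        stack.push_back(B.getInt32(I.imm));
        break;
      case ADD:
      case MUL: {
        llvm::Value *r = stack.back(); stack.pop_back();
        llvm::Value *l = stack.back(); stack.pop_back();
        stack.push_back(I.op == ADD ? B.CreateAdd(l, r)
                                    : B.CreateMul(l, r));
        break;
      }
      }
    }
    return stack.back();  // result left on top of the virtual stack
  }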

The emulated stack machine is quite fast (several times faster than
Python, for instance) and I guess it is faster than the LLVM
interpreter.

It is very difficult to determine which functions are "hot". There
are several possible heuristics, but missing even one or two hot
functions may severely impact the final performance, due to the
slowness of the interpreter. One obvious heuristic is sketched below.
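A counter-based sketch, where Function, jit_compile and the threshold
are all invented for illustration; the hard part is choosing a
threshold that catches every hot function without compiling
everything:

  #include <map>

  struct Function;               // the interpreter's function object (assumed)
  void jit_compile(Function *);  // hand off to the LLVM backend (assumed)

  const unsigned kHotThreshold = 1000;  // invented; tuning it is the problem
  std::map<Function *, unsigned> call_counts;

  // Called by the interpreter's dispatch loop on every function entry.
  void note_call(Function *f) {
    if (++call_counts[f] == kHotThreshold)
      jit_compile(f);  // everything below the threshold stays interpreted
  }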

Finally, my measurements say that optimization is not where most of
the time goes (IIRC, going from -O0 to -O2 adds 30% to compile
time). Generating the LLVM IR takes almost no time. It is the process
of turning the LLVM IR into executable code that is "slow" and, worse
still, the time required grows faster than the size of the LLVM IR to
be JITted.
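For reference, the measurement itself is nothing fancy: once the IR is
built, time the codegen step in isolation. A sketch, assuming the
ExecutionEngine-based JIT (exact headers depend on the LLVM version):

  #include "llvm/ExecutionEngine/ExecutionEngine.h"
  #include <cstdio>
  #include <ctime>

  // Time only the IR-to-machine-code step for one function. The IR for F
  // must already exist; getPointerToFunction forces its codegen.
  void time_codegen(llvm::ExecutionEngine *EE, llvm::Function *F) {
    std::clock_t t0 = std::clock();
    void *addr = EE->getPointerToFunction(F);
    std::clock_t t1 = std::clock();
    std::printf("codegen of %p: %.3f ms\n", addr,
                1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC);
  }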

> Not that I know exactly what you're doing, but thought I'd try to help
> a bit.

I appreciate your help.

Just as some anecdotal data, I'll mention that one of the backends
my compiler uses simply maps each pseudo-instruction to assembler
code, doing *very* simple optimizations and register allocation. The
resulting code still emulates the stack machine, so it is possible to
mix assembled code with pseudo-instructions, which makes the resulting
assembler suck even more. The assembler code is saved to a file, an
assembler is invoked, and the resulting raw binary file is loaded into
memory, ready to be executed. The process is very fast compared to
LLVM -O0 (about 3x faster for my current test application), but the
most surprising part is that the native code runs significantly faster
than LLVM JITted code at the -O0 level. At the LLVM -O2 level, you can
come up with benchmarks that run several times faster than the
assembled code, but on real-world applications within my application
domain (database managers) the difference turns out to be only 20% on
average in favor of LLVM -O2 for cpu-intensive tasks.
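To give an idea of how simple that backend is: each pseudo-instruction
expands to a fixed assembler template, and the operand stack stays in
memory. A sketch, with an invented two-instruction set and an
arbitrary choice of %rsi as the virtual stack pointer:

  #include <cstdio>

  enum Opcode { PUSH_INT, ADD };              // invented pseudo-ops
  struct PseudoInst { Opcode op; int imm; };

  // Expand one pseudo-instruction into AT&T-syntax assembler. The stack
  // machine is still visible: operands live in memory, not in registers.
  void emit(std::FILE *out, const PseudoInst &I) {
    switch (I.op) {
    case PUSH_INT:  // store the immediate, bump the virtual stack pointer
      std::fprintf(out, "  movl $%d, (%%rsi)\n"
                        "  addq $4, %%rsi\n", I.imm);
      break;
    case ADD:       // pop two 32-bit values, push their sum
      std::fprintf(out, "  subq $4, %%rsi\n"
                        "  movl (%%rsi), %%eax\n"
                        "  addl %%eax, -4(%%rsi)\n");
      break;
    }
  }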

I guess that an approach that simply translates LLVM IR to assembler,
doing the simplest register allocation, would be several times faster
than the current JIT, and even produce faster code.

-- 
Óscar



