[LLVMdev] Introductions to everyone and a call for Python-LLVM enthusiasts

Nick Lewycky nicholas at mxc.ca
Fri Jul 13 01:32:31 PDT 2012


Joerg Blank wrote:
> Hello Duncan,
>
>> thanks for your interesting email.  Do you understand why PyPy is no longer
>> using LLVM, and why Unladen Swallow died?  Does LLVM need to be improved in
>> some way?
>
> The answers to all these questions are linked: LLVM is not fast enough
> (for a JIT). Of course this is not the whole story, but it is the
> LLVM-relevant part.
>
> Let's have a look at some random performance numbers from one of my pet
> projects:
>
> Generate-time: 0.000377893
> Compile-time: 0.00987911
> 1) 0.012272357940673828
> 2) 0.0018310546875
> 3) 0.0037310123443603516
>
> Generate-time is the time it takes for my code to generate the LLVM IR.
> Compile-time is llvm-opts + codegen (MCJIT). And this is for a really
> small function.
>
> 1) is the total time for jitting + running
> 2) is the time for running the compiled code
> 3) is the time in the native interpreter
>
> While 2) is entirely in the domain of the person using LLVM, the other
> times give us some serious points for consideration: the generated code
> needs to be really fast to offset the cost of compiling it, and we have
> to be really sure about what we compile (dynamic feedback), because
> recompilation is expensive too.
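>
> As a rough illustration of the break-even point, here is a small Python
> sketch plugging in the numbers above (the variable names are mine, just
> for illustration):
>
>      # Numbers from the measurements above
>      generate = 0.000377893            # time to emit the LLVM IR
>      compile_ = 0.00987911             # llvm-opts + codegen (MCJIT)
>      jitted   = 0.0018310546875        # one run of the compiled code
>      interp   = 0.0037310123443603516  # one run in the interpreter
>
>      # How many calls until the compile cost is paid back?
>      saving_per_call = interp - jitted                   # ~0.0019 s
>      break_even = (generate + compile_) / saving_per_call
>      print(break_even)                                   # ~5.4 calls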
>
> This is less of a problem for long-running processes, but think about a
> JavaScript JIT.
> LLVM is also really memory hungry: Lua + LuaJIT use just 200 KB, while
> LLVM alone is much larger. Again, this matters less for server
> processes, as dynamic libs are shared.
>
> Full disclosure: I was an independent contributor to Unladen Swallow.
>
> Unladen Swallow failed on 2). It was clearly targeted at long-running
> processes, but it failed to provide a reasonable performance boost in
> compiled code. It had to fight some LLVM bugs (this was quite some time
> ago already), and by the time the JIT worked, too much time had already
> passed (for Google). It also had a top-down development model: start
> with a generic compiled function (just removing the dispatch overhead)
> and add optimizations later. Going after the low-hanging fruit first
> may have been a better idea.
>
> PyPy switched because of the above, plus some features they missed. One
> has to admit that the garbage collection interface was (is?) in pretty
> bad shape, and PyPy relies heavily on its GC.
> They also need to be able to patch jumps to attach new compiled traces
> to failed guards.
> They could write their new assembler in (R)Python ... so they did.
>
> What LLVM needs (imho): LLVM was built as a compiler, not a JIT. That
> makes a big difference in the assumptions about compilation speed and
> code quality. In a JIT you do not care if the register allocation is
> suboptimal, as long as you get your compiled code fast. If you want to
> compete with GCC, every stack spill counts!
>
> tl;dr: much faster codegen (even if it costs code quality), lower
> memory use, and more information about the generated machine code.

I have a little bit to add to this story. One of the things to remember
is that LLVM only takes care of low-level issues; it's important to
perform high-level optimizations before producing LLVM IR. The IR
produced by Unladen Swallow was enormous. If I recall correctly, a
function as simple as

   def add(x, y):
     return x + y

was close to 100 basic blocks due to all the implicit method calls and
fallback paths to the interpreter. This size expansion was the driving
cause of the slow LLVM compile times. More high-level optimization
before emitting IR should help reduce this.
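
To give an idea of where those blocks come from, here is a very
simplified Python sketch of what "x + y" means once the implicit
dispatch is spelled out (my approximation of CPython's PyNumber_Add,
not the actual lowering Unladen Swallow used):

   def binary_add(x, y):
       # Simplified: real CPython also handles subclass priority,
       # sequence concatenation and slot-level fast paths.
       result = NotImplemented
       if hasattr(type(x), '__add__'):
           result = type(x).__add__(x, y)
       if result is NotImplemented and hasattr(type(y), '__radd__'):
           result = type(y).__radd__(y, x)
       if result is NotImplemented:
           raise TypeError("unsupported operand type(s) for +")
       return result

Inline every one of those checks and calls, plus a bailout path back to
the interpreter wherever an assumption can fail, and the basic block
count grows quickly.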

It was hard to find good high-level optimizations to add to Unladen
Swallow because of a design decision to make the LLVM path optional and
to let developers with no knowledge of the LLVM side keep changing the
non-LLVM path. Consider type inference, for instance. U-S couldn't do
type inference in the LLVM lowering path because that would require
hard-coding some knowledge of the type system into the LLVM-specific
code path.
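
Just to illustrate the kind of fast path such type knowledge would
enable, a lowering driven by type feedback might emit a guarded
specialization roughly like this (a hypothetical sketch, not anything
U-S actually generated; it reuses the binary_add sketch above as the
slow path):

   def add_specialized(x, y):
       # Guard on the types that runtime feedback observed (int, int).
       if type(x) is int and type(y) is int:
           return int.__add__(x, y)   # direct slot call, no dispatch
       # Guard failed: fall back to the generic path (or deoptimize
       # back to the interpreter).
       return binary_add(x, y)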

Nick


