[LLVMdev] Box removal

Tue Jun 28 11:25:15 PDT 2011

In the creation of dynamic languages we often have to box values together.

For instance, take the following expression:

IntObj c = sqrt((a*a)+(b*b));

Here, most likely, a bytecode interpreter would execute this as
"mul_ints", "add_ints", "sqrt", etc. Inside these primitive functions
we would have to unwrap our IntObj types, add the values, allocate a
new object and return that to the function. In the above example, we
could probably expect around 4 allocations, and 7 unboxing operations.
Now granted if my lanugage is running as a bytecode interpreter, I can
speed it up simply by having LLVM call my functions in order, and
perhaps even in-lining all the bytecode operations into a single
function. But even then, I'm still left with the 4 allocations and 7
unboxings (is that even a word?).

I know other compiler projects, such as PyPy have allocation removal
where the optimization passes see that we only use the result of an
allocation a single time. Thinking that LLVM may do this as well, I
tried this simple test on in-browser LLVM compiler:
--------
#include <stdio.h>
#include <stdlib.h>

typedef struct Foo
{
int *x;
int x2;
}Foo;

int main(int argc, char **argv) {
Foo *f = (Foo *)malloc(sizeof(Foo));
f->x = (int *)malloc(sizeof(int));
*f->x = 10;
return *f->x;
}

----

Output:

; ModuleID = '/tmp/webcompile/_28006_0.bc'
target datalayout =
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-linux-gnu"

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
  %0 = tail call noalias i8* @malloc(i64 16) nounwind ; <i8*> [#uses=1]
  %1 = tail call noalias i8* @malloc(i64 4) nounwind ; <i8*> [#uses=1]
  %2 = bitcast i8* %1 to i32*                     ; <i32*> [#uses=2]
  %3 = bitcast i8* %0 to i32**                    ; <i32**> [#uses=1]
  store i32* %2, i32** %3, align 8
  store i32 10, i32* %2, align 4
  ret i32 10
}

------

As you can see, the allocations are still being performed. Now if we
could get rid of these allocations, then this entire function could be
reduced to a single line that just returns 10.

Is any optimization like this planned or even feasible in LLVM?

Thank you for your time.

Timothy Baldridge

-- 
“One of the main causes of the fall of the Roman Empire was
that–lacking zero–they had no way to indicate successful termination
of their C programs.”
(Robert Firth)