[LLVMdev] Promoting malloc to alloca

Sun Jul 11 22:22:26 PDT 2010

We don't have any such optimization.

Firstly, the pointer has to not make it into any function call at all, 
since any function might in turn call free(). Then we need to do escape 
analysis as you pointed out, but that's not difficult. We do similar 
analysis to determine pointer capture already.

Matt Giuca wrote:
> I have a compiler which generates a lot of malloc instructions for
> basically all data. I was rather hoping that these mallocs would be
> converted to allocas if the pointers didn't escape. This would require
> an escape analysis, and I'm not sure if LLVM has such a thing.
>
> For instance, consider this code (typical of the output of my compiler):
>
> define i32 @dontescape(i32 %x) {
> entry:
>    %t = tail call i8* @malloc(i32 4)
>    %t1 = bitcast i8* %t to i32*
>    store i32 %x, i32* %t1
>    %t2 = load i32* %t1
>    ret i32 %t2
> }

When do you free it? Why not just use an alloca?

> Running with opt --std-compile-opts, LLVM is smart enough to figure
> out that the result is actually %x. So it generates:
>
> define i32 @dontescape(i32 %x) nounwind {
> entry:
>    %t = tail call i8* @malloc(i32 4)               ;<i8*>  [#uses=1]
>    %t1 = bitcast i8* %t to i32*                    ;<i32*>  [#uses=1]
>    store i32 %x, i32* %t1
>    ret i32 %x
> }
>
> This saves a load, but I was rather hoping to eliminate the malloc and
> store altogether. I was hoping it would realise that %t doesn't
> escape, and generate this code:
>
> define i32 @dontescape(i32 %x) nounwind {
> entry:
>    %t1 = alloca i32, i32 1                           ;<i32*>  [#uses=1]
>    store i32 %x, i32* %t1
>    ret i32 %x
> }

LLVM doesn't realize that @malloc has no other side-effects. It's 
treating it like a function that might perform I/O.

> which would then get optimised to:
>
> define i32 @dontescape(i32 %x) nounwind readnone {
> entry:
>    ret i32 %x
> }
>
> and in general the allocas would go on to become registers due to
> mem2reg, which would be wonderfully efficient.
>
> So firstly, is there anything I'm doing wrong which is preventing this
> from working? For example, I tried declaring malloc as "readnone" (is
> this safe?).

No. If it's readnone then two calls with the same argument can be folded 
to one.

  That would partly fix the problem (if you allocate memory
> but don't return it, it can optimise away the malloc), except that it
> isn't clever enough to realise the "store" instruction is redundant.
> Again, I think this would require an escape analysis to fix. Maybe
> there's an optimisation pass I'm missing here.
>
> Secondly, if there is currently no way to magically make this work,
> has this proposal been discussed before? Are there any known gotchas
> which prevented it from being implemented?

Doing this in general requires either that the free can't be found or 
that it really is right before the function exit, or else there might be 
a good reason the memory is on the heap. Then we have to make sure that 
we don't transform malloc calls inside loops (unless we also prove 
they're independent and Just Leaked).

How large a block of memory are you willing to move from heap to stack? 
What if it's a variable, how hard should we work to try to determine its 
upper bound (we probably can't with any reasonable amount of effort). 
And what type is the alloca? We could take the 4 from malloc(i32 4) and 
create an alloca i8, 4 and let instcombine try to sort it out.

Where should we look for malloc calls? Note that we only do stack to 
register transformation on alloca's in the first basic block if they're 
stacked together in a row as the very first instructions in the function.

> I've found this file. I'm not sure if it's part of the LLVM
> distribution, or a branch:
> https://llvm.org/svn/llvm-project/llvm/tags/Apple/llvmCore-2092/lib/Analysis/EscapeAnalysis.cpp
> It contains a comment: "Returns allow the return value to escape.
> This is mostly important for malloc to alloca promotion."

That does exist in head, but strangely enough I can't find any 
implementation of it? It looks like it was removed when we straightened 
out value tracking + capture tracking.