[LLVMdev] Alias Analysis: zero terminated strings

Mon Sep 12 09:31:24 PDT 2011

Carl-Philip Hänsch wrote:
> Hello,
>
> I'm developing a programming language that is optimized for strings. A
> first hello world program shows me that LLVM needs a lot more work on
> zero-terminated strings.
> In the following example, I have an auto-generated hello world program
> optimized with -O3. The problem is that the constant string is copied
> into a malloc'ed memory area, then puts is called, and then the memory is
> freed. There is also some leftover from the reference counters. These
> are found by dead code elimination after the puts call. But before
> the puts call, the constant-folded number is put into the memory and is
> never used. I was told that LLVM assumes that a function can also read
> below the pointer, so dead code elimination does not work here. The
> second thing I would like to have is a way to tell LLVM that the
> interesting memory ends after the zero termination. I think these two
> flags, dont_read_below and dont_read_above_zero, should be enough to make
> LLVM optimize that example.

LLVM could figure out that there is no "below the pointer" by noticing 
that the object came from malloc. I think the missing optimization here 
is a heap->stack transform.
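
As a rough illustration of what a heap->stack transform would do, here is a
minimal C sketch. It is not the poster's generated code: the names hello,
greet_heap and greet_stack are hypothetical, and the reference-counting
leftovers are omitted. The first function mirrors the malloc/copy/puts/free
pattern described above; the second is what the same function might look
like once the allocation has been moved to the stack.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical constant string, standing in for the front end's literal. */
static const char hello[] = "Hello World";

/* Before: copy the constant into a heap buffer, print it, free the buffer. */
int greet_heap(void) {
    char *buf = malloc(sizeof hello);
    if (buf == NULL)
        return -1;
    memcpy(buf, hello, sizeof hello); /* includes the terminating zero */
    int rc = puts(buf);
    free(buf);
    return rc;
}

/* After (assumed result of the transform): the buffer never outlives the
 * call, so it can live on the stack and the malloc/free pair disappears. */
int greet_stack(void) {
    char buf[sizeof hello];
    memcpy(buf, hello, sizeof hello);
    return puts(buf);
}
```

Once the buffer is an ordinary stack object, the existing stack-oriented
optimizations have a much easier time reasoning about which stores to it are
dead.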

I note that in your example the exit is not post-dominated by free(). 
The transform could still fire by noticing that the pointer returned by 
malloc never escaped the function. (A more expensive check would be to 
see that it never escaped the function along the path that didn't call 
free. This is related to http://llvm.org/PR8908#c1 .)
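
To make the two conditions concrete, here is a small C sketch under stated
assumptions. The functions and the keep parameter are hypothetical and are
not taken from the poster's example. In the first function free() is reached
on only one branch, so the exit is not post-dominated by it, yet the pointer
is never stored anywhere or returned, so it does not escape. In the second
function the pointer is stored to a global, so it escapes and the allocation
could not be moved to the stack.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical global used only to show an escaping pointer. */
static char *g_saved = NULL;

/* free() runs on one path only, but buf never leaves this function. */
size_t length_no_escape(const char *src, int keep) {
    size_t n = strlen(src) + 1;
    char *buf = malloc(n);
    if (buf == NULL)
        return 0;
    memcpy(buf, src, n);
    size_t len = strlen(buf);
    if (!keep)
        free(buf); /* skipped when keep is nonzero: exit not post-dominated */
    return len;
}

/* buf is stored to a global, so it escapes the function. */
size_t length_escapes(const char *src) {
    size_t n = strlen(src) + 1;
    char *buf = malloc(n);
    if (buf == NULL)
        return 0;
    memcpy(buf, src, n);
    g_saved = buf; /* the caller can now observe and free the heap block */
    return strlen(buf);
}
```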

One other thing we may want is a flag for "does not care about the 
pointer itself, only what the pointer points to". Currently, nothing 
tells LLVM that puts() doesn't check whether the pointer argument == 
&string_00000001, so we can't actually remove the copy. We could 
special-case that optimization into SimplifyLibCalls.
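
The distinction between caring about the pointer and caring about the
pointee can be sketched in C as follows. All names here are hypothetical;
string_const stands in for the poster's &string_00000001. A callee like
is_the_constant makes the copy observable, because it behaves differently
for the copy and for the original. A callee like content_length only reads
the bytes, so passing the constant directly would give the same result.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the constant string global. */
static const char string_const[] = "Hello World";

/* Inspects the pointer value itself: distinguishes a copy from the original. */
int is_the_constant(const char *p) {
    return p == string_const;
}

/* Only reads what the pointer points to, up to the terminating zero. */
size_t content_length(const char *p) {
    return strlen(p);
}

/* Returns 1 if copying the constant changes what is_the_constant reports. */
int copy_is_observable(void) {
    char copy[sizeof string_const];
    memcpy(copy, string_const, sizeof string_const);
    return is_the_constant(copy) != is_the_constant(string_const);
}

/* Returns 1 if a content-only callee sees the copy and original as equal. */
int copy_is_equivalent_by_content(void) {
    char copy[sizeof string_const];
    memcpy(copy, string_const, sizeof string_const);
    return content_length(copy) == content_length(string_const);
}
```

Without knowing which of the two kinds of callee puts() is, the optimizer has
to assume the first, which is why the copy cannot simply be dropped.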

Nick