[LLVMdev] Semantics of an Inbounds GetElementPtr

Mon May 4 14:19:54 PDT 2015

> It's not quite the same testcase.
Yes - it's an extension of the first test case that I'd expect to be
optimised out in the same way as my earlier example (i.e., store a
value, read it back and branch on it). If you miss out the
"check.first.array.element" block (changing the branch that jumps to
it to go to the "abort" label instead) like this:

define void @func2(i8* %mem) {
  %1 = icmp eq i8* %mem, null
  br i1 %1, label %check.zero, label %stash.zero

stash.zero:
  %2 = bitcast i8* %mem to %struct.my_s*
  %3 = getelementptr inbounds i8, i8* %mem, i64 4
  %4 = bitcast i8* %3 to i32*
  store i32 0, i32* %4, align 4
  br label %check.zero

check.zero:
  %.0.i = phi %struct.my_s* [ %2, %stash.zero ], [ null, %0 ]
  %5 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 1
  %6 = load i32, i32* %5, align 4
  %7 = icmp eq i32 %6, 0
  br i1 %7, label %success, label %abort

abort:
  tail call void @__assert_rtn()
  unreachable

success:
  ret void
}

Then opt -O3 does optimize it down to:

; Function Attrs: nounwind
define void @func2(i8* nocapture %mem) #0 {
stash.zero:
  %0 = getelementptr inbounds i8, i8* %mem, i64 4
  %1 = bitcast i8* %0 to i32*
  store i32 0, i32* %1, align 4
  ret void
}

...so something about the "check.first.array.element" block confuses
whatever analysis opt used to determine %6 was zero in func2.

> Can you walk me through the below testcase and explain what you expect
> to happen?
Definitely:

> %struct.my_s = type { i32, i32, [0 x i8*] }
We only read and write to the second i32, although a code branch that is
never taken will read the first element of the variable length array.
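
As a rough illustration of that layout, here is a C sketch of the struct and of the two ways the IR addresses its second i32. The field names (first, second, tail) and the helper names are invented for this example; only the layout comes from the IR, and the offset check is an assumption that the compiler verifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of %struct.my_s = type { i32, i32, [0 x i8*] }.
   Field names are invented; only the layout is taken from the IR. */
struct my_s {
    int32_t first;   /* i32, field index 0 */
    int32_t second;  /* i32, field index 1: the field that is stored and loaded */
    char *tail[];    /* [0 x i8*], field index 2: the variable length array */
};

/* Compilation fails on any target where the assumed layout does not hold. */
static_assert(offsetof(struct my_s, second) == 4,
              "second i32 expected at byte offset 4");

/* Mirrors "getelementptr inbounds i8, i8* %mem, i64 4" followed by the bitcast. */
int32_t *second_via_bytes(void *mem) {
    return (int32_t *)((char *)mem + 4);
}

/* Mirrors "getelementptr inbounds %struct.my_s, %struct.my_s* %p, i64 0, i32 1". */
int32_t *second_via_field(struct my_s *p) {
    return &p->second;
}
```

Both helpers yield the same address for the same base pointer, which is why a store through one and a load through the other touch the same i32.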

> ; Function Attrs: noreturn
> declare void @__assert_rtn()
Basically any noreturn function will do.

> define void @func(i8* %mem) {
>  %1 = icmp eq i8* %mem, null
>  br i1 %1, label %check.zero, label %stash.zero
Checks the input pointer to see if it's null - the C code this is
originally derived from didn't check this return value from malloc.

> stash.zero:
>  %2 = bitcast i8* %mem to %struct.my_s*
>  %3 = getelementptr inbounds i8, i8* %mem, i64 4
get a pointer to byte offset 4 of the memory, i.e. the second i32 member of
the struct
>  %4 = bitcast i8* %3 to i32*
>  store i32 0, i32* %4, align 4
and put a zero in it - nb. this branch is only taken when %mem is not null
>  br label %check.zero

>check.zero:
>  %.0.i = phi %struct.my_s* [ %2, %stash.zero ], [ null, %0 ]
>  %5 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 1
get a pointer to the second element of the struct a different way, but
because the control flow from both exits of block %0 ends up here, the
base pointer is actually always %mem, but we may know whether it's
null or not
>  %6 = load i32, i32* %5, align 4
the C code loads the value from the second i32 of the struct,
regardless of whether the pointer's null or not. Opt correctly assumes
%5 therefore can't be null.
>  %7 = icmp eq i32 %6, 0
compare the value of the second element to zero; we stored zero here
(%4) in the previous block (as we can't have taken the null path from
%0 and still got this far). Opt sometimes seems to deduce this (ie %7
== i1 1) as in the func2 example above
>  br i1 %7, label %success, label %check.first.array.element
...so we'll always go to %success. However, pointing the false edge of
this branch straight at %abort - example func2 above - does trigger the
optimisation.

>check.first.array.element:
>  %8 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 2, i64 0
>  %9 = load i8*, i8** %8, align 1
>  %10 = icmp eq i8* %9, null
>  br i1 %10, label %success, label %abort
I don't think it should matter what this block does

>abort:
>  tail call void @__assert_rtn()
>  unreachable
>success:
>  ret void
>}
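
For reference, a rough C reading of @func as walked through above. It is reconstructed from the IR and is not the original source: the field names, the name assert_rtn and the use of abort() as the noreturn body are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical C view of %struct.my_s; field names are invented. */
struct my_s {
    int32_t first;
    int32_t second;
    char *tail[];
};

/* The byte offset 4 used below relies on this layout. */
static_assert(offsetof(struct my_s, second) == 4,
              "second i32 expected at byte offset 4");

/* Hypothetical stand-in for @__assert_rtn: any noreturn function. */
static _Noreturn void assert_rtn(void) {
    abort();
}

void func(void *mem) {
    struct my_s *s = NULL;                  /* the [ null, %0 ] phi input */

    if (mem != NULL) {                      /* block %0 */
        s = (struct my_s *)mem;             /* stash.zero: %2 */
        *(int32_t *)((char *)mem + 4) = 0;  /* %3, %4 and the store */
    }

    /* check.zero: the load goes through s whether or not it is NULL,
       so reaching this point with s == NULL is undefined behaviour. */
    if (s->second == 0) {
        return;                             /* success */
    }

    /* check.first.array.element: never reached if the store is seen. */
    if (s->tail[0] == NULL) {
        return;                             /* success */
    }

    assert_rtn();                           /* abort */
}
```

Called with a non-null, suitably aligned buffer of at least 8 bytes, this returns normally with the second i32 set to zero. Calling it with NULL is undefined behaviour, which is what allows opt to assume the pointer is non-null at the load.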

I hope you can do something with this. Thanks -

Nick