[LLVMdev] Semantics of an Inbounds GetElementPtr
Nicholas White
n.j.white at gmail.com
Mon May 4 14:19:54 PDT 2015
> It's not quite the same testcase.
Yes - it's an extension of the first test case that I'd expect to be
optimised out in the same way as my earlier example (i.e., store a
value, read it back and branch on it). If you miss out the
"check.first.array.element" block (changing the branch that jumps to
it to go to the "abort" label instead) like this:
define void @func2(i8* %mem) {
  %1 = icmp eq i8* %mem, null
  br i1 %1, label %check.zero, label %stash.zero
stash.zero:
  %2 = bitcast i8* %mem to %struct.my_s*
  %3 = getelementptr inbounds i8, i8* %mem, i64 4
  %4 = bitcast i8* %3 to i32*
  store i32 0, i32* %4, align 4
  br label %check.zero
check.zero:
  %.0.i = phi %struct.my_s* [ %2, %stash.zero ], [ null, %0 ]
  %5 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 1
  %6 = load i32, i32* %5, align 4
  %7 = icmp eq i32 %6, 0
  br i1 %7, label %success, label %abort
abort:
  tail call void @__assert_rtn()
  unreachable
success:
  ret void
}
Then opt -O3 does optimize it down to:
; Function Attrs: nounwind
define void @func2(i8* nocapture %mem) #0 {
stash.zero:
  %0 = getelementptr inbounds i8, i8* %mem, i64 4
  %1 = bitcast i8* %0 to i32*
  store i32 0, i32* %1, align 4
  ret void
}
...so something about the "check.first.array.element" block confuses
whatever analysis opt used to determine %6 was zero in func2.
> Can you walk me through the below testcase and explain what you expect
> to happen?
Definitely:
> %struct.my_s = type { i32, i32, [0 x i8*] }
We only read and write to the second i32 (the store and the load below
both use byte offset 4), although a code branch that is never taken
will read the first element of the variable-length array.
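
For reference, a minimal C sketch of how this type lays out, assuming the
struct came from a C declaration with a flexible array member. The field
names (first, second, tail) and the helper names are invented for
illustration; the IR only gives the types. It shows that the raw byte
offset 4 and the struct field index 1 used further down name the same
address:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of %struct.my_s = type { i32, i32, [0 x i8*] }.
   Field names are invented; the IR only gives the types. */
struct my_s {
    int32_t first;   /* field index 0, byte offset 0 */
    int32_t second;  /* field index 1, byte offset 4 */
    void *tail[];    /* field index 2, flexible array member */
};

/* Address computed from a raw byte offset, in the style of
   getelementptr inbounds i8, i8* %mem, i64 4 */
static int32_t *second_via_bytes(void *mem) {
    return (int32_t *)((char *)mem + 4);
}

/* Address computed through the struct type, in the style of
   getelementptr inbounds %struct.my_s, %struct.my_s* %p, i64 0, i32 1 */
static int32_t *second_via_field(void *mem) {
    return &((struct my_s *)mem)->second;
}

/* Returns 1 when both routes name the same address. */
static int same_address(void *mem) {
    assert(mem != NULL);
    return second_via_bytes(mem) == second_via_field(mem);
}
```

Neither helper dereferences the pointer; they only compute addresses.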
> ; Function Attrs: noreturn
> declare void @__assert_rtn()
basically any noreturn function
> define void @func(i8* %mem) {
> %1 = icmp eq i8* %mem, null
> br i1 %1, label %check.zero, label %stash.zero
Checks the input pointer to see if it's null - the C code this is
originally derived from didn't check this return value from malloc.
> stash.zero:
> %2 = bitcast i8* %mem to %struct.my_s*
> %3 = getelementptr inbounds i8, i8* %mem, i64 4
get a pointer to byte offset 4 of the memory, i.e. the second i32
member of the struct
> %4 = bitcast i8* %3 to i32*
> store i32 0, i32* %4, align 4
and put a zero in it - nb. this branch is only taken when %mem is not null
> br label %check.zero
>check.zero:
> %.0.i = phi %struct.my_s* [ %2, %stash.zero ], [ null, %0 ]
> %5 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 1
get a pointer to the second element of the struct a different way.
Because both exits of block %0 end up here, the base pointer is really
always %mem; the phi just records whether we know it to be null
(arriving from %0) or non-null (arriving from %stash.zero)
> %6 = load i32, i32* %5, align 4
the C code loads the value from the second i32 of the struct,
regardless of whether the pointer's null or not. Opt correctly assumes
%5 therefore can't be null.
> %7 = icmp eq i32 %6, 0
compare the value of the second element to zero; we stored zero here
(%4) in the previous block (as we can't have taken the null path from
%0 and still got this far). Opt sometimes seems to deduce this (i.e. %7
== i1 1), as in the func2 example above
> br i1 %7, label %success, label %check.first.array.element
...so we'll always go to %success. However, opt only seems to spot
this once the false edge of this branch is retargeted at the abort
label - example func2 above - which does trigger the optimisation.
>check.first.array.element:
> %8 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 2, i64 0
> %9 = load i8*, i8** %8, align 1
> %10 = icmp eq i8* %9, null
> br i1 %10, label %success, label %abort
I don't think it should matter what this block does
>abort:
> tail call void @__assert_rtn()
> unreachable
>success:
> ret void
>}
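
As a reading aid, here is a rough C sketch of the kind of source that could
lower to IR shaped like @func. It is a hypothetical reconstruction, not the
original code: the names stash_zero and check, the field names, and the split
into a helper are all assumptions. It also returns a status instead of
calling a noreturn function, so that it can be exercised directly:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of %struct.my_s; field names are invented. */
struct my_s {
    int32_t first;
    int32_t second;
    void *tail[];
};

/* Mirrors %0 and stash.zero: only a non-null pointer gets the zero stored. */
static struct my_s *stash_zero(void *mem) {
    if (mem == NULL)
        return NULL;             /* the [ null, %0 ] phi input */
    struct my_s *s = mem;
    s->second = 0;               /* store i32 0 at byte offset 4 */
    return s;                    /* the [ %2, %stash.zero ] phi input */
}

/* Returns 1 on the paths that reach %success and 0 where the IR
   would reach %abort. */
static int check(void *mem) {
    struct my_s *s = stash_zero(mem);
    if (s->second == 0)          /* check.zero: unguarded load, undefined in C if s is NULL */
        return 1;
    if (s->tail[0] == NULL)      /* check.first.array.element */
        return 1;
    return 0;
}
```

For any non-null argument the first test in check succeeds, because
stash_zero has just written the zero that it reads back, so the tail[0]
load is never reached. That is the same store-then-load pattern the
walkthrough above expects opt to fold.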
I hope you can do something with this. Thanks -
Nick