[llvm-dev] Memory Store/Load Optimization Issue (Emulating stack)
Paul Peet via llvm-dev
llvm-dev at lists.llvm.org
Wed Feb 10 12:18:39 PST 2016
Thank you for the hint.
I adjusted the code and it works:
The code after replacing inttoptr with getelementptr:
define { i32, i32, i8* } @test(i32 %foo, i32 %bar, i8* %sp) {
entry:
; push foo (On "stack")
%sp_1 = getelementptr i8, i8* %sp, i32 -4
%sp_1_ptr = bitcast i8* %sp_1 to i32*
store i32 %foo, i32* %sp_1_ptr, align 4
; push bar
%sp_2 = getelementptr i8, i8* %sp_1, i32 -4
%sp_2_ptr = bitcast i8* %sp_2 to i32*
store i32 %bar, i32* %sp_2_ptr, align 4
; val1 = pop (val1 = bar)
%sp_3_ptr = bitcast i8* %sp_2 to i32*
%val1 = load i32, i32* %sp_3_ptr, align 4
%sp_3 = getelementptr i8, i8* %sp_2, i32 4
; val2 = pop (val2 = foo)
%sp_4_ptr = bitcast i8* %sp_3 to i32*
%val2 = load i32, i32* %sp_4_ptr, align 4
%sp_4 = getelementptr i8, i8* %sp_3, i32 4
%ret_1 = insertvalue { i32, i32, i8* } undef, i32 %val1, 0
%ret_2 = insertvalue { i32, i32, i8* } %ret_1, i32 %val2, 1
%ret_3 = insertvalue { i32, i32, i8* } %ret_2, i8* %sp_4, 2
ret { i32, i32, i8* } %ret_3
}
After optimization ("opt -instcombine ./code.ll -S")
define { i32, i32, i8* } @test(i32 %foo, i32 %bar, i8* %sp) {
entry:
%sp_1 = getelementptr i8, i8* %sp, i64 -4
%sp_1_ptr = bitcast i8* %sp_1 to i32*
store i32 %foo, i32* %sp_1_ptr, align 4
%sp_2 = getelementptr i8, i8* %sp, i64 -8
%sp_2_ptr = bitcast i8* %sp_2 to i32*
store i32 %bar, i32* %sp_2_ptr, align 4
%ret_1 = insertvalue { i32, i32, i8* } undef, i32 %bar, 0
%ret_2 = insertvalue { i32, i32, i8* } %ret_1, i32 %foo, 1
%ret_3 = insertvalue { i32, i32, i8* } %ret_2, i8* %sp, 2
ret { i32, i32, i8* } %ret_3
}
My only questions are now:
- How is it that inttoptr cannot provide that specific alias information so
it can optimize that store/load away ?
- Might it be possible to get inttoptr providing such alias analysis ?
- I came across MemorySSA while browsing though the llvm source. Is it
possible that one can use MemorySSA to do such optimization without alias
analysis ?
- Where do I have to look in the source which is doing this kind of
optimization (Is it instcombine which uses lib/Analysis/Loads.cpp ?)
Regards,
Paul
2016-02-10 0:26 GMT+01:00 Philip Reames <listmail at philipreames.com>:
> Two points:
> - Using inttoptr is a mistake here. GEPs are strongly preferred and
> provide strictly more aliasing information to the optimizer.
> - The zext is a bit weird. I'm not sure where that came from, but I'd not
> bother looking into until the preceding point is addressed.
>
> In general, you may find these docs useful:
> http://llvm.org/docs/Frontend/PerformanceTips.html
>
> Philip
>
>
>
> On 02/08/2016 06:54 AM, Paul Peet via llvm-dev wrote:
>
> Hello,
>
> I am trying to emulate the "stack" as like on x86 when using push/pop so
> afterwards I can use LLVM's optimizer passes to simplify (reduce junk) the
> code.
>
> The LLVM IR code:
>
> define { i32, i32, i32 } @test(i32 %foo, i32 %bar, i32 %sp) {
> ; push foo (On "stack")
> %sp_1 = sub i32 %sp, 4
> %sp_1_ptr = inttoptr i32 %sp_1 to i32*
> store i32 %foo, i32* %sp_1_ptr, align 4
>
> ; push bar
> %sp_2 = sub i32 %sp_1, 4
> %sp_2_ptr = inttoptr i32 %sp_2 to i32*
> store i32 %bar, i32* %sp_2_ptr, align 4
>
> ; val1 = pop (val1 = bar)
> %sp_3_ptr = inttoptr i32 %sp_2 to i32*
> %val1 = load i32, i32* %sp_3_ptr, align 4
> %sp_3 = add i32 %sp_2, 4
>
> ; val2 = pop (val2 = foo)
> %sp_4_ptr = inttoptr i32 %sp_3 to i32*
> %val2 = load i32, i32* %sp_4_ptr, align 4
> %sp_4 = add i32 %sp_3, 4
>
> %ret_1 = insertvalue { i32, i32, i32 } undef, i32 %val1, 0
> %ret_2 = insertvalue { i32, i32, i32 } %ret_1, i32 %val2, 1
> %ret_3 = insertvalue { i32, i32, i32 } %ret_2, i32 %sp_4, 2
>
> ret { i32, i32, i32 } %ret_3
> }
>
> This code will "push" two values onto the stack and pop them in reverse
> order so afterwards "foo" and "bar" will be swapped and returned back.
>
> After running this through "opt -O2 ./test.ll", I am getting this:
>
> define { i32, i32, i32 } @test(i32 %foo, i32 %bar, i32 %sp) #0 {
> %sp_1 = add i32 %sp, -4
> %1 = zext i32 %sp_1 to i64
> %sp_1_ptr = inttoptr i64 %1 to i32*
> store i32 %foo, i32* %sp_1_ptr, align 4
> %sp_2 = add i32 %sp, -8
> %2 = zext i32 %sp_2 to i64
> %sp_2_ptr = inttoptr i64 %2 to i32*
> store i32 %bar, i32* %sp_2_ptr, align 4
> %val2 = load i32, i32* %sp_1_ptr, align 4
> %ret_1 = insertvalue { i32, i32, i32 } undef, i32 %bar, 0 ; Swapped
> %ret_2 = insertvalue { i32, i32, i32 } %ret_1, i32 %val2, 1; Not Swapped
> (Not optimized; Should be %foo)
> %ret_3 = insertvalue { i32, i32, i32 } %ret_2, i32 %sp, 2
> ret { i32, i32, i32 } %ret_3
> }
>
> As you can see that the IR has got additional code, eg. zext. But the main
> problem here is that val2 hasn't been optimized.
> Could anyone show me some hints what is preventing the second val from
> being optimized? (My guess would be the zext because I am using %sp as a
> 32bit pointer although the "target" is 64bit).
>
> Regards,
> Paul
>
>
> _______________________________________________
> LLVM Developers mailing listllvm-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160210/df898102/attachment.html>
More information about the llvm-dev
mailing list