<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">2016-02-10 22:23 GMT+01:00 Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">----- Original Message -----<br>
> From: "Paul Peet via llvm-dev" <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>><br>
> To: "Daniel Berlin" <<a href="mailto:dberlin@dberlin.org">dberlin@dberlin.org</a>><br>
> Cc: "llvm-dev" <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>><br>
</span><span class="">> Sent: Wednesday, February 10, 2016 3:13:15 PM<br>
> Subject: Re: [llvm-dev] Memory Store/Load Optimization Issue (Emulating stack)<br>
><br>
><br>
><br>
</span><span class="">> Thanks for the answers. Although I am not sure if I've understood the<br>
> docs about how inttoptr/ptrtoint are different when compared to<br>
> gep.<br>
> It says: "It's invalid to take a GEP from one object, address into a<br>
> different separately allocated object, and dereference it.".<br>
<br>
</span>This refers to the underlying allocation that created the memory. Where did %sp come from? Is it an alloca instruction, or from some other source?<br>
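To make the "underlying allocation" idea concrete, here is a toy model, written in Python rather than IR. It is only a sketch under stated assumptions and not how LLVM represents pointers: the names Ptr, gep, may_access and the 64-byte buffer size are hypothetical. A pointer carries the allocation it is based on, a GEP only moves the offset, and an access is acceptable only while it stays inside that same allocation.<br>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    base: str    # hypothetical label for the allocation the pointer is based on
    offset: int  # byte offset from the start of that allocation


def gep(p, delta):
    # Like a GEP on i8*: moves the offset, never changes the underlying allocation.
    return Ptr(p.base, p.offset + delta)


def may_access(p, size, alloc_sizes):
    # The whole access must stay inside the pointer's own allocation.
    return p.offset >= 0 and alloc_sizes[p.base] >= p.offset + size


alloc_sizes = {"stack": 64}   # assumed size of the malloc'ed buffer
sp = Ptr("stack", 64)         # %sp starts at the high end of the buffer
sp_1 = gep(sp, -4)            # push foo
sp_2 = gep(sp_1, -4)          # push bar
ok = [may_access(p, 4, alloc_sizes) for p in (sp_1, sp_2)]
out_of_bounds = may_access(sp, 4, alloc_sizes)
```

In this model both pushes stay inside the one malloc'ed buffer, so the quoted GEP rule is not violated; a 4-byte access at the initial %sp itself would run past the end of the buffer.<br>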
<span class=""><br></span></blockquote><div><br></div><div>It's allocated via malloc and passed to the function.</div><div><div>define { i32, i32, i8* } @test(i32 %foo, i32 %bar, i8* %sp_x)</div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">
> To go back to my intention why I am doing this, I would like to<br>
> "emulate" some x86 instructions with llvm-ir but as far as I<br>
> understand that aliasing rule, I am not sure if I am breaking that<br>
> rule.<br>
><br>
><br>
> For example when translating this x86 code to llvm ir:<br>
><br>
><br>
> push eax<br>
> add esp, 2<br>
> push ecx<br>
> ...<br>
><br>
><br>
><br>
> ; push foo (On "stack")<br>
> %sp_1 = getelementptr i8, i8* %sp, i32 -4<br>
> %sp_1_ptr = bitcast i8* %sp_1 to i32*<br>
> store i32 %foo, i32* %sp_1_ptr, align 4<br>
><br>
><br>
> %sp_x = getelementptr i8, i8* %sp_1, i32 2<br>
><br>
><br>
> ; push bar<br>
> %sp_2 = getelementptr i8, i8* %sp_x, i32 -4<br>
> %sp_2_ptr = bitcast i8* %sp_2 to i32*<br>
> store i32 %bar, i32* %sp_2_ptr, align 4<br>
><br>
><br>
> Both objects (eax, ecx) will overlap because of the size difference<br>
> (eax = i32). What are the consequences of doing this? Will this<br>
> break alias analysis for the further instructions?<br>
><br>
<br>
</span>Partially overlapping writes do not, in themselves, break anything. AA should handle that just fine.<br>
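One reading of "partially overlapping" is that the two stores share some, but not all, of their bytes. The following Python sketch works out the byte ranges for the quoted push eax / add esp, 2 / push ecx sequence. It is only arithmetic on offsets relative to the incoming %sp, not a model of LLVM's alias analysis, and the helper name push is hypothetical.<br>

```python
def push(sp, size=4):
    # A push moves the stack pointer down by `size` bytes and writes
    # the bytes at offsets new_sp .. new_sp + size - 1.
    new_sp = sp - size
    return new_sp, range(new_sp, new_sp + size)


sp = 0                      # offsets are relative to the incoming %sp
sp, eax_bytes = push(sp)    # push eax
sp = sp + 2                 # add esp, 2
sp, ecx_bytes = push(sp)    # push ecx
overlap = sorted(set(eax_bytes).intersection(ecx_bytes))
print(list(eax_bytes), list(ecx_bytes), overlap)
```

Under these assumptions the first store covers offsets -4 to -1, the second covers -6 to -3, and only offsets -4 and -3 are written by both.<br>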
<span class=""><font color="#888888"><br></font></span></blockquote><div><br></div><div>What do you mean by partially?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class=""><font color="#888888">
-Hal<br>
</font></span><div class=""><div class="h5"><br>
><br>
> 2016-02-10 21:24 GMT+01:00 Daniel Berlin < <a href="mailto:dberlin@dberlin.org">dberlin@dberlin.org</a> > :<br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
> On Wed, Feb 10, 2016 at 12:18 PM, Paul Peet via llvm-dev <<br>
> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a> > wrote:<br>
><br>
><br>
><br>
> Thank you for the hint.<br>
><br>
><br>
> I adjusted the code and it works:<br>
><br>
><br>
> The code after replacing inttoptr with getelementptr:<br>
><br>
><br>
><br>
> define { i32, i32, i8* } @test(i32 %foo, i32 %bar, i8* %sp) {<br>
> entry:<br>
> ; push foo (On "stack")<br>
> %sp_1 = getelementptr i8, i8* %sp, i32 -4<br>
> %sp_1_ptr = bitcast i8* %sp_1 to i32*<br>
> store i32 %foo, i32* %sp_1_ptr, align 4<br>
><br>
><br>
> ; push bar<br>
> %sp_2 = getelementptr i8, i8* %sp_1, i32 -4<br>
> %sp_2_ptr = bitcast i8* %sp_2 to i32*<br>
> store i32 %bar, i32* %sp_2_ptr, align 4<br>
><br>
><br>
> ; val1 = pop (val1 = bar)<br>
> %sp_3_ptr = bitcast i8* %sp_2 to i32*<br>
> %val1 = load i32, i32* %sp_3_ptr, align 4<br>
> %sp_3 = getelementptr i8, i8* %sp_2, i32 4<br>
><br>
><br>
> ; val2 = pop (val2 = foo)<br>
> %sp_4_ptr = bitcast i8* %sp_3 to i32*<br>
> %val2 = load i32, i32* %sp_4_ptr, align 4<br>
> %sp_4 = getelementptr i8, i8* %sp_3, i32 4<br>
><br>
><br>
> %ret_1 = insertvalue { i32, i32, i8* } undef, i32 %val1, 0<br>
> %ret_2 = insertvalue { i32, i32, i8* } %ret_1, i32 %val2, 1<br>
> %ret_3 = insertvalue { i32, i32, i8* } %ret_2, i8* %sp_4, 2<br>
><br>
><br>
> ret { i32, i32, i8* } %ret_3<br>
> }<br>
><br>
><br>
> After optimization ("opt -instcombine ./code.ll -S")<br>
><br>
><br>
><br>
> define { i32, i32, i8* } @test(i32 %foo, i32 %bar, i8* %sp) {<br>
> entry:<br>
> %sp_1 = getelementptr i8, i8* %sp, i64 -4<br>
> %sp_1_ptr = bitcast i8* %sp_1 to i32*<br>
> store i32 %foo, i32* %sp_1_ptr, align 4<br>
> %sp_2 = getelementptr i8, i8* %sp, i64 -8<br>
> %sp_2_ptr = bitcast i8* %sp_2 to i32*<br>
> store i32 %bar, i32* %sp_2_ptr, align 4<br>
> %ret_1 = insertvalue { i32, i32, i8* } undef, i32 %bar, 0<br>
> %ret_2 = insertvalue { i32, i32, i8* } %ret_1, i32 %foo, 1<br>
> %ret_3 = insertvalue { i32, i32, i8* } %ret_2, i8* %sp, 2<br>
> ret { i32, i32, i8* } %ret_3<br>
> }<br>
><br>
><br>
> My only questions are now:<br>
> - How is it that inttoptr cannot provide that specific alias<br>
> information so it can optimize that store/load away ?<br>
> Because nothing tracks what happens to the ints, and what happens<br>
> when they are converted back to pointers and whether it's sane :)<br>
> <a href="http://llvm.org/docs/GetElementPtr.html#how-is-gep-different-from-ptrtoint-arithmetic-and-inttoptr" rel="noreferrer" target="_blank">http://llvm.org/docs/GetElementPtr.html#how-is-gep-different-from-ptrtoint-arithmetic-and-inttoptr</a><br>
><br>
><br>
><br>
><br>
><br>
> - Might it be possible to get inttoptr providing such alias analysis<br>
> ?<br>
> It doesn't make a lot of sense to try in most cases.<br>
> Most of the cases ptrtoint/inttoptr is useful are those where you<br>
> want to do crazy things to the pointer.<br>
><br>
><br>
><br>
><br>
><br>
><br>
> - I came across MemorySSA while browsing through the llvm source. Is<br>
> it possible that one can use MemorySSA to do such optimization<br>
> without alias analysis ?<br>
><br>
><br>
> MemorySSA relies on alias analysis to generate the SSA form.<br>
><br>
><br>
><br>
><br>
> - Where do I have to look in the source which is doing this kind of<br>
> optimization (Is it instcombine which uses lib/Analysis/Loads.cpp ?)<br>
><br>
><br>
> It's probably a combination of opts. The most likely candidate is<br>
> -gvn, but I would look at the pass dumps after each opt<br>
><br>
><br>
><br>
><br>
><br>
><br>
> Regards,<br>
> Paul<br>
><br>
><br>
><br>
><br>
><br>
><br>
> 2016-02-10 0:26 GMT+01:00 Philip Reames < <a href="mailto:listmail@philipreames.com">listmail@philipreames.com</a> ><br>
> :<br>
><br>
><br>
><br>
> Two points:<br>
> - Using inttoptr is a mistake here. GEPs are strongly preferred and<br>
> provide strictly more aliasing information to the optimizer.<br>
> - The zext is a bit weird. I'm not sure where that came from, but I'd<br>
> not bother looking into until the preceding point is addressed.<br>
><br>
> In general, you may find these docs useful:<br>
> <a href="http://llvm.org/docs/Frontend/PerformanceTips.html" rel="noreferrer" target="_blank">http://llvm.org/docs/Frontend/PerformanceTips.html</a><br>
><br>
> Philip<br>
><br>
><br>
><br>
><br>
><br>
> On 02/08/2016 06:54 AM, Paul Peet via llvm-dev wrote:<br>
><br>
><br>
><br>
><br>
><br>
> Hello,<br>
><br>
><br>
> I am trying to emulate the x86 "stack" as used by push/pop, so that<br>
> afterwards I can use LLVM's optimizer passes to simplify the code<br>
> (remove junk).<br>
><br>
><br>
> The LLVM IR code:<br>
><br>
><br>
><br>
> define { i32, i32, i32 } @test(i32 %foo, i32 %bar, i32 %sp) {<br>
> ; push foo (On "stack")<br>
> %sp_1 = sub i32 %sp, 4<br>
> %sp_1_ptr = inttoptr i32 %sp_1 to i32*<br>
> store i32 %foo, i32* %sp_1_ptr, align 4<br>
><br>
><br>
> ; push bar<br>
> %sp_2 = sub i32 %sp_1, 4<br>
> %sp_2_ptr = inttoptr i32 %sp_2 to i32*<br>
> store i32 %bar, i32* %sp_2_ptr, align 4<br>
><br>
><br>
> ; val1 = pop (val1 = bar)<br>
> %sp_3_ptr = inttoptr i32 %sp_2 to i32*<br>
> %val1 = load i32, i32* %sp_3_ptr, align 4<br>
> %sp_3 = add i32 %sp_2, 4<br>
><br>
><br>
> ; val2 = pop (val2 = foo)<br>
> %sp_4_ptr = inttoptr i32 %sp_3 to i32*<br>
> %val2 = load i32, i32* %sp_4_ptr, align 4<br>
> %sp_4 = add i32 %sp_3, 4<br>
><br>
><br>
> %ret_1 = insertvalue { i32, i32, i32 } undef, i32 %val1, 0<br>
> %ret_2 = insertvalue { i32, i32, i32 } %ret_1, i32 %val2, 1<br>
> %ret_3 = insertvalue { i32, i32, i32 } %ret_2, i32 %sp_4, 2<br>
><br>
><br>
> ret { i32, i32, i32 } %ret_3<br>
> }<br>
><br>
><br>
> This code will "push" two values onto the stack and pop them in<br>
> reverse order so afterwards "foo" and "bar" will be swapped and<br>
> returned back.<br>
><br>
><br>
> After running this through "opt -O2 ./test.ll", I am getting this:<br>
><br>
><br>
><br>
> define { i32, i32, i32 } @test(i32 %foo, i32 %bar, i32 %sp) #0 {<br>
> %sp_1 = add i32 %sp, -4<br>
> %1 = zext i32 %sp_1 to i64<br>
> %sp_1_ptr = inttoptr i64 %1 to i32*<br>
> store i32 %foo, i32* %sp_1_ptr, align 4<br>
> %sp_2 = add i32 %sp, -8<br>
> %2 = zext i32 %sp_2 to i64<br>
> %sp_2_ptr = inttoptr i64 %2 to i32*<br>
> store i32 %bar, i32* %sp_2_ptr, align 4<br>
> %val2 = load i32, i32* %sp_1_ptr, align 4<br>
> %ret_1 = insertvalue { i32, i32, i32 } undef, i32 %bar, 0 ; Swapped<br>
> %ret_2 = insertvalue { i32, i32, i32 } %ret_1, i32 %val2, 1 ; Not swapped (not optimized; should be %foo)<br>
> %ret_3 = insertvalue { i32, i32, i32 } %ret_2, i32 %sp, 2<br>
> ret { i32, i32, i32 } %ret_3<br>
> }<br>
><br>
><br>
> As you can see, the IR has gained additional code, e.g. zext. But the<br>
> main problem here is that val2 hasn't been optimized.<br>
> Could anyone give me some hints about what is preventing the second<br>
> value from being optimized? (My guess would be the zext, because I am<br>
> using %sp as a 32-bit pointer although the "target" is 64-bit).<br>
><br>
><br>
> Regards,<br>
> Paul<br>
><br>
> _______________________________________________<br>
> LLVM Developers mailing list <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>
> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
><br>
><br>
><br>
<br>
</div></div><div class=""><div class="h5">--<br>
Hal Finkel<br>
Assistant Computational Scientist<br>
Leadership Computing Facility<br>
Argonne National Laboratory<br>
</div></div></blockquote></div><br></div></div>