<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">2016-02-10 22:23 GMT+01:00 Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">----- Original Message -----<br>

> From: "Paul Peet via llvm-dev" <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>><br>

> To: "Daniel Berlin" <<a href="mailto:dberlin@dberlin.org">dberlin@dberlin.org</a>><br>

> Cc: "llvm-dev" <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>><br>

</span><span class="">> Sent: Wednesday, February 10, 2016 3:13:15 PM<br>

> Subject: Re: [llvm-dev] Memory Store/Load Optimization Issue (Emulating stack)<br>

><br>

><br>

><br>

</span><span class="">> Thanks for the answers. Although I am not sure if I've understood the<br>

> docs about how inttoptr/ptrtoint are different when compared to<br>

> gep.<br>

> It says: "It’s invalid to take a GEP from one object, address into a<br>

> different separately allocated object, and dereference it.".<br>

<br>

</span>This refers to the underlying allocation that created the memory. Where did %sp come from? Is it an alloca instruction, or from some other source?<br>

<span class=""><br></span></blockquote><div><br></div><div>It's allocated via malloc and passed to the function.</div><div><div>define { i32, i32, i8* } @test(i32 %foo, i32 %bar, i8* %sp_x)</div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">

> To go back to my intention why I am doing this, I would like to<br>

> "emulate" some x86 instructions with llvm-ir but as far as I<br>

> understand that aliasing rule, I am not sure if I am breaking that<br>

> rule.<br>

><br>

><br>

> For example when translating this x86 code to llvm ir:<br>

><br>

><br>

> push eax<br>

> add esp, 2<br>

> push ecx<br>

> ...<br>

><br>

><br>

><br>

> ; push foo (On "stack")<br>

> %sp_1 = getelementptr i8, i8* %sp, i32 -4<br>

> %sp_1_ptr = bitcast i8* %sp_1 to i32*<br>

> store i32 %foo, i32* %sp_1_ptr, align 4<br>

><br>

><br>

> %sp_x = getelementptr i8, i8* %sp_1, i32 2<br>

><br>

><br>

> ; push bar<br>

> %sp_2 = getelementptr i8, i8* %sp_x, i32 -4<br>

> %sp_2_ptr = bitcast i8* %sp_2 to i32*<br>

> store i32 %bar, i32* %sp_2_ptr, align 4<br>

><br>

><br>

> Both objects (eax, ecx) will overlap because of the size difference<br>

> (eax = i32). What are the consequences of doing this? Will this<br>

> break alias analysis for the further instructions?<br>

><br>

<br>

</span>Partially overlapping writes do not, in themselves, break anything. AA should handle that just fine.<br>
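To make "partially overlapping" concrete for the push eax / add esp, 2 / push ecx sequence quoted above, here is a minimal Python sketch of the byte ranges the two stores touch. The helper names (store_range, overlap, classify) are hypothetical and not part of LLVM; offsets are relative to the incoming %sp, and both stores are assumed to be 4 bytes wide as in the quoted IR.<br>
<pre>
```python
# Byte ranges are half-open [start, end), as offsets from the incoming %sp.

def store_range(offset, size):
    """Return the half-open byte range written by a store of `size` bytes."""
    return (offset, offset + size)

def overlap(a, b):
    """Return the number of bytes shared by two half-open ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def classify(a, b):
    """Label two ranges as 'disjoint', 'complete', or 'partial' overlap."""
    shared = overlap(a, b)
    if shared == 0:
        return "disjoint"
    # If the smaller range is fully covered, the overlap is complete.
    if shared == min(a[1] - a[0], b[1] - b[0]):
        return "complete"
    return "partial"

# push eax:   %sp_1 = %sp - 4, 4-byte store
first = store_range(-4, 4)     # bytes [-4, 0)
# add esp, 2: %sp_x = %sp_1 + 2 = %sp - 2
# push ecx:   %sp_2 = %sp_x - 4 = %sp - 6, 4-byte store
second = store_range(-6, 4)    # bytes [-6, -2)

print(overlap(first, second), classify(first, second))
```
</pre>
Under these assumptions the second store shares two bytes with the first without covering it entirely, which is the "partial" case.<br>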

<span class=""><font color="#888888"><br></font></span></blockquote><div><br></div><div>What do you mean by partially?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class=""><font color="#888888">

 -Hal<br>

</font></span><div class=""><div class="h5"><br>

><br>

> 2016-02-10 21:24 GMT+01:00 Daniel Berlin < <a href="mailto:dberlin@dberlin.org">dberlin@dberlin.org</a> > :<br>

><br>


> On Wed, Feb 10, 2016 at 12:18 PM, Paul Peet via llvm-dev <<br>

> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a> > wrote:<br>

><br>

><br>

><br>

> Thank you for the hint.<br>

><br>

><br>

> I adjusted the code and it works:<br>

><br>

><br>

> The code after replacing inttoptr with getelementptr:<br>

><br>

><br>

><br>

> define { i32, i32, i8* } @test(i32 %foo, i32 %bar, i8* %sp) {<br>

> entry:<br>

> ; push foo (On "stack")<br>

> %sp_1 = getelementptr i8, i8* %sp, i32 -4<br>

> %sp_1_ptr = bitcast i8* %sp_1 to i32*<br>

> store i32 %foo, i32* %sp_1_ptr, align 4<br>

><br>

><br>

> ; push bar<br>

> %sp_2 = getelementptr i8, i8* %sp_1, i32 -4<br>

> %sp_2_ptr = bitcast i8* %sp_2 to i32*<br>

> store i32 %bar, i32* %sp_2_ptr, align 4<br>

><br>

><br>

> ; val1 = pop (val1 = bar)<br>

> %sp_3_ptr = bitcast i8* %sp_2 to i32*<br>

> %val1 = load i32, i32* %sp_3_ptr, align 4<br>

> %sp_3 = getelementptr i8, i8* %sp_2, i32 4<br>

><br>

><br>

> ; val2 = pop (val2 = foo)<br>

> %sp_4_ptr = bitcast i8* %sp_3 to i32*<br>

> %val2 = load i32, i32* %sp_4_ptr, align 4<br>

> %sp_4 = getelementptr i8, i8* %sp_3, i32 4<br>

><br>

><br>

> %ret_1 = insertvalue { i32, i32, i8* } undef, i32 %val1, 0<br>

> %ret_2 = insertvalue { i32, i32, i8* } %ret_1, i32 %val2, 1<br>

> %ret_3 = insertvalue { i32, i32, i8* } %ret_2, i8* %sp_4, 2<br>

><br>

><br>

> ret { i32, i32, i8* } %ret_3<br>

> }<br>

><br>

><br>

> After optimization ("opt -instcombine ./code.ll -S")<br>

><br>

><br>

><br>

> define { i32, i32, i8* } @test(i32 %foo, i32 %bar, i8* %sp) {<br>

> entry:<br>

> %sp_1 = getelementptr i8, i8* %sp, i64 -4<br>

> %sp_1_ptr = bitcast i8* %sp_1 to i32*<br>

> store i32 %foo, i32* %sp_1_ptr, align 4<br>

> %sp_2 = getelementptr i8, i8* %sp, i64 -8<br>

> %sp_2_ptr = bitcast i8* %sp_2 to i32*<br>

> store i32 %bar, i32* %sp_2_ptr, align 4<br>

> %ret_1 = insertvalue { i32, i32, i8* } undef, i32 %bar, 0<br>

> %ret_2 = insertvalue { i32, i32, i8* } %ret_1, i32 %foo, 1<br>

> %ret_3 = insertvalue { i32, i32, i8* } %ret_2, i8* %sp, 2<br>

> ret { i32, i32, i8* } %ret_3<br>

> }<br>

><br>

><br>

> My only questions are now:<br>

> - How is it that inttoptr cannot provide that specific alias<br>

> information so it can optimize that store/load away ?<br>

> Because nothing tracks what happens to the ints, and what happens<br>

> when they are converted back to pointers and whether it's sane :)<br>

> <a href="http://llvm.org/docs/GetElementPtr.html#how-is-gep-different-from-ptrtoint-arithmetic-and-inttoptr" rel="noreferrer" target="_blank">http://llvm.org/docs/GetElementPtr.html#how-is-gep-different-from-ptrtoint-arithmetic-and-inttoptr</a><br>

><br>

><br>

><br>

><br>

><br>

> - Might it be possible to get inttoptr providing such alias analysis<br>

> ?<br>

> It doesn't make a lot of sense to try in most cases.<br>

> Most cases where ptrtoint/inttoptr is useful are those where you<br>

> want to do crazy things to the pointer.<br>

><br>

><br>

><br>

><br>

><br>

><br>

> - I came across MemorySSA while browsing through the llvm source. Is<br>

> it possible that one can use MemorySSA to do such optimization<br>

> without alias analysis ?<br>

><br>

><br>

> MemorySSA relies on alias analysis to generate the SSA form.<br>

><br>

><br>

><br>

><br>

> - Where in the source should I look for the code that does this kind of<br>

> optimization? (Is it instcombine, which uses lib/Analysis/Loads.cpp?)<br>

><br>

><br>

> It's probably a combination of opts. The most likely candidate is<br>

> -gvn, but I would look at the pass dumps after each opt<br>

><br>

><br>

><br>

><br>

><br>

><br>

> Regards,<br>

> Paul<br>

><br>

><br>

><br>

><br>

><br>

><br>

> 2016-02-10 0:26 GMT+01:00 Philip Reames < <a href="mailto:listmail@philipreames.com">listmail@philipreames.com</a> ><br>

> :<br>

><br>

><br>

><br>

> Two points:<br>

> - Using inttoptr is a mistake here. GEPs are strongly preferred and<br>

> provide strictly more aliasing information to the optimizer.<br>

> - The zext is a bit weird. I'm not sure where that came from, but I'd<br>

> not bother looking into it until the preceding point is addressed.<br>

><br>

> In general, you may find these docs useful:<br>

> <a href="http://llvm.org/docs/Frontend/PerformanceTips.html" rel="noreferrer" target="_blank">http://llvm.org/docs/Frontend/PerformanceTips.html</a><br>

><br>

> Philip<br>

><br>

><br>

><br>

><br>

><br>

> On 02/08/2016 06:54 AM, Paul Peet via llvm-dev wrote:<br>

><br>

><br>

><br>

><br>

><br>

> Hello,<br>

><br>

><br>

> I am trying to emulate the x86 "stack" as used by push/pop<br>

> so afterwards I can use LLVM's optimizer passes to simplify (reduce<br>

> junk) the code.<br>

><br>

><br>

> The LLVM IR code:<br>

><br>

><br>

><br>

> define { i32, i32, i32 } @test(i32 %foo, i32 %bar, i32 %sp) {<br>

> ; push foo (On "stack")<br>

> %sp_1 = sub i32 %sp, 4<br>

> %sp_1_ptr = inttoptr i32 %sp_1 to i32*<br>

> store i32 %foo, i32* %sp_1_ptr, align 4<br>

><br>

><br>

> ; push bar<br>

> %sp_2 = sub i32 %sp_1, 4<br>

> %sp_2_ptr = inttoptr i32 %sp_2 to i32*<br>

> store i32 %bar, i32* %sp_2_ptr, align 4<br>

><br>

><br>

> ; val1 = pop (val1 = bar)<br>

> %sp_3_ptr = inttoptr i32 %sp_2 to i32*<br>

> %val1 = load i32, i32* %sp_3_ptr, align 4<br>

> %sp_3 = add i32 %sp_2, 4<br>

><br>

><br>

> ; val2 = pop (val2 = foo)<br>

> %sp_4_ptr = inttoptr i32 %sp_3 to i32*<br>

> %val2 = load i32, i32* %sp_4_ptr, align 4<br>

> %sp_4 = add i32 %sp_3, 4<br>

><br>

><br>

> %ret_1 = insertvalue { i32, i32, i32 } undef, i32 %val1, 0<br>

> %ret_2 = insertvalue { i32, i32, i32 } %ret_1, i32 %val2, 1<br>

> %ret_3 = insertvalue { i32, i32, i32 } %ret_2, i32 %sp_4, 2<br>

><br>

><br>

> ret { i32, i32, i32 } %ret_3<br>

> }<br>

><br>

><br>

> This code will "push" two values onto the stack and pop them in<br>

> reverse order so afterwards "foo" and "bar" will be swapped and<br>

> returned back.<br>

><br>

><br>

> After running this through "opt -O2 ./test.ll", I am getting this:<br>

><br>

><br>

><br>

> define { i32, i32, i32 } @test(i32 %foo, i32 %bar, i32 %sp) #0 {<br>

> %sp_1 = add i32 %sp, -4<br>

> %1 = zext i32 %sp_1 to i64<br>

> %sp_1_ptr = inttoptr i64 %1 to i32*<br>

> store i32 %foo, i32* %sp_1_ptr, align 4<br>

> %sp_2 = add i32 %sp, -8<br>

> %2 = zext i32 %sp_2 to i64<br>

> %sp_2_ptr = inttoptr i64 %2 to i32*<br>

> store i32 %bar, i32* %sp_2_ptr, align 4<br>

> %val2 = load i32, i32* %sp_1_ptr, align 4<br>

> %ret_1 = insertvalue { i32, i32, i32 } undef, i32 %bar, 0 ; Swapped<br>

> %ret_2 = insertvalue { i32, i32, i32 } %ret_1, i32 %val2, 1 ; Not swapped (not optimized; should be %foo)<br>

> %ret_3 = insertvalue { i32, i32, i32 } %ret_2, i32 %sp, 2<br>

> ret { i32, i32, i32 } %ret_3<br>

> }<br>

><br>

><br>

> As you can see, the IR has gained additional code, e.g. zext. But the<br>

> main problem here is that val2 hasn't been optimized.<br>

> Could anyone give me a hint about what is preventing the second value<br>

> from being optimized? (My guess would be the zext because I am using<br>

> %sp as a 32bit pointer although the "target" is 64bit).<br>

><br>

><br>

> Regards,<br>

> Paul<br>

><br>

> _______________________________________________<br>

> LLVM Developers mailing list <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

><br>

><br>


><br>

><br>

><br>

><br>


><br>

<br>

</div></div><div class=""><div class="h5">--<br>

Hal Finkel<br>

Assistant Computational Scientist<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>

</div></div></blockquote></div><br></div></div>