<div dir="ltr">I dug a bit more. It appears that running -memcpyopt followed by -instcombine converts this:<div><br></div><div>
<p><font face="monospace, monospace">%struct.Str = type { i32, i32, i32, i32, i32, i32 }</font></p>
<p><font face="monospace, monospace">define void @_Z4test3Str(%struct.Str* byval align 8 %s) {</font></p><p><font face="monospace, monospace">entry:</font></p>
<p><font face="monospace, monospace"> %agg.tmp = alloca %struct.Str, align 8</font></p>
<p><font face="monospace, monospace"> %0 = bitcast %struct.Str* %agg.tmp to i8*</font></p>
<p><font face="monospace, monospace"> %1 = bitcast %struct.Str* %s to i8*</font></p>
<p><font face="monospace, monospace"> call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 24, i32 4, i1 false)</font></p>
<p><font face="monospace, monospace"> call void @_Z6e_test3Str(%struct.Str* byval align 8 %agg.tmp)</font></p>
<p><font face="monospace, monospace"> ret void</font></p>
<p><font face="monospace, monospace">}</font></p>
<p><font face="monospace, monospace">declare void @_Z6e_test3Str(%struct.Str* byval align 8)</font></p>
<p><font face="monospace, monospace">declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i32, i1)</font></p><p>Into this:<br></p><div>
<p class=""><font face="monospace, monospace">define void @_Z4test3Str(%struct.Str* byval align 8 %s) {</font></p>
<p class=""><font face="monospace, monospace">entry:</font></p>
<p class=""><font face="monospace, monospace"> call void @_Z6e_test3Str(%struct.Str* byval align 8 %s)</font></p>
<p class=""><font face="monospace, monospace"> ret void</font></p>
<p class=""><font face="monospace, monospace">}</font></p><p class=""><font face="monospace, monospace"><br></font></p><p class=""><font face="arial, helvetica, sans-serif">Which is great. However, this doesn't happen with IR based on GEPs and scalar loads/stores (i.e., six repetitions of: GEP on %s, load, GEP on %agg.tmp, store), like the one discussed earlier in this thread.</font></p><p class="">I see two options:</p><p class="">1) change the pass I'm working on to emit a memcpy instead of a sequence of loads and stores, so that the resulting IR matches the canonical patterns that are optimized today; or</p><p class="">2) add support (probably in memcpyopt) for converting sequences of loads and stores into a memcpy, and then let the existing optimizations reduce the resulting IR.</p></div><div><br></div><div>I'm looking for feedback on which path to take. Are there known cases where a sequence of loads and stores would benefit from being replaced with a memcpy (option 2)?</div><div><br></div><div>Thank you,</div><div>Mircea.</div><div><br></div><div><br><div class="gmail_quote">On Sun, Mar 8, 2015 at 10:02 AM Mircea Trofin <<a href="mailto:mtrofin@google.com" target="_blank">mtrofin@google.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Errata: I am on 3.6, full stop. I <i>thought</i> there was a 3.7 release available, based on the title of <a href="http://llvm.org/docs/" target="_blank">http://llvm.org/docs/</a> ("LLVM 3.7 documentation"). 
I suppose the docs are ahead of the release schedule?<br></div><br><div class="gmail_quote">On Sun, Mar 8, 2015 at 9:44 AM Mircea Trofin <<a href="mailto:mtrofin@google.com" target="_blank">mtrofin@google.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Sorry, that pass is part of the PNaCl toolchain. This is LLVM 3.6; would your comments still apply?<br><br><div>I tried -O3, to no avail. I suppose I'll get LLVM 3.7, check whether it optimizes the latest snippet (the one avoiding aggregate load/store), and go from there.</div><div><br></div><div>Thanks!</div></div><br><div class="gmail_quote">On Fri, Mar 6, 2015 at 12:01 PM Philip Reames <<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
<br>
<div>On 03/05/2015 06:16 PM, Mircea Trofin
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Thanks!</div>
<div><br>
</div>
Philip, do you mean I should transform the original IR to
something like this? </div>
</blockquote>
<br></div><div bgcolor="#FFFFFF" text="#000000">
Yes.
</div><div bgcolor="#FFFFFF" text="#000000"><blockquote type="cite">
<div dir="ltr">(...which is what -expand-struct-regs can do, when
applied to my original input)<br>
</div>
</blockquote></div><div bgcolor="#FFFFFF" text="#000000">
Sorry, what? This doesn't appear to be a pass in ToT. Are you
using an older version of LLVM? If so, none of my comments will
apply. <br></div><div bgcolor="#FFFFFF" text="#000000">
<blockquote type="cite">
<div dir="ltr"><br>
<div><font face="monospace, monospace">define void
@main(%struct* byval %ptr) {</font></div>
<div><font face="monospace, monospace"> %val.index =
getelementptr %struct* %ptr, i32 0, i32 0</font></div>
<div><font face="monospace, monospace"> %val.field = load i32*
%val.index</font></div>
<div><font face="monospace, monospace"> %val.index1 =
getelementptr %struct* %ptr, i32 0, i32 1</font></div>
<div><font face="monospace, monospace"> %val.field2 = load i32*
%val.index1</font></div>
<div><font face="monospace, monospace"> %val.ptr = alloca
%struct</font></div>
<div><font face="monospace, monospace"> %val.ptr.index =
getelementptr %struct* %val.ptr, i32 0, i32 0</font></div>
<div><font face="monospace, monospace"> store i32 %val.field,
i32* %val.ptr.index</font></div>
<div><font face="monospace, monospace"> %val.ptr.index4 =
getelementptr %struct* %val.ptr, i32 0, i32 1</font></div>
<div><font face="monospace, monospace"> store i32 %val.field2,
i32* %val.ptr.index4</font></div>
<div><font face="monospace, monospace"> call void
@extern_func(%struct* byval %val.ptr)</font></div>
<div><font face="monospace, monospace"> ret void</font></div>
<div><font face="monospace, monospace">}</font></div>
<div><br>
</div>
<div>If so, would you mind pointing me to the pass that would
reduce this? (I'm assuming that's what you meant by "for free":
that there's an existing pass I could use.)</div>
</div>
</blockquote></div><div bgcolor="#FFFFFF" text="#000000">
I would expect GVN to handle this. If you can run this through the full
-O3 pass pipeline and get the right result, isolating the pass in
question should be easy. <br></div><div bgcolor="#FFFFFF" text="#000000">
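<div>One way to do that isolation mechanically, sketched in Python purely as an illustration: bisect over prefixes of the -O3 pass list. This assumes the outcome is monotone (once a prefix removes the copy, every longer prefix keeps it removed) and relies on a hypothetical caller-supplied predicate <code>is_optimized(prefix)</code>, which in practice would run <code>opt</code> with just those passes and inspect the IR. The pass list could be taken from <code>opt -O3 -debug-pass=Arguments</code> with the legacy pass manager; the pass names in the usage lines are illustrative, not the real -O3 order.</div>
<pre>
```python
def first_pass_that_fixes(passes, is_optimized):
    """Bisect an ordered pass list for the shortest prefix that optimizes.

    `passes` is an ordered list of pass names; `is_optimized(prefix)` is a
    hypothetical caller-supplied predicate (e.g. one that runs opt with those
    passes and inspects the IR). Assumes the result is monotone in the prefix
    length. Returns the last pass of the shortest successful prefix, or None
    if the full list fails or the empty prefix already succeeds.
    """
    if is_optimized([]) or not is_optimized(passes):
        return None
    lo, hi = 0, len(passes)  # invariant: passes[:lo] fails, passes[:hi] succeeds
    while hi - lo != 1:
        mid = (lo + hi) // 2
        if is_optimized(passes[:mid]):
            hi = mid
        else:
            lo = mid
    return passes[hi - 1]


# Illustrative pass names (not the real -O3 order) and a fake predicate
# standing in for "run opt on this prefix and check the output".
pipeline = ["sroa", "instcombine", "gvn", "memcpyopt", "dse"]
print(first_pass_that_fixes(pipeline, lambda prefix: "memcpyopt" in prefix))  # prints: memcpyopt
```
</pre>
<div>The function reports only the last pass of the shortest working prefix. When the result depends on a combination, such as -memcpyopt followed by -instcombine, earlier passes in that prefix may be involved too.</div>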
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Thank you.</div>
<div>Mircea.</div>
<div> </div>
<br>
<div class="gmail_quote">On Thu, Mar 5, 2015 at 4:39 PM Philip
Reames <<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Reid is right that this would go in memcpyopt, but...
there's an active discussion on the commit list about a
change that will solve this through a different mechanism.
There's an active desire to avoid teaching GVN and related
pieces (of which memcpyopt is one) about first-class
aggregates. We don't have enough active users of the
feature to justify the complexity and maintain it. <br>
<br>
If you haven't already seen it, this background may
help:
<a href="http://llvm.org/docs/Frontend/PerformanceTips.html#avoid-loads-and-stores-of-large-aggregate-type" target="_blank">http://llvm.org/docs/Frontend/PerformanceTips.html#avoid-loads-and-stores-of-large-aggregate-type</a><br>
<br>
The current proposal is to convert such aggregate loads
and stores into their component pieces. If that
happens, your example should come "for free", provided
that the same example works when you break the first-class
aggregate (FCA) down into its component pieces. If it
doesn't, please say so. <br>
</div>
</div>
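<div>To make the idea of breaking an FCA copy into its component pieces concrete for a flat struct of scalars, here is a toy Python sketch (not LLVM code; the helper <code>split_aggregate_copy</code> and the <code>%val.src.*</code>, <code>%val.field.*</code> and <code>%val.dst.*</code> value names are made up for illustration). It prints the per-field getelementptr/load/store sequence, in LLVM 3.6 typed-pointer syntax, that stands in for an aggregate load followed by an aggregate store. Nested aggregates, padding and alignment are ignored.</div>
<pre>
```python
def split_aggregate_copy(struct_ty, field_tys, src, dst, prefix="val"):
    """Return LLVM 3.6-style IR lines copying a flat struct field by field.

    Hypothetical helper for illustration only: it models replacing
    `load %struct* src` + `store %struct ..., %struct* dst` with one
    scalar load and one scalar store per field.
    """
    lines = []
    loaded = []
    # Load each field of the source aggregate through its own GEP.
    for i, ty in enumerate(field_tys):
        src_gep = f"%{prefix}.src.{i}"
        val = f"%{prefix}.field.{i}"
        lines.append(f"{src_gep} = getelementptr {struct_ty}* {src}, i32 0, i32 {i}")
        lines.append(f"{val} = load {ty}* {src_gep}")
        loaded.append((ty, val))
    # Store each loaded value into the matching field of the destination.
    for i, (ty, val) in enumerate(loaded):
        dst_gep = f"%{prefix}.dst.{i}"
        lines.append(f"{dst_gep} = getelementptr {struct_ty}* {dst}, i32 0, i32 {i}")
        lines.append(f"store {ty} {val}, {ty}* {dst_gep}")
    return lines


for line in split_aggregate_copy("%struct", ["i32", "i32"], "%ptr", "%val.ptr"):
    print(line)
```
</pre>
<div>For a struct of two i32 fields this yields eight instructions (four GEPs, two loads, two stores), the same shape as the hand-expanded snippet shown above, modulo value names and instruction order.</div>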
<div bgcolor="#FFFFFF" text="#000000">
<div> <br>
Philip</div>
</div>
<div bgcolor="#FFFFFF" text="#000000">
<div><br>
<br>
On 03/05/2015 04:21 PM, Reid Kleckner wrote:<br>
</div>
</div>
<div bgcolor="#FFFFFF" text="#000000">
<blockquote type="cite">
<div dir="ltr">I think lib/Transforms/Scalar/MemCpyOptimizer.cpp
might be the right place for this, considering that
most frontends will use memcpy for that copy anyway.
It already has some logic for byval args.</div>
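<div>As a toy model of the check a memcpyopt-style transform would need before merging field-wise copies into a single memcpy (this is not MemCpyOptimizer's actual code or data structures), each load/store pair can be described by a hypothetical <code>(src_offset, dst_offset, size)</code> byte tuple. The pairs qualify when together they copy every byte of the object to the same offset in the destination, with no gaps or overlaps. The sketch assumes each store writes exactly the value just loaded and that nothing else reads or writes either object in between; it conservatively rejects structs with padding, since padding shows up as a gap.</div>
<pre>
```python
def is_whole_object_copy(pairs, total_size):
    """Check whether field-wise load/store pairs amount to one full memcpy.

    Each pair is a (src_offset, dst_offset, size) tuple in bytes describing
    one load from the source object and one store of that value into the
    destination object. Returns True if, taken together, the pairs copy
    bytes [0, total_size) to the same offsets with no gaps or overlaps.
    """
    expected = 0
    for src, dst, size in sorted(pairs, key=lambda p: p[1]):
        # Each field must land at the offset it was read from and start
        # exactly where the previous field ended.
        if src != dst or dst != expected:
            return False
        expected += size
    # Together the fields must cover the whole object.
    return expected == total_size


# %struct.Str from the top of this thread: six i32 fields, 24 bytes.
str_fields = [(4 * i, 4 * i, 4) for i in range(6)]
print(is_whole_object_copy(str_fields, 24))      # True: same effect as a 24-byte memcpy
print(is_whole_object_copy(str_fields[:5], 24))  # False: the last field is never copied
```
</pre>
<div>If the check passes, the six load/store pairs could be rewritten as the 24-byte <code>llvm.memcpy</code> form, which the -memcpyopt/-instcombine combination already reduces for byval arguments.</div>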
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Mar 5, 2015 at 3:51
PM, Mircea Trofin <span dir="ltr"><<a href="mailto:mtrofin@google.com" target="_blank">mtrofin@google.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr">Hello all,
<div><br>
</div>
<div>I'm trying to find the pass that would
convert from:</div>
<div>
<div><br>
</div>
<div><font face="monospace, monospace">define
void @main(%struct* byval %ptr) {</font></div>
<div><font face="monospace, monospace"> %val
= load %struct* %ptr</font></div>
<div><font face="monospace, monospace">
%val.ptr = alloca %struct</font></div>
<div><font face="monospace, monospace"> store
%struct %val, %struct* %val.ptr</font></div>
<div><font face="monospace, monospace"> call
void @extern_func(%struct* byval %val.ptr)</font></div>
<div><font face="monospace, monospace"> ret
void</font></div>
<div><font face="monospace, monospace">}</font></div>
</div>
<div><br>
</div>
<div>to this:</div>
<div>
<div><font face="monospace, monospace">define
void @main(%struct* byval %ptr) {</font></div>
<div><font face="monospace, monospace"> call
void @extern_func(%struct* byval %ptr)</font></div>
<div><font face="monospace, monospace"> ret
void</font></div>
<div><font face="monospace, monospace">}</font></div>
</div>
<div><br>
</div>
<div>First, am I missing something - would this
be a correct optimization?</div>
<div><br>
</div>
<div>Thank you,</div>
<div>Mircea.</div>
</div>
<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
<br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</div>
</blockquote>
<br>
</div></blockquote></div></blockquote></div></blockquote></div></div></div></div>