<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Apr 1, 2015 at 5:32 PM, Mircea Trofin <span dir="ltr">&lt;<a href="mailto:mtrofin@google.com" target="_blank">mtrofin@google.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I dug a bit more. It appears that running -memcpyopt followed by -instcombine can convert this:<div><br></div><div>


<p><font face="monospace, monospace">%struct.Str = type { i32, i32, i32, i32, i32, i32 }</font></p>

<p><span style="font-family:monospace,monospace">define void @_Z4test3Str(%struct.Str* byval align 8 %s) {</span></p><p><span style="font-family:monospace,monospace">entry:</span></p>

<p><font face="monospace, monospace">  %agg.tmp = alloca %struct.Str, align 8</font></p>

<p><font face="monospace, monospace">  %0 = bitcast %struct.Str* %agg.tmp to i8*</font></p>

<p><font face="monospace, monospace">  %1 = bitcast %struct.Str* %s to i8*</font></p>

<p><font face="monospace, monospace">  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 24, i32 4, i1 false)</font></p>

<p><font face="monospace, monospace">  call void @_Z6e_test3Str(%struct.Str* byval align 8 %agg.tmp)</font></p>

<p><font face="monospace, monospace">  ret void</font></p>

<p><font face="monospace, monospace">}</font></p><p>Into this:<br></p><div>


<p><font face="monospace, monospace">define void @_Z4test3Str(%struct.Str* byval align 8 %s) {</font></p>

<p><font face="monospace, monospace">entry:</font></p>

<p><font face="monospace, monospace">  call void @_Z6e_test3Str(%struct.Str* byval align 8 %s)</font></p>

<p><font face="monospace, monospace">  ret void</font></p>

<p><font face="monospace, monospace">}</font></p><p><font face="monospace, monospace"><br></font></p><p><font face="arial, helvetica, sans-serif">Which is great. However, this does not happen with IR based on GEP and load/store (a total of 6 sets of GEP on %s, load, then GEP on %agg.tmp and store, like the one discussed earlier in this thread).</font></p><p>I see 2 options:</p><p>1) convert the pass I'm working on to produce memcpy instead of load/store sequences, which would allow the resulting IR to fit the canonical patterns optimized today, or</p></div></div></div></blockquote><div>I'd say that if you are copying an object and it requires more than 2 loads and stores, use memcpy. This is what Clang does for aggregate copies when there is no copy ctor. </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><p>2) add support (probably to memcpyopt) for converting load/store sequences into memcpy, then let the current optimizations reduce the resulting IR.</p></div></div></div></blockquote><div>We should do this as a separate pass (I thought we did?), but it's hard to do when there is interior padding in the struct, because it's hard to know whether the interior padding of the destination needs to retain the data that was originally there.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>I'm looking for feedback as to which path to take. Are there known instances of successive loads/stores that would benefit from being replaced with memcpy (option 2)?</div></div></div></blockquote></div></div></div>
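<div dir="ltr"><p>To make the interior-padding concern above concrete, here is a minimal C++ sketch. The struct Padded and the functions copy_fields and padding_bytes are hypothetical names, not from this thread, and the layout assumed (padding between tag and value) is what typical ABIs produce rather than something the language guarantees. The field-by-field copy assigns only the named members, whereas a memcpy of sizeof(Padded) bytes would also overwrite whatever sits in the destination's padding.</p>
<pre>
```cpp
// Hypothetical struct for illustration; not from the thread.
// On typical ABIs, int is 4-byte aligned, so padding bytes sit
// between tag and value.
struct Padded {
    char tag;
    int value;
};

// Field-by-field copy: only the named members are assigned, which
// mirrors the per-field load/store form of the IR. Padding bytes in
// the result are never explicitly written here.
Padded copy_fields(Padded dst, Padded src) {
    dst.tag = src.tag;
    dst.value = src.value;
    return dst;
}

// Bytes of Padded not covered by its two members. A memcpy of
// sizeof(Padded) bytes would write these too. This is 0 on an ABI
// that inserts no padding.
constexpr unsigned long padding_bytes() {
    return sizeof(Padded) - (sizeof(char) + sizeof(int));
}
```
</pre>
<p>On x86-64 with Clang or GCC, padding_bytes() is expected to be 3, though that is ABI-dependent. A pass that rewrites the two member stores into one memcpy would have to show that nothing depends on those bytes in the destination.</p></div>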