<div dir="ltr"><div class="gmail_quote">On Wed, Apr 8, 2015 at 10:15 PM Lang Hames &lt;<a href="mailto:lhames@gmail.com">lhames@gmail.com</a>&gt; wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi David, Chandler,<div><br></div><div>The attached patch guts the SimplifyMemTransfer method, which turns small memcpys (1/2/4/8 bytes) into load/store pairs. Turning memcpys into load/store pairs loses information, since we can no longer assume the source and destination are non-overlapping. This is leading to some suboptimal expansions for small memcpys on AArch64 when -mno-unaligned-access is turned on (see r234462). I suspect other architectures would suffer similar issues.</div><div><br></div><div>I assume this transform is an old workaround to simplify other non-memcpy-aware IR transforms. These days I think most IR transforms can reason sensibly about memcpys, so I'm hoping this is safe to remove. FWIW, removing it didn't hit any regression tests except those that were verifying that this optimisation was being applied, but then you wouldn't really expect it to hit any others.</div></div></blockquote><div><br></div><div>Heh. I tried to remove it before and it regressed performance a *lot*. Have you measured it? I think there are many places that today don't reason about memcpy but do reason about loads and stores. Here is a partial list:</div><div><br></div><div>- GVN</div><div>- ValueTracking.cpp's available loaded value (or whatever it's called), which drives load combining and store-to-load forwarding throughout instcombine and the IR</div><div>- EarlyCSE</div><div>- LoopVectorize</div><div><br></div><div>I thought about fixing all of this, but it seems really complicated and of very little value. Loads and stores and SSA values are really useful. 
Do you see any other way to preserve the non-overlap information?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div><div>If this transform really is useful then we should probably revisit the cut-off: 8 bytes isn't much these days.</div></div></blockquote><div><br></div><div>Yeah, this has been kind of horrible. I think the correct heuristic would be to apply it whenever the size is one for which we have a legal integer type.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Perhaps we should also only apply it if the alignment on the memcpy is sufficiently high?</div></div></blockquote><div><br></div><div>Is an under-aligned memcpy really that much better than an under-aligned load and store?<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div><div>Cheers,</div><div>Lang.</div><div><br></div></div>

_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

</blockquote></div></div>
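<div>To make the cut-off discussion concrete, here is a rough sketch contrasting the fixed 1/2/4/8-byte rule with the "legal integer type" heuristic suggested above. Everything in it is an assumption for illustration: <code>TargetInfo</code>, <code>fixedCutoff</code> and <code>legalIntegerHeuristic</code> are hypothetical names, not LLVM APIs, and the list of legal widths is supplied by hand rather than queried from a real target description.</div>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a target description (NOT an LLVM API):
// the integer widths, in bits, that the target handles natively.
struct TargetInfo {
    std::vector<unsigned> legalIntBits;
};

// The fixed cut-off described in the thread: only 1, 2, 4 or 8 bytes
// are turned into a load/store pair.
bool fixedCutoff(std::uint64_t sizeBytes) {
    return sizeBytes == 1 || sizeBytes == 2 || sizeBytes == 4 || sizeBytes == 8;
}

// The suggested heuristic: convert only when the copy size equals the
// width of some integer type that is legal on the target.
bool legalIntegerHeuristic(std::uint64_t sizeBytes, const TargetInfo &target) {
    for (unsigned bits : target.legalIntBits) {
        // Only whole-byte widths can describe a memcpy size.
        if (bits % 8 == 0 && sizeBytes == bits / 8)
            return true;
    }
    return false;
}
```

<div>With legal widths of 8/16/32/64 bits the two rules agree. They diverge only for a target that also lists, say, a 128-bit integer as legal, where a 16-byte copy would newly qualify.</div>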