<div dir="ltr">Hi Chandler,<div><br></div><div>Not as easy as I was hoping then.</div><div><br></div><div>&gt; <span style="font-size:13px">Do you see any other way to solve the problem of non-overlapping information?</span></div><div><br></div><div>I'll have to do some reading. If there's any aliasing metadata that we can attach to express that the pointers are disjoint, that would work: In SelectionDAGBuilder we could detect disjoint, misaligned load/store pairs where the load has no other users and use the memcpy expansion instead.<br></div><div><div><br></div></div><div>&gt; Is an under aligned memcpy really that much better than an under aligned load and store??? <br></div><div class="gmail_extra"><br></div><div class="gmail_extra">It saves a bit of shifting and masking as you try to reconstruct the full iN value in a register. This would be exacerbated if we raised the size cap. I'll see if I can get you some numbers.</div><div class="gmail_extra"><br></div><div class="gmail_extra">- Lang.</div><div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_quote">On Wed, Apr 8, 2015 at 10:38 PM, Chandler Carruth <span dir="ltr">&lt;<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><span class="">On Wed, Apr 8, 2015 at 10:15 PM Lang Hames &lt;<a href="mailto:lhames@gmail.com" target="_blank">lhames@gmail.com</a>&gt; wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Hi David, Chandler,<div><br></div><div>The attached patch guts the SimplifyMemTransfer method, which turns small memcpys (1/2/4/8 bytes) into load/store pairs. 
Turning memcpys into load/store pairs loses information, since we can no longer assume the source and dest are non-overlapping. This is leading to some suboptimal expansions for small memcpys on AArch64 when -mno-unaligned-access is turned on (see r234462). I suspect other architectures would suffer similar issues.</div><div><br></div><div>I assume this transform is an old workaround to simplify other non-memcpy-aware IR transforms. These days I think most IR transforms can reason sensibly about memcpys, so I'm hoping this is safe to remove. FWIW, removing it didn't hit any regression tests except those that were verifying that this optimisation was being applied, but then you wouldn't really expect it to hit any others.</div></div></blockquote><div><br></div></span><div>Heh. I tried to remove it before and it regressed a *lot* of performance. Have you measured it? I think there are many places that don't today reason about memcpy but do reason about loads and stores. Here is a partial list:</div><div><br></div><div>- GVN</div><div>- ValueTracking.cpp's available loaded value (or whatever it's called) which drives load combining and store-to-load forwarding throughout instcombine and the IR</div><div>- EarlyCSE</div><div>- LoopVectorize</div><div><br></div><div>I thought about fixing all of this, but it seems really complicated and to have very little value. Loads and stores and SSA values are really useful. Do you see any other way to solve the problem of non-overlapping information?</div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div><br></div><div>If this transform really is useful then we should probably revisit the cut-off: 8 bytes isn't much these days.</div></div></blockquote><div><br></div></span><div>Yeah, this has been kind of horrible. 
I think the correct heuristic would be to apply it when the size is one for which we have a legal integer type.</div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div>Perhaps we should also only apply it if the alignment on the memcpy is sufficiently high?</div></div></blockquote><div><br></div></span><div>Is an under-aligned memcpy really that much better than an under-aligned load and store? <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div><br></div><div>Cheers,</div><div>Lang.</div><div><br></div></div>


</blockquote></div></div>

</blockquote></div><br></div></div>
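<div>To make the "shifting and masking" point above a bit more concrete, here is a rough C sketch of the two shapes a 4-byte, 1-byte-aligned copy can take on a target without unaligned access. It is only an illustration, not the actual AArch64 lowering: the function names are made up, little-endian byte order is assumed, and a real backend may well do better than either version.</div>

```c
#include <stdint.h>

/* Hypothetical illustration of an under-aligned i32 load/store pair.
   The load has to rebuild the full value in a register (shift + or),
   and the store then has to split it apart again (shift + mask).
   Little-endian byte order is assumed. */
static void copy4_via_i32(unsigned char *dst, const unsigned char *src) {
    uint32_t v = (uint32_t)src[0]
               | ((uint32_t)src[1] << 8)
               | ((uint32_t)src[2] << 16)
               | ((uint32_t)src[3] << 24);
    dst[0] = (unsigned char)(v & 0xff);
    dst[1] = (unsigned char)((v >> 8) & 0xff);
    dst[2] = (unsigned char)((v >> 16) & 0xff);
    dst[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Hypothetical illustration of a memcpy-style expansion. Because the
   source and destination are known not to overlap (expressed here with
   restrict), each byte can be moved directly and the full 32-bit value
   never needs to be assembled. */
static void copy4_bytewise(unsigned char *restrict dst,
                           const unsigned char *restrict src) {
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
    dst[3] = src[3];
}
```

<div>Both functions produce the same bytes in the destination; the difference is only in how much extra work is needed to get them there, which would grow if the size cap were raised.</div>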