<div dir="ltr">Hi Hal, Chandler,<div><br></div><div>Hal - thanks for the advice. An updated InstCombine patch is attached that avoids adding the redundant noalias info.

</div><div><br></div><div>Two other patches are also attached: </div><div>(1) A patch to move the AliasAnalysis member and isAlias method from DAGCombiner to SelectionDAG. This makes alias analysis available to custom combines, and...</div><div>(2) a demo patch for AArch64 that uses this to transform misaligned-load/store pairs into memcpys.</div><div><div><br></div><div>I think it makes sense to do this kind of transformation on a per-target basis. I originally tried to make this a generic transformation at DAG-building time, but saw a lot of regression test failures. I started adding dials and switches to allow the targets to customize when it triggered, but it quickly became apparent that it would be easier to just let the targets do the combine themselves.</div><div><br></div><div>So - does this InstCombine change look sane?

</div><div><br></div><div>The SelectionDAG and AArch64 changes I'm reasonably confident about, though feedback is always welcome.</div><div><br></div><div>Cheers,</div><div>Lang.</div><div><br><div class="gmail_extra"><div class="gmail_quote">On Thu, Apr 9, 2015 at 7:21 PM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>----- Original Message -----<br>

> From: "Lang Hames" <<a href="mailto:lhames@gmail.com" target="_blank">lhames@gmail.com</a>><br>

> To: "Chandler Carruth" <<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>><br>

> Cc: "Commit Messages and Patches for LLVM" <<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a>><br>

> Sent: Thursday, April 9, 2015 5:18:56 PM<br>

> Subject: Re: [PATCH] Gut InstCombiner::SimplifyMemTransfer.<br>

><br>

><br>

><br>

> Hi Chandler,<br>

><br>

><br>

> How about attaching noalias metadata to the load/store pointers when<br>

> we simplify memcpy. Then the backend could use noalias info to<br>

> reconstitute memcpys for misaligned load/store pairs.<br>

><br>

><br>

> Tentative patch attached - I'm still trying to wrap my head around<br>

> the noalias metadata stuff, but I think this is heading in the right<br>

> direction.<br>

<br>

</span>That should indeed mark the source and destination accesses as mutually-non-aliasing. It is a bit overkill, because you don't need to explicitly mark both directions (if A does not alias with B, then B does not alias with A); ScopedNoAliasAA already checks both directions (in ScopedNoAliasAA.cpp):<br>

<br>

  if (!mayAliasInScopes(AScopes, BNoAlias))<br>

    return NoAlias;<br>

<br>

  if (!mayAliasInScopes(BScopes, ANoAlias))<br>

    return NoAlias;<br>

<br>

If you want to use AA during CodeGen (by which we include SDAG), the target will need to return true from useAA().<br>

<br>

 -Hal<br>

<div><div><br>

><br>

><br>

> - Lang.<br>

><br>

><br>

><br>

> On Wed, Apr 8, 2015 at 11:29 PM, Lang Hames < <a href="mailto:lhames@gmail.com" target="_blank">lhames@gmail.com</a> ><br>

> wrote:<br>

><br>

><br>

><br>

> > Is the problem specific to misaligned loads and stores? Because<br>

> > that seems much easier to solve.<br>

><br>

><br>

> Mostly, but with a twist: Misaligned loads/stores are fine if your<br>

> target supports them (I don't imagine they're a problem on x86), but<br>

> as far as I know we don't have a way to express to the mid-level<br>

> optimisers whether a misaligned load/store will be *bad* on the<br>

> current target CPU (in the sense that SelectionDAG will have to<br>

> expand it to something much larger). The relevant information is<br>

> currently on TargetLowering. Maybe we need to move it somewhere more<br>

> accessible.<br>

><br>

><br>

> > But why does lowering as memcpy help? Essentially, I don't<br>

> > understand why we can't use exactly the same lowering strategy<br>

> > that memcpy (or memmove for that matter) would use and get the<br>

> > same effect.<br>

><br>

><br>

> As far as I can see the only way memcpy helps is by conveying that<br>

> the source and dest are non-overlapping. (None of this applies to<br>

> memmove - as far as I know that function can be converted to a<br>

> load/store pair with no loss of information).<br>

> Take the simple case where memcpy lowering is just going to issue a<br>

> series of load-byte / store-byte instructions: That's not a legal<br>

> way to to lower an arbitrary load/store pair, since in the later<br>

> case the source and dest may overlap, but it is legal for a memcpy.<br>

><br>

><br>

> > FWIW, my suggestion about using legal integer types should not<br>

> > raise the cap at all, it should lower it on specific targets where<br>

> > we can't actually fit an 8-byte integer in a register.<br>

><br>

><br>

> Oh, right. Sorry - I mis-parsed that as legal primitive type, and<br>

> imagined applying it to vector types. That raises an interesting<br>

> question though: Would it be useful to apply this logic to vector<br>

> types too?<br>

><br>

><br>

> Cheers,<br>

> Lang.<br>

><br>

><br>

><br>

><br>

> On Wed, Apr 8, 2015 at 11:11 PM, Chandler Carruth <<br>

> <a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a> > wrote:<br>

><br>

><br>

><br>

><br>

> On Wed, Apr 8, 2015 at 10:57 PM Lang Hames < <a href="mailto:lhames@gmail.com" target="_blank">lhames@gmail.com</a> ><br>

> wrote:<br>

><br>

><br>

><br>

> Hi Chandler,<br>

><br>

><br>

> Not as easy as I was hoping then.<br>

><br>

><br>

><br>

> > Do you see any other way to solve the problem of non-overlapping<br>

> > information?<br>

><br>

><br>

><br>

> I'll have to do some reading. If there's any aliasing metadata that<br>

> we can attach to express that the pointers are disjoint, that would<br>

> work: In SelectionDAGBuilder we could detect disjoint, misaligned<br>

> load/store pairs where the load has no other users and use the<br>

> memcpy expansion instead.<br>

><br>

><br>

><br>

> Is the problem specific to misaligned loads and stores? Because that<br>

> seems much easier to solve.<br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

> > Is an under aligned memcpy really that much better than an under<br>

> > aligned load and store???<br>

><br>

><br>

><br>

><br>

> It saves a bit of shifting and masking as you try to reconstruct the<br>

> full iN value in a register.<br>

><br>

><br>

> But why does lowering as memcpy help? Essentially, I don't understand<br>

> why we can't use exactly the same lowering strategy that memcpy (or<br>

> memmove for that matter) would use and get the same effect.<br>

><br>

><br>

> If your concern is code size, I'm honestly still surprised that<br>

> memcpy is smaller, but fine, emit the call *whenever* you end up<br>

> with misaligned loads and stores?<br>

><br>

><br>

><br>

><br>

> This would be exacerbated if we raised the size cap. I'll see if I<br>

> can get you some numbers.<br>

><br>

><br>

> FWIW, my suggestion about using legal integer types should not raise<br>

> the cap at all, it should lower it on specific targets where we<br>

> can't actually fit an 8-byte integer in a register.<br>

><br>

><br>

> Anyways, I don't think we need numbers if the primary concern is<br>

> misaligned loads and stores. I just think that there has to be<br>

> *some* lowering strategy that works in general for misaligned loads<br>

> and stores and is no worse than calling memcpy.<br>

><br>

><br>

</div></div><div><div>> _______________________________________________<br>

> llvm-commits mailing list<br>

> <a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>

> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

><br>

<br>

</div></div><span><font color="#888888">--<br>

Hal Finkel<br>

Assistant Computational Scientist<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>

</font></span></blockquote></div><br></div></div></div></div>