[PATCH] Gut InstCombiner::SimplifyMemTransfer.

Tue Apr 21 20:39:38 PDT 2015

Hi Hal, Chandler,

Hal - thanks for the advice. An updated InstCombine patch is attached that
avoids adding the redundant noalias info.

Two other patches are also attached: 
(1) A patch to move the AliasAnalysis member and isAlias method from
DAGCombiner to SelectionDAG. This makes alias analysis available to custom
combines, and...
(2) a demo patch for AArch64 that uses this to transform
misaligned-load/store pairs into memcpys.

I think it makes sense to do this kind of transformation on a per-target
basis. I originally tried to make this a generic transformation at
DAG-building time, but saw a lot of regression test failures. I started
adding dials and switches to allow the targets to customize when it
triggered, but it quickly became apparent that it would be easier to just
let the targets do the combine themselves.

So - does this InstCombine change look sane?

The SelectionDAG and AArch64 changes I'm reasonably confident about, though
feedback is always welcome.

Cheers,
Lang.

On Thu, Apr 9, 2015 at 7:21 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "Lang Hames" <lhames at gmail.com>
> > To: "Chandler Carruth" <chandlerc at gmail.com>
> > Cc: "Commit Messages and Patches for LLVM" <llvm-commits at cs.uiuc.edu>
> > Sent: Thursday, April 9, 2015 5:18:56 PM
> > Subject: Re: [PATCH] Gut InstCombiner::SimplifyMemTransfer.
> >
> >
> >
> > Hi Chandler,
> >
> >
> > How about attaching noalias metadata to the load/store pointers when
> > we simplify memcpy. Then the backend could use noalias info to
> > reconstitute memcpys for misaligned load/store pairs.
> >
> >
> > Tentative patch attached - I'm still trying to wrap my head around
> > the noalias metadata stuff, but I think this is heading in the right
> > direction.
>
> That should indeed mark the source and destination accesses as
> mutually-non-aliasing. It is a bit overkill, because you don't need to
> explicitly mark both directions (if A does not alias with B, then B does
> not alias with A); ScopedNoAliasAA already checks both directions (in
> ScopedNoAliasAA.cpp):
>
>   if (!mayAliasInScopes(AScopes, BNoAlias))
>     return NoAlias;
>
>   if (!mayAliasInScopes(BScopes, ANoAlias))
>     return NoAlias;
>
> If you want to use AA during CodeGen (by which we include SDAG), the
> target will need to return true from useAA().
>
>  -Hal
>
> >
> >
> > - Lang.
> >
> >
> >
> > On Wed, Apr 8, 2015 at 11:29 PM, Lang Hames < lhames at gmail.com >
> > wrote:
> >
> >
> >
> > > Is the problem specific to misaligned loads and stores? Because
> > > that seems much easier to solve.
> >
> >
> > Mostly, but with a twist: Misaligned loads/stores are fine if your
> > target supports them (I don't imagine they're a problem on x86), but
> > as far as I know we don't have a way to express to the mid-level
> > optimisers whether a misaligned load/store will be *bad* on the
> > current target CPU (in the sense that SelectionDAG will have to
> > expand it to something much larger). The relevant information is
> > currently on TargetLowering. Maybe we need to move it somewhere more
> > accessible.
> >
> >
> > > But why does lowering as memcpy help? Essentially, I don't
> > > understand why we can't use exactly the same lowering strategy
> > > that memcpy (or memmove for that matter) would use and get the
> > > same effect.
> >
> >
> > As far as I can see the only way memcpy helps is by conveying that
> > the source and dest are non-overlapping. (None of this applies to
> > memmove - as far as I know that function can be converted to a
> > load/store pair with no loss of information).
> > Take the simple case where memcpy lowering is just going to issue a
> > series of load-byte / store-byte instructions: That's not a legal
> > way to to lower an arbitrary load/store pair, since in the later
> > case the source and dest may overlap, but it is legal for a memcpy.
> >
> >
> > > FWIW, my suggestion about using legal integer types should not
> > > raise the cap at all, it should lower it on specific targets where
> > > we can't actually fit an 8-byte integer in a register.
> >
> >
> > Oh, right. Sorry - I mis-parsed that as legal primitive type, and
> > imagined applying it to vector types. That raises an interesting
> > question though: Would it be useful to apply this logic to vector
> > types too?
> >
> >
> > Cheers,
> > Lang.
> >
> >
> >
> >
> > On Wed, Apr 8, 2015 at 11:11 PM, Chandler Carruth <
> > chandlerc at gmail.com > wrote:
> >
> >
> >
> >
> > On Wed, Apr 8, 2015 at 10:57 PM Lang Hames < lhames at gmail.com >
> > wrote:
> >
> >
> >
> > Hi Chandler,
> >
> >
> > Not as easy as I was hoping then.
> >
> >
> >
> > > Do you see any other way to solve the problem of non-overlapping
> > > information?
> >
> >
> >
> > I'll have to do some reading. If there's any aliasing metadata that
> > we can attach to express that the pointers are disjoint, that would
> > work: In SelectionDAGBuilder we could detect disjoint, misaligned
> > load/store pairs where the load has no other users and use the
> > memcpy expansion instead.
> >
> >
> >
> > Is the problem specific to misaligned loads and stores? Because that
> > seems much easier to solve.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > > Is an under aligned memcpy really that much better than an under
> > > aligned load and store???
> >
> >
> >
> >
> > It saves a bit of shifting and masking as you try to reconstruct the
> > full iN value in a register.
> >
> >
> > But why does lowering as memcpy help? Essentially, I don't understand
> > why we can't use exactly the same lowering strategy that memcpy (or
> > memmove for that matter) would use and get the same effect.
> >
> >
> > If your concern is code size, I'm honestly still surprised that
> > memcpy is smaller, but fine, emit the call *whenever* you end up
> > with misaligned loads and stores?
> >
> >
> >
> >
> > This would be exacerbated if we raised the size cap. I'll see if I
> > can get you some numbers.
> >
> >
> > FWIW, my suggestion about using legal integer types should not raise
> > the cap at all, it should lower it on specific targets where we
> > can't actually fit an 8-byte integer in a register.
> >
> >
> > Anyways, I don't think we need numbers if the primary concern is
> > misaligned loads and stores. I just think that there has to be
> > *some* lowering strategy that works in general for misaligned loads
> > and stores and is no worse than calling memcpy.
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150421/5e621022/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: add_memcpy_noalias_info.patch
Type: application/octet-stream
Size: 1760 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150421/5e621022/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: move_isalias_to_selectiondag.patch
Type: application/octet-stream
Size: 19768 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150421/5e621022/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aarch64_misaligned_load_store_to_memcpy_combine.patch
Type: application/octet-stream
Size: 2690 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150421/5e621022/attachment-0002.obj>