[PATCH] Gut InstCombiner::SimplifyMemTransfer.

Hal Finkel via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 18 17:05:12 PDT 2015


----- Original Message -----
> From: "John McCall" <rjmccall at apple.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "David Majnemer" <david.majnemer at gmail.com>, "Commit Messages and Patches for LLVM"
> <llvm-commits at lists.llvm.org>, "Lang Hames" <lhames at gmail.com>, "Richard Smith" <richard at metafoo.co.uk>, "Chandler
> Carruth" <chandlerc at gmail.com>
> Sent: Tuesday, August 18, 2015 6:57:35 PM
> Subject: Re: [PATCH] Gut InstCombiner::SimplifyMemTransfer.
> 
> 
> > On Aug 16, 2015, at 3:08 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> > 
> > [+Richard, John, Chandler]
> > 
> > ----- Original Message -----
> >> From: "David Majnemer via llvm-commits"
> >> <llvm-commits at lists.llvm.org>
> >> To: "Lang Hames" <lhames at gmail.com>
> >> Cc: "Commit Messages and Patches for LLVM"
> >> <llvm-commits at lists.llvm.org>
> >> Sent: Wednesday, August 12, 2015 10:35:16 PM
> >> Subject: Re: [PATCH] Gut InstCombiner::SimplifyMemTransfer.
> >> 
> >> 
> >> 
> >> Hi Lang,
> >> 
> >> 
> >> I reviewed the InstCombine side and it looks great, all that I ask
> >> is
> >> that you replace the `Instruction *` with `auto *` because the
> >> type
> >> is replicated in the `dyn_cast`.
> >> 
> >> 
> >> My only concern is that clang will actually create calls to the memcpy
> >> intrinsic with pointers which alias exactly:
> >> http://clang.llvm.org/doxygen/CGExprAgg_8cpp_source.html#l01448
> >> 
> > 
> > That seems very broken. We need to fix this. In fact, the comment
> > admits as much:
> > 
> >   // memcpy is not defined if the source and destination pointers are
> >   // exactly equal, but other compilers do this optimization, and almost
> >   // every memcpy implementation handles this case safely.  If there is
> >   // a libc that does not safely handle this, we can add a target hook.
> > 
> > Is there any reason not to emit memmove instead?
> 
> Partial overlaps are generally not possible here: C guarantees that the
> overlap in an assignment has to be perfect. So a great deal of the added
> complexity of memmove is not needed; and as mentioned, no actual
> implementations of memcpy misbehave for perfect overlap.
> 
> That said, I can appreciate wanting to keep the formal model sane. I
> suspect that emitting llvm.memmove would generally produce uses of the
> actual memmove, but I'd be happy to use a different intrinsic that allows
> only perfect overlap and which you can make a “platform-specific” decision
> to lower to memcpy in the backend.

The underlying problem is not the library implementation of memcpy itself; it is the aliasing assumptions that the optimizer gets to make in the meantime. The particular case that prompted this is Lang's proposed patch to add aliasing metadata to the load/store pair used to replace small, aligned memcpy calls.
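For concreteness, here is a minimal C sketch (my own illustration, not code from the patch) of how an aggregate assignment can hand the memcpy intrinsic two pointers that alias exactly: clang lowers the struct copy below to an llvm.memcpy, and nothing stops a caller from passing the same object as both source and destination.

```c
#include <string.h>

struct Pair { long a, b; };

/* clang lowers this aggregate assignment to an llvm.memcpy of
   sizeof(struct Pair) bytes. A caller may pass dst == src, so the
   intrinsic can be handed exactly-aliasing pointers -- the case the
   CGExprAgg comment quoted above describes. */
void copy_pair(struct Pair *dst, const struct Pair *src) {
    *dst = *src;
}
```

If instcombine later rewrites that memcpy into a load/store pair whose pointers are tagged as non-aliasing, a self-copy like copy_pair(&p, &p) is exactly the case where the annotation would be wrong.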

I'm very sympathetic to not wanting to invoke the heavier-weight memmove when the memcpy will suffice, but we need to change something. Your suggestion to add an intrinsic to represent memcpy_or_perfect_overlap seems like a reasonable compromise to me.
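As a rough sketch of the semantics such an intrinsic would have (the name and the C rendering below are hypothetical, not anything that exists in LLVM): the two regions are either exactly equal or fully disjoint, so exact overlap is a no-op and the disjoint case can be lowered to a plain memcpy, with no memmove-style copy-direction check.

```c
#include <string.h>

/* Hypothetical C model of a "memcpy or perfect overlap" operation:
   callers promise the regions are either identical or disjoint.
   Partial overlap is disallowed, so no copy-direction logic is needed. */
void *memcpy_or_perfect_overlap(void *dst, const void *src, size_t n) {
    if (dst == src)
        return dst;              /* perfect overlap: copying is a no-op */
    return memcpy(dst, src, n);  /* regions are disjoint: plain memcpy */
}
```

A backend that knows its libc tolerates dst == src could lower this directly to a memcpy call, which is the "platform-specific" decision John mentions.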

Thanks again,
Hal

> 
> John.
> 
> > 
> > -Hal
> > 
> >> 
> >> I can't claim to be an alias-analysis expert, but my concern is that
> >> annotating the memcpy as such could create some interesting paradoxes.
> >> Those more familiar with AA should chime in.
> >> 
> >> 
> >> I'm afraid I cannot provide any guidance on the SelectionDAG
> >> changes.
> >> 
> >> 
> >> Thanks
> >> --
> >> David Majnemer
> >> 
> >> 
> >> On Wed, Aug 12, 2015 at 10:06 PM, Lang Hames via llvm-commits <
> >> llvm-commits at lists.llvm.org > wrote:
> >> 
> >> 
> >> 
> >> Another belated *ping*.
> >> 
> >> 
> >> 
> >> - Lang.
> >> 
> >> On Tue, Jun 30, 2015 at 2:16 PM, Lang Hames < lhames at gmail.com >
> >> wrote:
> >> 
> >> 
> >> 
> >> Belated ping. :)
> >> 
> >> 
> >> - Lang.
> >> 
> >> 
> >> 
> >> 
> >> On Tue, May 12, 2015 at 2:11 PM, Lang Hames < lhames at gmail.com >
> >> wrote:
> >> 
> >> 
> >> 
> >> Gentle ping.
> >> 
> >> 
> >> - Lang.
> >> 
> >> 
> >> 
> >> 
> >> On Tue, Apr 28, 2015 at 11:51 AM, Lang Hames < lhames at gmail.com >
> >> wrote:
> >> 
> >> 
> >> 
> >> Ping / delayed reply.
> >> 
> >> 
> >> DannyB - thanks for the extra context. :)
> >> 
> >> 
> >> Chandler - any thoughts? Assuming you still want to keep this
> >> transform, what do you think of my instcombine change? (i.e. tagging
> >> the load/store pointers as non-aliasing for memcpy).
> >> 
> >> 
> >> Cheers,
> >> Lang.
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> On Tue, Apr 21, 2015 at 9:21 PM, Daniel Berlin < dberlin at dberlin.org >
> >> wrote:
> >> 
> >> 
> >> On Wed, Apr 8, 2015 at 10:38 PM, Chandler Carruth <
> >> chandlerc at gmail.com > wrote:
> >>> On Wed, Apr 8, 2015 at 10:15 PM Lang Hames < lhames at gmail.com >
> >>> wrote:
> >>>> 
> >>>> Hi David, Chandler,
> >>>> 
> >>>> The attached patch guts the SimplifyMemTransfer method, which turns
> >>>> small memcpys (1/2/4/8 bytes) into load/store pairs. Turning memcpys
> >>>> into load/store pairs loses information, since we can no longer assume
> >>>> the source and dest are non-overlapping. This is leading to some
> >>>> suboptimal expansions for small memcpys on AArch64 when
> >>>> -mno-unaligned-access is turned on (see r234462). I suspect other
> >>>> architectures would suffer similar issues.
> >>>> 
> >>>> I assume this transform is an old workaround to simplify other
> >>>> non-memcpy-aware IR transforms. These days I think most IR transforms
> >>>> can reason sensibly about memcpys, so I'm hoping this is safe to
> >>>> remove. FWIW, removing it didn't hit any regression tests except those
> >>>> that were verifying that this optimisation was being applied, but then
> >>>> you wouldn't really expect it to hit any others.
> >>> 
> >>> 
> >>> Heh. I tried to remove it before and it regressed a *lot* of
> >>> performance. Have you measured it? I think there are many places that
> >>> don't today reason about memcpy but do reason about loads and stores.
> >>> Here is a partial list:
> >>> 
> >>> - GVN
> >> 
> >> Note:
> >> 
> >> GVN reasons about memory intrinsics, including memcpy and memset, and
> >> will pull values out to forward.
> >> 
> >> For memset, if the load is within the memset size, it will compute
> >> what the value is. For memcpy, it handles anything that is constant
> >> memory.
> >> 
> >> (Certainly, loads and stores are handled better, but ...)
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> _______________________________________________
> >> llvm-commits mailing list
> >> llvm-commits at lists.llvm.org
> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> >> 
> >> 
> >> 
> >> 
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

