[PATCH] Gut InstCombiner::SimplifyMemTransfer.

Tue Aug 18 17:18:12 PDT 2015

----- Original Message -----
> From: "Lang Hames" <lhames at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "John McCall" <rjmccall at apple.com>, "Commit Messages and Patches for LLVM" <llvm-commits at lists.llvm.org>,
> "Richard Smith" <richard at metafoo.co.uk>, "Chandler Carruth" <chandlerc at gmail.com>, "David Majnemer"
> <david.majnemer at gmail.com>
> Sent: Tuesday, August 18, 2015 7:12:39 PM
> Subject: Re: [PATCH] Gut InstCombiner::SimplifyMemTransfer.
> 
> 
> Thanks guys,
> 
> 
> I'm writing up a short RFC now - I'll add a discussion of the
> alignment issue. While it's a separate issue, it makes sense to
> tackle both at once to avoid churn.
> 
> Are there any specific motivating cases you'd like me to mention?
> 

We currently have code which "upgrades" the alignment on memcpy intrinsics (because of alignment attributes, assumptions, etc.), and this is useful for making memvpy expand into vector instructions when the source/destination are suitably aligned. It would be useful for this to happen on some targets even if only the source or destination could be upgraded (aligned stores but underaligned loads might still be a win, for example). Currently we can't do this because we can only represent a single alignment.

 -Hal

> 
> - Lang.
> 
> 
> On Tue, Aug 18, 2015 at 5:06 PM, Hal Finkel < hfinkel at anl.gov >
> wrote:
> 
> 
> ----- Original Message -----
> > From: "John McCall" < rjmccall at apple.com >
> > To: "David Majnemer" < david.majnemer at gmail.com >
> > Cc: "Hal Finkel" < hfinkel at anl.gov >, "Commit Messages and Patches
> > for LLVM" < llvm-commits at lists.llvm.org >, "Lang
> > Hames" < lhames at gmail.com >, "Richard Smith" <
> > richard at metafoo.co.uk >, "Chandler Carruth" < chandlerc at gmail.com
> > >
> 
> 
> > Sent: Tuesday, August 18, 2015 7:05:18 PM
> > Subject: Re: [PATCH] Gut InstCombiner::SimplifyMemTransfer.
> > 
> > 
> > 
> > 
> > 
> > On Aug 18, 2015, at 5:01 PM, David Majnemer <
> > david.majnemer at gmail.com > wrote:
> > 
> > 
> > 
> > On Tue, Aug 18, 2015 at 4:57 PM, John McCall < rjmccall at apple.com >
> > wrote:
> > 
> > 
> > 
> > > On Aug 16, 2015, at 3:08 AM, Hal Finkel < hfinkel at anl.gov >
> > > wrote:
> > > 
> > > [+Richard, John, Chandler]
> > > 
> > > ----- Original Message -----
> > >> From: "David Majnemer via llvm-commits" <
> > >> llvm-commits at lists.llvm.org >
> > >> To: "Lang Hames" < lhames at gmail.com >
> > >> Cc: "Commit Messages and Patches for LLVM" <
> > >> llvm-commits at lists.llvm.org >
> > >> Sent: Wednesday, August 12, 2015 10:35:16 PM
> > >> Subject: Re: [PATCH] Gut InstCombiner::SimplifyMemTransfer.
> > >> 
> > >> 
> > >> 
> > >> Hi Lang,
> > >> 
> > >> 
> > >> I reviewed the InstCombine side and it looks great, all that I
> > >> ask
> > >> is
> > >> that you replace the `Instruction *` with `auto *` because the
> > >> type
> > >> is replicated in the `dyn_cast`.
> > >> 
> > >> 
> > >> My only concern is that clang will actually create calls to the
> > >> memcpy intrinsic from within clang with pointers which exactly
> > >> alias:
> > >> http://clang.llvm.org/doxygen/CGExprAgg_8cpp_source.html#l01448
> > >> 
> > > 
> > > That seems very broken. We need to fix this. In fact, the comment
> > > admits as much:
> > > 
> > > 01454 // memcpy is not defined if the source and destination
> > > pointers are exactly
> > > 01455 // equal, but other compilers do this optimization, and
> > > almost every memcpy
> > > 01456 // implementation handles this case safely. If there is a
> > > libc that does not
> > > 01457 // safely handle this, we can add a target hook.
> > > 
> > > Is there any reason not to emit memmove instead?
> > 
> > Partial overlaps are generally not possible here — C guarantees
> > that
> > the overlap
> > in an assignment has to be perfect. So a great deal of the added
> > complexity of
> > memmove is not needed; and as mentioned, no actual implementations
> > of
> > memcpy misbehave for perfect overlap.
> > 
> > That said, I can appreciate wanting to keep the formal model sane.
> > I
> > suspect that
> > emitting llvm.memmove would generally produce uses of the actual
> > memmove,
> > but I'd be happy to use a different intrinsic that allows only
> > perfect overlap and which
> > you can make a “platform-specific” decision to lower to memcpy in
> > the
> > backend.
> > 
> > 
> > 
> > I suggested to Lang on IRC that we might want to add another
> > operand
> > to the memcpy intrinsic to model this behavior. Another intrinsic
> > would, of course, work just as well. AFAICT, the only code which
> > would care about the distinction between memcpy
> > as-written-in-source
> > and as-written-by-compiler which would make the extra flag a little
> > less disruptive than another intrinsic.
> > 
> > Oh, by the way, if you’re going to revise or embellish these
> > intrinsics, please also allow separate alignments to be expressed
> > for both operands.
> 
> While this is certainly a separate issue, FWIW, I share this desire.
> 
> -Hal
> 
> > 
> > 
> > John.
> 
> 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory