[llvm-dev] [RFC] Generalize llvm.memcpy / llvm.memmove intrinsics.
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Wed Aug 19 11:47:02 PDT 2015
----- Original Message -----
> From: "Lang Hames" <lhames at gmail.com>
> To: "LLVM Developers Mailing List" <llvm-dev at lists.llvm.org>
> Cc: "Chandler Carruth" <chandlerc at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>, "David Majnemer"
> <david.majnemer at gmail.com>, "John McCall" <rjmccall at apple.com>, "Jim Grosbach" <grosbach at apple.com>
> Sent: Tuesday, August 18, 2015 8:04:48 PM
> Subject: [RFC] Generalize llvm.memcpy / llvm.memmove intrinsics.
> Hi All,
> I'd like to float two changes to the llvm.memcpy / llvm.memmove
> intrinsics.
> (1) Add an i1 <mayPerfectlyAlias> argument to the llvm.memcpy
> intrinsic.
> When set to '1' (the auto-upgrade default), this argument would
> indicate that the source and destination arguments may perfectly
> alias (otherwise they must not alias at all - memcpy prohibits
> partial overlap). While the C standard says that memcpy's arguments
> can't alias at all, perfect aliasing works in practice, and clang
> currently relies on this behavior: it emits llvm.memcpys for
> aggregate copies, despite the possibility of self-assignment.
> Going forward, llvm.memcpy calls emitted for aggregate copies would
> have mayPerfectlyAlias set to '1'. Other uses of llvm.memcpy
> (including lowerings from memcpy calls) would have mayPerfectlyAlias
> set to '0'.
> This change is motivated by poor optimization for small memcpys on
> targets with strict alignment requirements. When a user writes a
> small, unaligned memcpy we may transform it into an unaligned
> load/store pair in instcombine (see
> InstCombine::SimplifyMemTransfer), which is then broken up into an
> unwieldy series of smaller loads and stores during legalization. I
> have a fix for this issue which tags the pointers for unaligned
> load/store pairs with noalias metadata allowing CodeGen to produce
> better code during legalization, but it's not safe to apply while
> clang is emitting memcpys with pointers that may perfectly alias. If
> the 'mayPerfectlyAlias' flag were introduced, I could inspect that
> and add the noalias tag only if mayPerfectlyAlias is '0'.
> Note: We could also achieve the desired effect by adding a new
> intrinsic (llvm.structcpy?) with semantics that match the current
> llvm.memcpy ones (i.e. perfect-aliasing or non-aliasing, but no
> partial), and then reclaim llvm.memcpy for non-aliasing pointers
> only. I floated this idea with David Majnemer on IRC and he
> suggested that adding a flag to llvm.memcpy might be less disruptive
> and easier to maintain - thanks for the suggestion David!
> (2) Allow different source and destination alignments on both
> llvm.memcpy / llvm.memmove.
> Since I'm talking about changes to llvm.memcpy anyway, a few people
> asked me to float this one. Having separate alignments for the
> source and destination pointers may allow us to generate better code
> when one of the pointers has a higher alignment.
> The auto-upgrade for this would be to set both source and destination
> alignment to the original 'align' value.
As one of the people who asked for this, let me add: We currently have code which upgrades the alignment on memcpy intrinsics (because of alignment attributes, assumptions, etc.), and this is useful for making memcpy expand into vector instructions when the source/destination are suitably aligned. It would be useful for this to happen on some targets even if only the source or destination could be upgraded (aligned stores but underaligned loads might still be a win, for example). Currently we can't do this because we can only represent a single alignment. Because we aggressively form memcpy as part of idiom recognition, and emit them in frontends, this comes up more than it would from source-level memcpy calls alone.
Thus, I agree with John (and Lang): so long as we're fooling with the memcpy intrinsic's signature, we should do this too.
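To make the mixed-alignment case concrete, here is a minimal C sketch. The struct and function names are hypothetical, made up for illustration only. The destination is known to be 16-byte aligned while the source is only byte aligned, so a single alignment value shared by both pointers can be no larger than 1, even though aligned stores might still be worthwhile on some targets.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a 1-byte tag followed by 16 payload bytes, so the
   payload starts at offset 1 and is only guaranteed 1-byte alignment. */
struct packet {
    uint8_t tag;
    uint8_t payload[16];
};

/* Hypothetical destination type with a 16-byte alignment guarantee. */
struct aligned_block {
    alignas(16) uint8_t bytes[16];
};

/* Copy from an under-aligned source into a 16-byte-aligned destination.
   With one shared alignment, only the smaller guarantee (1) can be
   recorded for this copy; with separate alignments, the destination's
   16 could be recorded as well. */
static void unpack(struct aligned_block *dst, const struct packet *src) {
    memcpy(dst->bytes, src->payload, sizeof dst->bytes);
}
```

Whether a given target actually benefits from the split is a backend decision; the sketch only shows where the asymmetric alignment information comes from.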
> Any thoughts?
I'm strongly in favor of both pieces.
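For readers who want the aliasing half of the proposal spelled out, the distinction Lang describes can be sketched in C as follows (hypothetical names, for illustration only). An aggregate assignment may be a self-assignment, so a memcpy emitted for it has to tolerate source and destination being the same object. An explicit memcpy call carries the caller's promise that the two regions do not overlap.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical aggregate, large enough that a compiler is likely to
   lower its assignment to a memory copy rather than scalar moves. */
struct big {
    int v[64];
};

/* Aggregate copy: dst and src may point at the same object. Under the
   proposal, a memcpy emitted for this would have mayPerfectlyAlias set
   to '1'. */
static void assign(struct big *dst, const struct big *src) {
    *dst = *src;
}

/* Explicit memcpy call: the caller must pass non-overlapping objects.
   Under the proposal, this lowering would have mayPerfectlyAlias set
   to '0'. */
static void copy_disjoint(struct big *dst, const struct big *src) {
    memcpy(dst, src, sizeof *dst);
}
```

Calling assign(&x, &x) is well defined in C, which is why the first form cannot be tagged as non-aliasing.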
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory