[PATCH] Gut InstCombiner::SimplifyMemTransfer.

John McCall via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 18 16:57:35 PDT 2015


> On Aug 16, 2015, at 3:08 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> [+Richard, John, Chandler]
> 
> ----- Original Message -----
>> From: "David Majnemer via llvm-commits" <llvm-commits at lists.llvm.org>
>> To: "Lang Hames" <lhames at gmail.com>
>> Cc: "Commit Messages and Patches for LLVM" <llvm-commits at lists.llvm.org>
>> Sent: Wednesday, August 12, 2015 10:35:16 PM
>> Subject: Re: [PATCH] Gut InstCombiner::SimplifyMemTransfer.
>> 
>> 
>> 
>> Hi Lang,
>> 
>> 
>> I reviewed the InstCombine side and it looks great, all that I ask is
>> that you replace the `Instruction *` with `auto *` because the type
>> is replicated in the `dyn_cast`.
>> 
>> 
>> My only concern is that clang will actually create calls to the
>> memcpy intrinsic from within clang with pointers which exactly
>> alias:
>> http://clang.llvm.org/doxygen/CGExprAgg_8cpp_source.html#l01448
>> 
> 
> That seems very broken. We need to fix this. In fact, the comment admits as much:
> 
> 01454   // memcpy is not defined if the source and destination pointers are exactly
> 01455   // equal, but other compilers do this optimization, and almost every memcpy
> 01456   // implementation handles this case safely.  If there is a libc that does not
> 01457   // safely handle this, we can add a target hook.
> 
> Is there any reason not to emit memmove instead?

Partial overlaps are generally not possible here — C guarantees that the overlap
in an assignment has to be perfect.  So a great deal of the added complexity of
memmove is not needed; and as mentioned, no actual implementations of
memcpy misbehave for perfect overlap.

That said, I can appreciate wanting to keep the formal model sane.  I suspect that
emitting llvm.memmove would generally produce uses of the actual memmove,
but I'd be happy to use a different intrinsic that allows only perfect overlap and which
you can make a “platform-specific” decision to lower to memcpy in the backend.

John.

> 
> -Hal
> 
>> 
>> I cannot confess to being an alias-analysis expert but my concern is
>> that annotating the memcpy as such could create some interesting
>> paradoxes. Those more familiar with AA should chime in.
>> 
>> 
>> I'm afraid I cannot provide any guidance on the SelectionDAG changes.
>> 
>> 
>> Thanks
>> --
>> David Majnemer
>> 
>> 
>> On Wed, Aug 12, 2015 at 10:06 PM, Lang Hames via llvm-commits <
>> llvm-commits at lists.llvm.org > wrote:
>> 
>> 
>> 
>> Another belated *ping*.
>> 
>> 
>> 
>> - Lang.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Tue, Jun 30, 2015 at 2:16 PM, Lang Hames < lhames at gmail.com >
>> wrote:
>> 
>> 
>> 
>> Belated ping. :)
>> 
>> 
>> - Lang.
>> 
>> 
>> 
>> 
>> On Tue, May 12, 2015 at 2:11 PM, Lang Hames < lhames at gmail.com >
>> wrote:
>> 
>> 
>> 
>> Gentle ping.
>> 
>> 
>> - Lang.
>> 
>> 
>> 
>> 
>> On Tue, Apr 28, 2015 at 11:51 AM, Lang Hames < lhames at gmail.com >
>> wrote:
>> 
>> 
>> 
>> Ping / delayed reply.
>> 
>> 
>> DannyB - thanks for the extra context. :)
>> 
>> 
>> Chandler - any thoughts? Assuming you still want to keep this
>> transform, what do you think of my instcombine change? (i.e. tagging
>> the load/store pointers as non-aliasing for memcpy).
>> 
>> 
>> Cheers,
>> Lang.
>> 
>> 
>> 
>> 
>> 
>> 
>> On Tue, Apr 21, 2015 at 9:21 PM, Daniel Berlin < dberlin at dberlin.org
>>> wrote:
>> 
>> 
>> On Wed, Apr 8, 2015 at 10:38 PM, Chandler Carruth <
>> chandlerc at gmail.com > wrote:
>>> On Wed, Apr 8, 2015 at 10:15 PM Lang Hames < lhames at gmail.com >
>>> wrote:
>>>> 
>>>> Hi David, Chandler,
>>>> 
>>>> The attached patch guts the SimplifyMemTransfer method, which
>>>> turns small
>>>> memcpys (1/2/4/8 bytes) into load/store pairs. Turning memcpys
>>>> into
>>>> load/store pairs loses information, since we can no longer assume
>>>> the source
>>>> and dest are non-overlapping. This is leading to some suboptimal
>>>> expansions
>>>> for small memcpys on AArch64 when -mno-unaligned-access is turned
>>>> on (see
>>>> r234462). I suspect other architectures would suffer similar
>>>> issues.
>>>> 
>>>> I assume this transform is an old workaround to simplify other
>>>> non-memcpy-aware IR transforms. These days I think most IR
>>>> transforms can
>>>> reason sensibly about memcpys, so I'm hoping this is safe to
>>>> remove. FWIW,
>>>> removing it didn't hit any regression tests except those that were
>>>> verifying
>>>> that this optimisation was being applied, but then you wouldn't
>>>> really
>>>> expect it to hit any others.
>>> 
>>> 
>>> Heh. I tried to remove it before and it regressed a *lot* of
>>> performance.
>>> Have you measured it? I think there are many places that don't
>>> today reason
>>> about memcpy but do reason about loads and stores. Here is a
>>> partial list:
>>> 
>>> - GVN
>> 
>> Note:
>> 
>> It reasons about memory intrinsics, including memcpy and memset, and
>> will pull values out to forward.
>> 
>> for memset, if the load is in the memset size, it will compute what
>> the value is.
>> For memcpy, it handles anything that is constant memory.
>> 
>> 
>> (Certainly, loads and stores are handled better, but ...)
>> 
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>> 
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>> 
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory



More information about the llvm-commits mailing list