[llvm] r249820 - [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.

Philip Reames via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 9 09:51:34 PDT 2015


I'm not sure this patch is moving in the right direction.  In 
particular, I find the fact that adding the non-temporal metadata is 
restricting optimization to be very problematic.  As far as I can tell, 
the transform being disabled was entirely legal; it just wasn't 
profitable in this example. Extending non-temporal to memset calls seems 
like a much better scenario.

One of my worries is about how disabling this will effect other 
optimizations around non-temporal stores.  In particular, if we had a 
memset following the stores, we'd no longer have the chance to combine 
them into a single memset.  Replacing two non-temporal stores and a 
large memset with a larger memset seems highly likely to be profitable.

Analogously, replacing 8 adjacent non-temporal byte stores with a single 
memset (and then i64 store) also seems profitable and possibly disabled 
by this transform.

Also, if we keep this change, the comment needs to be extended to 
highlight this is a profitability decision, not a legality one.

Philip

On 10/09/2015 03:53 AM, Andrea Di Biagio via llvm-commits wrote:
> Author: adibiagio
> Date: Fri Oct  9 05:53:41 2015
> New Revision: 249820
>
> URL: http://llvm.org/viewvc/llvm-project?rev=249820&view=rev
> Log:
> [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.
>
> Pass MemCpyOpt doesn't check if a store instruction is nontemporal.
> As a consequence, adjacent nontemporal stores are always merged into a
> memset call.
>
> Example:
>
> ;;;
> define void @foo(<4 x float>* nocapture %p) {
> entry:
>    store <4 x float> zeroinitializer, <4 x float>* %p, align 16, !nontemporal !0
>    %p1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
>    store <4 x float> zeroinitializer, <4 x float>* %p1, align 16, !nontemporal !0
>    ret void
> }
>
> !0 = !{i32 1}
> ;;;
>
> In this example, the two nontemporal stores are combined to a memset of zero
> which does not preserve the nontemporal hint. Later on the backend (tested on a
> x86-64 corei7) expands that memset call into a sequence of two normal 16-byte
> aligned vector stores.
>
> opt -memcpyopt example.ll -S -o - | llc -mcpu=corei7 -o -
>
> Before:
>    xorps  %xmm0, %xmm0
>    movaps  %xmm0, 16(%rdi)
>    movaps  %xmm0, (%rdi)
>
> With this patch, we no longer merge nontemporal stores into calls to memset.
> In this example, llc correctly expands the two stores into two movntps:
>    xorps  %xmm0, %xmm0
>    movntps %xmm0, 16(%rdi)
>    movntps  %xmm0, (%rdi)
>
> In theory, we could extend the usage of !nontemporal metadata to memcpy/memset
> calls. However a change like that would only have the effect of forcing the
> backend to expand !nontemporal memsets back to sequences of store instructions.
> A memset library call would not have exactly the same semantic of a builtin
> !nontemporal memset call. So, SelectionDAG will have to conservatively expand
> it back to a sequence of !nontemporal stores (effectively undoing the merging).
>
> Differential Revision: http://reviews.llvm.org/D13519
>
> Added:
>      llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
> Modified:
>      llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
>
> Modified: llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp (original)
> +++ llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp Fri Oct  9 05:53:41 2015
> @@ -488,6 +488,16 @@ Instruction *MemCpyOpt::tryMergingIntoMe
>   
>   bool MemCpyOpt::processStore(StoreInst *SI, BasicBlock::iterator &BBI) {
>     if (!SI->isSimple()) return false;
> +
> +  // Avoid merging nontemporal stores since the resulting
> +  // memcpy/memset would not be able to preserve the nontemporal hint.
> +  // In theory we could teach how to propagate the !nontemporal metadata to
> +  // memset calls. However, that change would force the backend to
> +  // conservatively expand !nontemporal memset calls back to sequences of
> +  // store instructions (effectively undoing the merging).
> +  if (SI->getMetadata(LLVMContext::MD_nontemporal))
> +    return false;
> +
>     const DataLayout &DL = SI->getModule()->getDataLayout();
>   
>     // Detect cases where we're performing call slot forwarding, but
>
> Added: llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto
> ==============================================================================
> --- llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll (added)
> +++ llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll Fri Oct  9 05:53:41 2015
> @@ -0,0 +1,49 @@
> +; RUN: opt < %s -memcpyopt -S | FileCheck %s
> +
> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> +
> +; Verify that we don't combine nontemporal stores into memset calls.
> +
> +define void @nontemporal_stores_1(<4 x float>* nocapture %dst) {
> +; CHECK-LABEL: @nontemporal_stores_1
> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr2, align 16, !nontemporal !0
> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr3, align 16, !nontemporal !0
> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr4, align 16, !nontemporal !0
> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr5, align 16, !nontemporal !0
> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr6, align 16, !nontemporal !0
> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr7, align 16, !nontemporal !0
> +; CHECK-NEXT: ret void
> +entry:
> +  store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
> +  %ptr1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
> +  store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
> +  %ptr2 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 2
> +  store <4 x float> zeroinitializer, <4 x float>* %ptr2, align 16, !nontemporal !0
> +  %ptr3 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 3
> +  store <4 x float> zeroinitializer, <4 x float>* %ptr3, align 16, !nontemporal !0
> +  %ptr4 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 4
> +  store <4 x float> zeroinitializer, <4 x float>* %ptr4, align 16, !nontemporal !0
> +  %ptr5 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 5
> +  store <4 x float> zeroinitializer, <4 x float>* %ptr5, align 16, !nontemporal !0
> +  %ptr6 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 6
> +  store <4 x float> zeroinitializer, <4 x float>* %ptr6, align 16, !nontemporal !0
> +  %ptr7 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 7
> +  store <4 x float> zeroinitializer, <4 x float>* %ptr7, align 16, !nontemporal !0
> +  ret void
> +}
> +
> +define void @nontemporal_stores_2(<4 x float>* nocapture %dst) {
> +; CHECK-LABEL: @nontemporal_stores_2
> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
> +; CHECK-NEXT: ret void
> +entry:
> +  store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
> +  %ptr1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
> +  store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
> +  ret void
> +}
> +
> +!0 = !{i32 1}
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits



More information about the llvm-commits mailing list