[llvm] r249820 - [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.

Andrea Di Biagio via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 9 10:58:59 PDT 2015


On Fri, Oct 9, 2015 at 6:09 PM, Quentin Colombet via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

> Hi Philip,
>
> >
> > One of my worries is about how disabling this will effect other
> optimizations around non-temporal stores.  In particular, if we had a
> memset following the stores, we'd no longer have the chance to combine them
> into a single memset.  Replacing two non-temporal stores and a large memset
> with a larger memset seems highly likely to be profitable.
>
> I am not sure about that. I would expect any memset implementation to
> touch the cache and thus, to have an impact on the performance of the
> surrounding code.
>

I agree with Quentin here. I don't think it would be better since you would
end up polluting the caches.


> >
> > Analogously, replacing 8 adjacent non-temporal byte stores with a single
> memset (and then i64 store) also seems profitable and possibly disabled by
> this transform.
> >
> > Also, if we keep this change, the comment needs to be extended to
> highlight this is a profitability decision, not a legality one.
>
> Agree with that part.
>

Adjacent stores like the ones you described should be easily handled by
'MergeConsecutiveStores' in the DAGCombiner. So, I don't think this
transformation is pessimizing the codegen for that case.
MergeConsecutiveStores should be able to merge those small stores and still
preserve the nontemporal hint.

Overall I am not completely against the idea of propagating the
!nontemporal metadata to memset/memcpy (I actually have a patch that
implements that for memset - if people really prefer that approach I can
upload it for review). However, at the moment I don't see the advantage in
doing it (I'd like to see an example where that transformation would be
profitable).

-Andrea

Thanks,

> -Quentin
>
> >
> > Philip
> >
> > On 10/09/2015 03:53 AM, Andrea Di Biagio via llvm-commits wrote:
> >> Author: adibiagio
> >> Date: Fri Oct  9 05:53:41 2015
> >> New Revision: 249820
> >>
> >> URL: http://llvm.org/viewvc/llvm-project?rev=249820&view=rev
> >> Log:
> >> [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset
> calls.
> >>
> >> Pass MemCpyOpt doesn't check if a store instruction is nontemporal.
> >> As a consequence, adjacent nontemporal stores are always merged into a
> >> memset call.
> >>
> >> Example:
> >>
> >> ;;;
> >> define void @foo(<4 x float>* nocapture %p) {
> >> entry:
> >>   store <4 x float> zeroinitializer, <4 x float>* %p, align 16,
> !nontemporal !0
> >>   %p1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
> >>   store <4 x float> zeroinitializer, <4 x float>* %p1, align 16,
> !nontemporal !0
> >>   ret void
> >> }
> >>
> >> !0 = !{i32 1}
> >> ;;;
> >>
> >> In this example, the two nontemporal stores are combined to a memset of
> zero
> >> which does not preserve the nontemporal hint. Later on the backend
> (tested on a
> >> x86-64 corei7) expands that memset call into a sequence of two normal
> 16-byte
> >> aligned vector stores.
> >>
> >> opt -memcpyopt example.ll -S -o - | llc -mcpu=corei7 -o -
> >>
> >> Before:
> >>   xorps  %xmm0, %xmm0
> >>   movaps  %xmm0, 16(%rdi)
> >>   movaps  %xmm0, (%rdi)
> >>
> >> With this patch, we no longer merge nontemporal stores into calls to
> memset.
> >> In this example, llc correctly expands the two stores into two movntps:
> >>   xorps  %xmm0, %xmm0
> >>   movntps %xmm0, 16(%rdi)
> >>   movntps  %xmm0, (%rdi)
> >>
> >> In theory, we could extend the usage of !nontemporal metadata to
> memcpy/memset
> >> calls. However a change like that would only have the effect of forcing
> the
> >> backend to expand !nontemporal memsets back to sequences of store
> instructions.
> >> A memset library call would not have exactly the same semantic of a
> builtin
> >> !nontemporal memset call. So, SelectionDAG will have to conservatively
> expand
> >> it back to a sequence of !nontemporal stores (effectively undoing the
> merging).
> >>
> >> Differential Revision: http://reviews.llvm.org/D13519
> >>
> >> Added:
> >>     llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
> >> Modified:
> >>     llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
> >>
> >> Modified: llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
> >> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff
> >>
> ==============================================================================
> >> --- llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp (original)
> >> +++ llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp Fri Oct  9
> 05:53:41 2015
> >> @@ -488,6 +488,16 @@ Instruction *MemCpyOpt::tryMergingIntoMe
> >>    bool MemCpyOpt::processStore(StoreInst *SI, BasicBlock::iterator
> &BBI) {
> >>    if (!SI->isSimple()) return false;
> >> +
> >> +  // Avoid merging nontemporal stores since the resulting
> >> +  // memcpy/memset would not be able to preserve the nontemporal hint.
> >> +  // In theory we could teach how to propagate the !nontemporal
> metadata to
> >> +  // memset calls. However, that change would force the backend to
> >> +  // conservatively expand !nontemporal memset calls back to sequences
> of
> >> +  // store instructions (effectively undoing the merging).
> >> +  if (SI->getMetadata(LLVMContext::MD_nontemporal))
> >> +    return false;
> >> +
> >>    const DataLayout &DL = SI->getModule()->getDataLayout();
> >>      // Detect cases where we're performing call slot forwarding, but
> >>
> >> Added: llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
> >> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto
> >>
> ==============================================================================
> >> --- llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll (added)
> >> +++ llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll Fri Oct  9
> 05:53:41 2015
> >> @@ -0,0 +1,49 @@
> >> +; RUN: opt < %s -memcpyopt -S | FileCheck %s
> >> +
> >> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> >> +
> >> +; Verify that we don't combine nontemporal stores into memset calls.
> >> +
> >> +define void @nontemporal_stores_1(<4 x float>* nocapture %dst) {
> >> +; CHECK-LABEL: @nontemporal_stores_1
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %dst, align
> 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr1, align
> 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr2, align
> 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr3, align
> 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr4, align
> 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr5, align
> 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr6, align
> 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr7, align
> 16, !nontemporal !0
> >> +; CHECK-NEXT: ret void
> >> +entry:
> >> +  store <4 x float> zeroinitializer, <4 x float>* %dst, align 16,
> !nontemporal !0
> >> +  %ptr1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16,
> !nontemporal !0
> >> +  %ptr2 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 2
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr2, align 16,
> !nontemporal !0
> >> +  %ptr3 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 3
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr3, align 16,
> !nontemporal !0
> >> +  %ptr4 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 4
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr4, align 16,
> !nontemporal !0
> >> +  %ptr5 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 5
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr5, align 16,
> !nontemporal !0
> >> +  %ptr6 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 6
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr6, align 16,
> !nontemporal !0
> >> +  %ptr7 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 7
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr7, align 16,
> !nontemporal !0
> >> +  ret void
> >> +}
> >> +
> >> +define void @nontemporal_stores_2(<4 x float>* nocapture %dst) {
> >> +; CHECK-LABEL: @nontemporal_stores_2
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %dst, align
> 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr1, align
> 16, !nontemporal !0
> >> +; CHECK-NEXT: ret void
> >> +entry:
> >> +  store <4 x float> zeroinitializer, <4 x float>* %dst, align 16,
> !nontemporal !0
> >> +  %ptr1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16,
> !nontemporal !0
> >> +  ret void
> >> +}
> >> +
> >> +!0 = !{i32 1}
> >>
> >>
> >> _______________________________________________
> >> llvm-commits mailing list
> >> llvm-commits at lists.llvm.org
> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151009/b6781af1/attachment.html>


More information about the llvm-commits mailing list