[llvm] r249820 - [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.

Quentin Colombet via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 9 11:15:44 PDT 2015


Hi Andrea,

> On Oct 9, 2015, at 10:58 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> 
> 
> 
> On Fri, Oct 9, 2015 at 6:09 PM, Quentin Colombet via llvm-commits <llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>> wrote:
> Hi Philip,
> 
> >
> > One of my worries is about how disabling this will effect other optimizations around non-temporal stores.  In particular, if we had a memset following the stores, we'd no longer have the chance to combine them into a single memset.  Replacing two non-temporal stores and a large memset with a larger memset seems highly likely to be profitable.
> 
> I am not sure about that. I would expect any memset implementation to touch the cache and thus, to have an impact on the performance of the surrounding code.
> 
> I agree with Quentin here. I don't think it would be better since you would end up polluting the caches.
> 
> 
> >
> > Analogously, replacing 8 adjacent non-temporal byte stores with a single memset (and then i64 store) also seems profitable and possibly disabled by this transform.
> >
> > Also, if we keep this change, the comment needs to be extended to highlight this is a profitability decision, not a legality one.
> 
> Agree with that part.
> 
> Adjacent stores like the ones you described should be easily handled by 'MergeConsecutiveStores' in the DAGCombiner. So, I don't think this transformation is pessimizing the codegen for that case. MergeConsecutiveStores should be able to merge those small stores and still preserve the nontemporal hint.
> 
> Overall I am not completely against the idea of propagating the !nontemporal metadata to memset/memcpy (I actually have a patch that implements that for memset - if people really prefer that approach I can upload it for review). However, at the moment I don't see the advantage in doing it (I'd like to see an example where that transformation would be profitable).

The thing I was agreeing with was that the comment should say that we block the transformation not because we couldn’t fix memset, but because it is not worth it given we would need to split the memset later. That was how I read the profitability part, though I guess Philip was more into what you understood :).

Cheers,
Q.

> 
> -Andrea
> 
> Thanks,
> -Quentin
> 
> >
> > Philip
> >
> > On 10/09/2015 03:53 AM, Andrea Di Biagio via llvm-commits wrote:
> >> Author: adibiagio
> >> Date: Fri Oct  9 05:53:41 2015
> >> New Revision: 249820
> >>
> >> URL: http://llvm.org/viewvc/llvm-project?rev=249820&view=rev <http://llvm.org/viewvc/llvm-project?rev=249820&view=rev>
> >> Log:
> >> [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.
> >>
> >> Pass MemCpyOpt doesn't check if a store instruction is nontemporal.
> >> As a consequence, adjacent nontemporal stores are always merged into a
> >> memset call.
> >>
> >> Example:
> >>
> >> ;;;
> >> define void @foo(<4 x float>* nocapture %p) {
> >> entry:
> >>   store <4 x float> zeroinitializer, <4 x float>* %p, align 16, !nontemporal !0
> >>   %p1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
> >>   store <4 x float> zeroinitializer, <4 x float>* %p1, align 16, !nontemporal !0
> >>   ret void
> >> }
> >>
> >> !0 = !{i32 1}
> >> ;;;
> >>
> >> In this example, the two nontemporal stores are combined to a memset of zero
> >> which does not preserve the nontemporal hint. Later on the backend (tested on a
> >> x86-64 corei7) expands that memset call into a sequence of two normal 16-byte
> >> aligned vector stores.
> >>
> >> opt -memcpyopt example.ll -S -o - | llc -mcpu=corei7 -o -
> >>
> >> Before:
> >>   xorps  %xmm0, %xmm0
> >>   movaps  %xmm0, 16(%rdi)
> >>   movaps  %xmm0, (%rdi)
> >>
> >> With this patch, we no longer merge nontemporal stores into calls to memset.
> >> In this example, llc correctly expands the two stores into two movntps:
> >>   xorps  %xmm0, %xmm0
> >>   movntps %xmm0, 16(%rdi)
> >>   movntps  %xmm0, (%rdi)
> >>
> >> In theory, we could extend the usage of !nontemporal metadata to memcpy/memset
> >> calls. However a change like that would only have the effect of forcing the
> >> backend to expand !nontemporal memsets back to sequences of store instructions.
> >> A memset library call would not have exactly the same semantic of a builtin
> >> !nontemporal memset call. So, SelectionDAG will have to conservatively expand
> >> it back to a sequence of !nontemporal stores (effectively undoing the merging).
> >>
> >> Differential Revision: http://reviews.llvm.org/D13519 <http://reviews.llvm.org/D13519>
> >>
> >> Added:
> >>     llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
> >> Modified:
> >>     llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
> >>
> >> Modified: llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
> >> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff <http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff>
> >> ==============================================================================
> >> --- llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp (original)
> >> +++ llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp Fri Oct  9 05:53:41 2015
> >> @@ -488,6 +488,16 @@ Instruction *MemCpyOpt::tryMergingIntoMe
> >>    bool MemCpyOpt::processStore(StoreInst *SI, BasicBlock::iterator &BBI) {
> >>    if (!SI->isSimple()) return false;
> >> +
> >> +  // Avoid merging nontemporal stores since the resulting
> >> +  // memcpy/memset would not be able to preserve the nontemporal hint.
> >> +  // In theory we could teach how to propagate the !nontemporal metadata to
> >> +  // memset calls. However, that change would force the backend to
> >> +  // conservatively expand !nontemporal memset calls back to sequences of
> >> +  // store instructions (effectively undoing the merging).
> >> +  if (SI->getMetadata(LLVMContext::MD_nontemporal))
> >> +    return false;
> >> +
> >>    const DataLayout &DL = SI->getModule()->getDataLayout();
> >>      // Detect cases where we're performing call slot forwarding, but
> >>
> >> Added: llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
> >> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto <http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto>
> >> ==============================================================================
> >> --- llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll (added)
> >> +++ llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll Fri Oct  9 05:53:41 2015
> >> @@ -0,0 +1,49 @@
> >> +; RUN: opt < %s -memcpyopt -S | FileCheck %s
> >> +
> >> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> >> +
> >> +; Verify that we don't combine nontemporal stores into memset calls.
> >> +
> >> +define void @nontemporal_stores_1(<4 x float>* nocapture %dst) {
> >> +; CHECK-LABEL: @nontemporal_stores_1
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr2, align 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr3, align 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr4, align 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr5, align 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr6, align 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr7, align 16, !nontemporal !0
> >> +; CHECK-NEXT: ret void
> >> +entry:
> >> +  store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
> >> +  %ptr1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
> >> +  %ptr2 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 2
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr2, align 16, !nontemporal !0
> >> +  %ptr3 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 3
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr3, align 16, !nontemporal !0
> >> +  %ptr4 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 4
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr4, align 16, !nontemporal !0
> >> +  %ptr5 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 5
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr5, align 16, !nontemporal !0
> >> +  %ptr6 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 6
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr6, align 16, !nontemporal !0
> >> +  %ptr7 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 7
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr7, align 16, !nontemporal !0
> >> +  ret void
> >> +}
> >> +
> >> +define void @nontemporal_stores_2(<4 x float>* nocapture %dst) {
> >> +; CHECK-LABEL: @nontemporal_stores_2
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
> >> +; CHECK-NEXT: ret void
> >> +entry:
> >> +  store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
> >> +  %ptr1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
> >> +  ret void
> >> +}
> >> +
> >> +!0 = !{i32 1}
> >>
> >>
> >> _______________________________________________
> >> llvm-commits mailing list
> >> llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151009/eeab6d61/attachment.html>


More information about the llvm-commits mailing list