[llvm] r249820 - [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.

Philip Reames via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 9 12:51:34 PDT 2015


I don't have a strong enough opinion here to push the issue, but I think 
I still disagree with the chosen approach.  However, I will defer to you 
two since you seem to have discussed the tradeoffs and are better 
positioned to make the decision.

p.s. If the patch to support non-temporal memsets was simple, it might 
be interesting in it's own right.  Even if we don't canonicalize to it, 
having the ability to chose it in the frontend could be interesting.  
Note the "could"; I have no concrete use for this in the near future.

Philip



On 10/09/2015 11:15 AM, Quentin Colombet wrote:
> Hi Andrea,
>
>> On Oct 9, 2015, at 10:58 AM, Andrea Di Biagio 
>> <andrea.dibiagio at gmail.com <mailto:andrea.dibiagio at gmail.com>> wrote:
>>
>>
>>
>> On Fri, Oct 9, 2015 at 6:09 PM, Quentin Colombet via 
>> llvm-commits<llvm-commits at lists.llvm.org 
>> <mailto:llvm-commits at lists.llvm.org>>wrote:
>>
>>     Hi Philip,
>>
>>     >
>>     > One of my worries is about how disabling this will effect other
>>     optimizations around non-temporal stores.  In particular, if we
>>     had a memset following the stores, we'd no longer have the chance
>>     to combine them into a single memset.  Replacing two non-temporal
>>     stores and a large memset with a larger memset seems highly
>>     likely to be profitable.
>>
>>     I am not sure about that. I would expect any memset
>>     implementation to touch the cache and thus, to have an impact on
>>     the performance of the surrounding code.
>>
>>
>> I agree with Quentin here. I don't think it would be better since you 
>> would end up polluting the caches.
>>
>>
>>     >
>>     > Analogously, replacing 8 adjacent non-temporal byte stores with
>>     a single memset (and then i64 store) also seems profitable and
>>     possibly disabled by this transform.
>>     >
>>     > Also, if we keep this change, the comment needs to be extended
>>     to highlight this is a profitability decision, not a legality one.
>>
>>     Agree with that part.
>>
>>
>> Adjacent stores like the ones you described should be easily handled 
>> by 'MergeConsecutiveStores' in the DAGCombiner. So, I don't think 
>> this transformation is pessimizing the codegen for that case. 
>> MergeConsecutiveStores should be able to merge those small stores and 
>> still preserve the nontemporal hint.
>>
>> Overall I am not completely against the idea of propagating the 
>> !nontemporal metadata to memset/memcpy (I actually have a patch that 
>> implements that for memset - if people really prefer that approach I 
>> can upload it for review). However, at the moment I don't see the 
>> advantage in doing it (I'd like to see an example where that 
>> transformation would be profitable).
>
> The thing I was agreeing with was that the comment should say that we 
> block the transformation not because we couldn’t fix memset, but 
> because it is not worth it given we would need to split the memset 
> later. That was how I read the profitability part, though I guess 
> Philip was more into what you understood :).
>
> Cheers,
> Q.
>
>>
>> -Andrea
>>
>> Thanks,
>>
>>     -Quentin
>>
>>     >
>>     > Philip
>>     >
>>     > On 10/09/2015 03:53 AM, Andrea Di Biagio via llvm-commits wrote:
>>     >> Author: adibiagio
>>     >> Date: Fri Oct  9 05:53:41 2015
>>     >> New Revision: 249820
>>     >>
>>     >> URL:http://llvm.org/viewvc/llvm-project?rev=249820&view=rev
>>     >> Log:
>>     >> [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into
>>     memset calls.
>>     >>
>>     >> Pass MemCpyOpt doesn't check if a store instruction is
>>     nontemporal.
>>     >> As a consequence, adjacent nontemporal stores are always
>>     merged into a
>>     >> memset call.
>>     >>
>>     >> Example:
>>     >>
>>     >> ;;;
>>     >> define void @foo(<4 x float>* nocapture %p) {
>>     >> entry:
>>     >>   store <4 x float> zeroinitializer, <4 x float>* %p, align
>>     16, !nontemporal !0
>>     >>   %p1 = getelementptr inbounds <4 x float>, <4 x float>* %dst,
>>     i64 1
>>     >>   store <4 x float> zeroinitializer, <4 x float>* %p1, align
>>     16, !nontemporal !0
>>     >>   ret void
>>     >> }
>>     >>
>>     >> !0 = !{i32 1}
>>     >> ;;;
>>     >>
>>     >> In this example, the two nontemporal stores are combined to a
>>     memset of zero
>>     >> which does not preserve the nontemporal hint. Later on the
>>     backend (tested on a
>>     >> x86-64 corei7) expands that memset call into a sequence of two
>>     normal 16-byte
>>     >> aligned vector stores.
>>     >>
>>     >> opt -memcpyopt example.ll -S -o - | llc -mcpu=corei7 -o -
>>     >>
>>     >> Before:
>>     >>   xorps  %xmm0, %xmm0
>>     >>   movaps  %xmm0, 16(%rdi)
>>     >>   movaps  %xmm0, (%rdi)
>>     >>
>>     >> With this patch, we no longer merge nontemporal stores into
>>     calls to memset.
>>     >> In this example, llc correctly expands the two stores into two
>>     movntps:
>>     >>   xorps  %xmm0, %xmm0
>>     >>   movntps %xmm0, 16(%rdi)
>>     >>   movntps  %xmm0, (%rdi)
>>     >>
>>     >> In theory, we could extend the usage of !nontemporal metadata
>>     to memcpy/memset
>>     >> calls. However a change like that would only have the effect
>>     of forcing the
>>     >> backend to expand !nontemporal memsets back to sequences of
>>     store instructions.
>>     >> A memset library call would not have exactly the same semantic
>>     of a builtin
>>     >> !nontemporal memset call. So, SelectionDAG will have to
>>     conservatively expand
>>     >> it back to a sequence of !nontemporal stores (effectively
>>     undoing the merging).
>>     >>
>>     >> Differential Revision:http://reviews.llvm.org/D13519
>>     >>
>>     >> Added:
>>     >>  llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
>>     >> Modified:
>>     >>  llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
>>     >>
>>     >> Modified: llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
>>     >>
>>     URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff
>>     >>
>>     ==============================================================================
>>     >> --- llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
>>     (original)
>>     >> +++ llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp Fri
>>     Oct  9 05:53:41 2015
>>     >> @@ -488,6 +488,16 @@ Instruction *MemCpyOpt::tryMergingIntoMe
>>     >>    bool MemCpyOpt::processStore(StoreInst *SI,
>>     BasicBlock::iterator &BBI) {
>>     >>    if (!SI->isSimple()) return false;
>>     >> +
>>     >> +  // Avoid merging nontemporal stores since the resulting
>>     >> +  // memcpy/memset would not be able to preserve the
>>     nontemporal hint.
>>     >> +  // In theory we could teach how to propagate the
>>     !nontemporal metadata to
>>     >> +  // memset calls. However, that change would force the
>>     backend to
>>     >> +  // conservatively expand !nontemporal memset calls back to
>>     sequences of
>>     >> +  // store instructions (effectively undoing the merging).
>>     >> +  if (SI->getMetadata(LLVMContext::MD_nontemporal))
>>     >> +    return false;
>>     >> +
>>     >>    const DataLayout &DL = SI->getModule()->getDataLayout();
>>     >>      // Detect cases where we're performing call slot
>>     forwarding, but
>>     >>
>>     >> Added: llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
>>     >>
>>     URL:http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto
>>     >>
>>     ==============================================================================
>>     >> --- llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll (added)
>>     >> +++ llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll Fri
>>     Oct  9 05:53:41 2015
>>     >> @@ -0,0 +1,49 @@
>>     >> +; RUN: opt < %s -memcpyopt -S | FileCheck %s
>>     >> +
>>     >> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>>     >> +
>>     >> +; Verify that we don't combine nontemporal stores into memset
>>     calls.
>>     >> +
>>     >> +define void @nontemporal_stores_1(<4 x float>* nocapture %dst) {
>>     >> +; CHECK-LABEL: @nontemporal_stores_1
>>     >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>*
>>     %dst, align 16, !nontemporal !0
>>     >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>*
>>     %ptr1, align 16, !nontemporal !0
>>     >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>*
>>     %ptr2, align 16, !nontemporal !0
>>     >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>*
>>     %ptr3, align 16, !nontemporal !0
>>     >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>*
>>     %ptr4, align 16, !nontemporal !0
>>     >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>*
>>     %ptr5, align 16, !nontemporal !0
>>     >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>*
>>     %ptr6, align 16, !nontemporal !0
>>     >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>*
>>     %ptr7, align 16, !nontemporal !0
>>     >> +; CHECK-NEXT: ret void
>>     >> +entry:
>>     >> +  store <4 x float> zeroinitializer, <4 x float>* %dst, align
>>     16, !nontemporal !0
>>     >> +  %ptr1 = getelementptr inbounds <4 x float>, <4 x float>*
>>     %dst, i64 1
>>     >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr1,
>>     align 16, !nontemporal !0
>>     >> +  %ptr2 = getelementptr inbounds <4 x float>, <4 x float>*
>>     %dst, i64 2
>>     >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr2,
>>     align 16, !nontemporal !0
>>     >> +  %ptr3 = getelementptr inbounds <4 x float>, <4 x float>*
>>     %dst, i64 3
>>     >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr3,
>>     align 16, !nontemporal !0
>>     >> +  %ptr4 = getelementptr inbounds <4 x float>, <4 x float>*
>>     %dst, i64 4
>>     >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr4,
>>     align 16, !nontemporal !0
>>     >> +  %ptr5 = getelementptr inbounds <4 x float>, <4 x float>*
>>     %dst, i64 5
>>     >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr5,
>>     align 16, !nontemporal !0
>>     >> +  %ptr6 = getelementptr inbounds <4 x float>, <4 x float>*
>>     %dst, i64 6
>>     >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr6,
>>     align 16, !nontemporal !0
>>     >> +  %ptr7 = getelementptr inbounds <4 x float>, <4 x float>*
>>     %dst, i64 7
>>     >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr7,
>>     align 16, !nontemporal !0
>>     >> +  ret void
>>     >> +}
>>     >> +
>>     >> +define void @nontemporal_stores_2(<4 x float>* nocapture %dst) {
>>     >> +; CHECK-LABEL: @nontemporal_stores_2
>>     >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>*
>>     %dst, align 16, !nontemporal !0
>>     >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>*
>>     %ptr1, align 16, !nontemporal !0
>>     >> +; CHECK-NEXT: ret void
>>     >> +entry:
>>     >> +  store <4 x float> zeroinitializer, <4 x float>* %dst, align
>>     16, !nontemporal !0
>>     >> +  %ptr1 = getelementptr inbounds <4 x float>, <4 x float>*
>>     %dst, i64 1
>>     >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr1,
>>     align 16, !nontemporal !0
>>     >> +  ret void
>>     >> +}
>>     >> +
>>     >> +!0 = !{i32 1}
>>     >>
>>     >>
>>     >> _______________________________________________
>>     >> llvm-commits mailing list
>>     >>llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>>     >>http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>     >
>>     > _______________________________________________
>>     > llvm-commits mailing list
>>     >llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>>     >http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>>     _______________________________________________
>>     llvm-commits mailing list
>>     llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151009/a5390739/attachment.html>


More information about the llvm-commits mailing list