[llvm] r249820 - [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.

Quentin Colombet via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 9 14:40:23 PDT 2015


> On Oct 9, 2015, at 2:24 PM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> 
> Hi Philip,
> 
> I have attached the prototype memset patch that I mentioned before.
> 
> The patch is not complicated and would allows to merge nontemporal stores together.
> There are two important implications:
> 1) It will _not_ try to mix temporal stores with nontemporal stores. If people really really want the other behavior then we can think of adding a flag as a future extension.
> 2) Large memsets would still generate library calls. On X86, we would follow the usual rules (i.e.: memset is unrolled up to 16 stores; otherwise we generate a libcall).
> 
> Point 2. is sub-optimal. However, I don't think it would occur often in practice to have to expand very big nontemporal memsets. Also, for very large memsets, the library implementation should be (hopefully!) smart enough to use a specialized loop of nontemporal stores.

How do you envision something like that to be possible?
AFAICT you cannot pass the non temporal information to the library.

> 
> As a side note: the patch would require a change in the nontemporal.ll test.
> I think it will also require extra x86 tests to make sure that the nontemporal memset is expanded to a sequence of movnt instructions.
> 
> In conclusion:
> do you (Philip and Quentin) think this approach would be a good compromise?

That would work for me, though I was already (and am still :)) happy!

Cheers,
-Quentin

> For what is worth, this approach would still address most of our customer concerns (i.e. it is probably as good as the previous patch).
> If you prefer this approach more then I am okay with sending it for review (especially if this approach makes everybody happy :-) ).
> 
> Cheers,
> Andrea
> 
> 
> On Fri, Oct 9, 2015 at 8:51 PM, Philip Reames <listmail at philipreames.com <mailto:listmail at philipreames.com>> wrote:
> I don't have a strong enough opinion here to push the issue, but I think I still disagree with the chosen approach.  However, I will defer to you two since you seem to have discussed the tradeoffs and are better positioned to make the decision.
> 
> p.s. If the patch to support non-temporal memsets was simple, it might be interesting in it's own right.  Even if we don't canonicalize to it, having the ability to chose it in the frontend could be interesting.  Note the "could"; I have no concrete use for this in the near future.
> 
> Philip
> 
> 
> 
> 
> On 10/09/2015 11:15 AM, Quentin Colombet wrote:
>> Hi Andrea,
>> 
>>> On Oct 9, 2015, at 10:58 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com <mailto:andrea.dibiagio at gmail.com>> wrote:
>>> 
>>> 
>>> 
>>> On Fri, Oct 9, 2015 at 6:09 PM, Quentin Colombet via llvm-commits <llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>> wrote:
>>> Hi Philip,
>>> 
>>> >
>>> > One of my worries is about how disabling this will effect other optimizations around non-temporal stores.  In particular, if we had a memset following the stores, we'd no longer have the chance to combine them into a single memset.  Replacing two non-temporal stores and a large memset with a larger memset seems highly likely to be profitable.
>>> 
>>> I am not sure about that. I would expect any memset implementation to touch the cache and thus, to have an impact on the performance of the surrounding code.
>>> 
>>> I agree with Quentin here. I don't think it would be better since you would end up polluting the caches.
>>> 
>>> 
>>> >
>>> > Analogously, replacing 8 adjacent non-temporal byte stores with a single memset (and then i64 store) also seems profitable and possibly disabled by this transform.
>>> >
>>> > Also, if we keep this change, the comment needs to be extended to highlight this is a profitability decision, not a legality one.
>>> 
>>> Agree with that part.
>>> 
>>> Adjacent stores like the ones you described should be easily handled by 'MergeConsecutiveStores' in the DAGCombiner. So, I don't think this transformation is pessimizing the codegen for that case. MergeConsecutiveStores should be able to merge those small stores and still preserve the nontemporal hint.
>>> 
>>> Overall I am not completely against the idea of propagating the !nontemporal metadata to memset/memcpy (I actually have a patch that implements that for memset - if people really prefer that approach I can upload it for review). However, at the moment I don't see the advantage in doing it (I'd like to see an example where that transformation would be profitable).
>> 
>> The thing I was agreeing with was that the comment should say that we block the transformation not because we couldn’t fix memset, but because it is not worth it given we would need to split the memset later. That was how I read the profitability part, though I guess Philip was more into what you understood :).
>> 
>> Cheers,
>> Q.
>> 
>>> 
>>> -Andrea
>>> 
>>> Thanks,
>>> -Quentin
>>> 
>>> >
>>> > Philip
>>> >
>>> > On 10/09/2015 03:53 AM, Andrea Di Biagio via llvm-commits wrote:
>>> >> Author: adibiagio
>>> >> Date: Fri Oct  9 05:53:41 2015
>>> >> New Revision: 249820
>>> >>
>>> >> URL:  <http://llvm.org/viewvc/llvm-project?rev=249820&view=rev>http://llvm.org/viewvc/llvm-project?rev=249820&view=rev <http://llvm.org/viewvc/llvm-project?rev=249820&view=rev>
>>> >> Log:
>>> >> [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.
>>> >>
>>> >> Pass MemCpyOpt doesn't check if a store instruction is nontemporal.
>>> >> As a consequence, adjacent nontemporal stores are always merged into a
>>> >> memset call.
>>> >>
>>> >> Example:
>>> >>
>>> >> ;;;
>>> >> define void @foo(<4 x float>* nocapture %p) {
>>> >> entry:
>>> >>   store <4 x float> zeroinitializer, <4 x float>* %p, align 16, !nontemporal !0
>>> >>   %p1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
>>> >>   store <4 x float> zeroinitializer, <4 x float>* %p1, align 16, !nontemporal !0
>>> >>   ret void
>>> >> }
>>> >>
>>> >> !0 = !{i32 1}
>>> >> ;;;
>>> >>
>>> >> In this example, the two nontemporal stores are combined to a memset of zero
>>> >> which does not preserve the nontemporal hint. Later on the backend (tested on a
>>> >> x86-64 corei7) expands that memset call into a sequence of two normal 16-byte
>>> >> aligned vector stores.
>>> >>
>>> >> opt -memcpyopt example.ll -S -o - | llc -mcpu=corei7 -o -
>>> >>
>>> >> Before:
>>> >>   xorps  %xmm0, %xmm0
>>> >>   movaps  %xmm0, 16(%rdi)
>>> >>   movaps  %xmm0, (%rdi)
>>> >>
>>> >> With this patch, we no longer merge nontemporal stores into calls to memset.
>>> >> In this example, llc correctly expands the two stores into two movntps:
>>> >>   xorps  %xmm0, %xmm0
>>> >>   movntps %xmm0, 16(%rdi)
>>> >>   movntps  %xmm0, (%rdi)
>>> >>
>>> >> In theory, we could extend the usage of !nontemporal metadata to memcpy/memset
>>> >> calls. However a change like that would only have the effect of forcing the
>>> >> backend to expand !nontemporal memsets back to sequences of store instructions.
>>> >> A memset library call would not have exactly the same semantic of a builtin
>>> >> !nontemporal memset call. So, SelectionDAG will have to conservatively expand
>>> >> it back to a sequence of !nontemporal stores (effectively undoing the merging).
>>> >>
>>> >> Differential Revision:  <http://reviews.llvm.org/D13519>http://reviews.llvm.org/D13519 <http://reviews.llvm.org/D13519>
>>> >>
>>> >> Added:
>>> >>     llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
>>> >> Modified:
>>> >>     llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
>>> >>
>>> >> Modified: llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
>>> >> URL:  <http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff>http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff <http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff>
>>> >> ==============================================================================
>>> >> --- llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp (original)
>>> >> +++ llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp Fri Oct  9 05:53:41 2015
>>> >> @@ -488,6 +488,16 @@ Instruction *MemCpyOpt::tryMergingIntoMe
>>> >>    bool MemCpyOpt::processStore(StoreInst *SI, BasicBlock::iterator &BBI) {
>>> >>    if (!SI->isSimple()) return false;
>>> >> +
>>> >> +  // Avoid merging nontemporal stores since the resulting
>>> >> +  // memcpy/memset would not be able to preserve the nontemporal hint.
>>> >> +  // In theory we could teach how to propagate the !nontemporal metadata to
>>> >> +  // memset calls. However, that change would force the backend to
>>> >> +  // conservatively expand !nontemporal memset calls back to sequences of
>>> >> +  // store instructions (effectively undoing the merging).
>>> >> +  if (SI->getMetadata(LLVMContext::MD_nontemporal))
>>> >> +    return false;
>>> >> +
>>> >>    const DataLayout &DL = SI->getModule()->getDataLayout();
>>> >>      // Detect cases where we're performing call slot forwarding, but
>>> >>
>>> >> Added: llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
>>> >> URL:  <http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto>http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto <http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto>
>>> >> ==============================================================================
>>> >> --- llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll (added)
>>> >> +++ llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll Fri Oct  9 05:53:41 2015
>>> >> @@ -0,0 +1,49 @@
>>> >> +; RUN: opt < %s -memcpyopt -S | FileCheck %s
>>> >> +
>>> >> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>>> >> +
>>> >> +; Verify that we don't combine nontemporal stores into memset calls.
>>> >> +
>>> >> +define void @nontemporal_stores_1(<4 x float>* nocapture %dst) {
>>> >> +; CHECK-LABEL: @nontemporal_stores_1
>>> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
>>> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
>>> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr2, align 16, !nontemporal !0
>>> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr3, align 16, !nontemporal !0
>>> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr4, align 16, !nontemporal !0
>>> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr5, align 16, !nontemporal !0
>>> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr6, align 16, !nontemporal !0
>>> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr7, align 16, !nontemporal !0
>>> >> +; CHECK-NEXT: ret void
>>> >> +entry:
>>> >> +  store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
>>> >> +  %ptr1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
>>> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
>>> >> +  %ptr2 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 2
>>> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr2, align 16, !nontemporal !0
>>> >> +  %ptr3 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 3
>>> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr3, align 16, !nontemporal !0
>>> >> +  %ptr4 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 4
>>> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr4, align 16, !nontemporal !0
>>> >> +  %ptr5 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 5
>>> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr5, align 16, !nontemporal !0
>>> >> +  %ptr6 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 6
>>> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr6, align 16, !nontemporal !0
>>> >> +  %ptr7 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 7
>>> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr7, align 16, !nontemporal !0
>>> >> +  ret void
>>> >> +}
>>> >> +
>>> >> +define void @nontemporal_stores_2(<4 x float>* nocapture %dst) {
>>> >> +; CHECK-LABEL: @nontemporal_stores_2
>>> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
>>> >> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
>>> >> +; CHECK-NEXT: ret void
>>> >> +entry:
>>> >> +  store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
>>> >> +  %ptr1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
>>> >> +  store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
>>> >> +  ret void
>>> >> +}
>>> >> +
>>> >> +!0 = !{i32 1}
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> llvm-commits mailing list
>>> >>  <mailto:llvm-commits at lists.llvm.org>llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>>> >>  <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
>>> >
>>> > _______________________________________________
>>> > llvm-commits mailing list
>>> >  <mailto:llvm-commits at lists.llvm.org>llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>>> >  <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
>>> 
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
> 
> 
> <patch-alternative-solution-nontemporal.diff>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151009/1574958f/attachment.html>


More information about the llvm-commits mailing list