[llvm] r249820 - [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.

Quentin Colombet via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 9 10:09:30 PDT 2015


Hi Philip,

> On Oct 9, 2015, at 9:51 AM, Philip Reames via llvm-commits <llvm-commits at lists.llvm.org> wrote:
> 
> I'm not sure this patch is moving in the right direction.  In particular, I find the fact that adding the non-temporal metadata is restricting optimization to be very problematic.  As far as I can tell, the transform being disabled was entirely legal; it just wasn't profitable in this example. Extending non-temporal to memset calls seems like a much better scenario.

We thought about extending non-temporal to memset but decided against it. Indeed, for memset to be able to honor the non-temporal hints, this is likely it needs to split back to loads and stores operations, since there is no way a memset implementation in some library can be aware of the metadata.

> 
> One of my worries is about how disabling this will effect other optimizations around non-temporal stores.  In particular, if we had a memset following the stores, we'd no longer have the chance to combine them into a single memset.  Replacing two non-temporal stores and a large memset with a larger memset seems highly likely to be profitable.

I am not sure about that. I would expect any memset implementation to touch the cache and thus, to have an impact on the performance of the surrounding code.

> 
> Analogously, replacing 8 adjacent non-temporal byte stores with a single memset (and then i64 store) also seems profitable and possibly disabled by this transform.
> 
> Also, if we keep this change, the comment needs to be extended to highlight this is a profitability decision, not a legality one.

Agree with that part.

Thanks,
-Quentin

> 
> Philip
> 
> On 10/09/2015 03:53 AM, Andrea Di Biagio via llvm-commits wrote:
>> Author: adibiagio
>> Date: Fri Oct  9 05:53:41 2015
>> New Revision: 249820
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=249820&view=rev
>> Log:
>> [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.
>> 
>> Pass MemCpyOpt doesn't check if a store instruction is nontemporal.
>> As a consequence, adjacent nontemporal stores are always merged into a
>> memset call.
>> 
>> Example:
>> 
>> ;;;
>> define void @foo(<4 x float>* nocapture %p) {
>> entry:
>>   store <4 x float> zeroinitializer, <4 x float>* %p, align 16, !nontemporal !0
>>   %p1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
>>   store <4 x float> zeroinitializer, <4 x float>* %p1, align 16, !nontemporal !0
>>   ret void
>> }
>> 
>> !0 = !{i32 1}
>> ;;;
>> 
>> In this example, the two nontemporal stores are combined to a memset of zero
>> which does not preserve the nontemporal hint. Later on the backend (tested on a
>> x86-64 corei7) expands that memset call into a sequence of two normal 16-byte
>> aligned vector stores.
>> 
>> opt -memcpyopt example.ll -S -o - | llc -mcpu=corei7 -o -
>> 
>> Before:
>>   xorps  %xmm0, %xmm0
>>   movaps  %xmm0, 16(%rdi)
>>   movaps  %xmm0, (%rdi)
>> 
>> With this patch, we no longer merge nontemporal stores into calls to memset.
>> In this example, llc correctly expands the two stores into two movntps:
>>   xorps  %xmm0, %xmm0
>>   movntps %xmm0, 16(%rdi)
>>   movntps  %xmm0, (%rdi)
>> 
>> In theory, we could extend the usage of !nontemporal metadata to memcpy/memset
>> calls. However a change like that would only have the effect of forcing the
>> backend to expand !nontemporal memsets back to sequences of store instructions.
>> A memset library call would not have exactly the same semantic of a builtin
>> !nontemporal memset call. So, SelectionDAG will have to conservatively expand
>> it back to a sequence of !nontemporal stores (effectively undoing the merging).
>> 
>> Differential Revision: http://reviews.llvm.org/D13519
>> 
>> Added:
>>     llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
>> Modified:
>>     llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
>> 
>> Modified: llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp (original)
>> +++ llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp Fri Oct  9 05:53:41 2015
>> @@ -488,6 +488,16 @@ Instruction *MemCpyOpt::tryMergingIntoMe
>>    bool MemCpyOpt::processStore(StoreInst *SI, BasicBlock::iterator &BBI) {
>>    if (!SI->isSimple()) return false;
>> +
>> +  // Avoid merging nontemporal stores since the resulting
>> +  // memcpy/memset would not be able to preserve the nontemporal hint.
>> +  // In theory we could teach how to propagate the !nontemporal metadata to
>> +  // memset calls. However, that change would force the backend to
>> +  // conservatively expand !nontemporal memset calls back to sequences of
>> +  // store instructions (effectively undoing the merging).
>> +  if (SI->getMetadata(LLVMContext::MD_nontemporal))
>> +    return false;
>> +
>>    const DataLayout &DL = SI->getModule()->getDataLayout();
>>      // Detect cases where we're performing call slot forwarding, but
>> 
>> Added: llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto
>> ==============================================================================
>> --- llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll (added)
>> +++ llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll Fri Oct  9 05:53:41 2015
>> @@ -0,0 +1,49 @@
>> +; RUN: opt < %s -memcpyopt -S | FileCheck %s
>> +
>> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>> +
>> +; Verify that we don't combine nontemporal stores into memset calls.
>> +
>> +define void @nontemporal_stores_1(<4 x float>* nocapture %dst) {
>> +; CHECK-LABEL: @nontemporal_stores_1
>> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
>> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
>> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr2, align 16, !nontemporal !0
>> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr3, align 16, !nontemporal !0
>> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr4, align 16, !nontemporal !0
>> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr5, align 16, !nontemporal !0
>> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr6, align 16, !nontemporal !0
>> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr7, align 16, !nontemporal !0
>> +; CHECK-NEXT: ret void
>> +entry:
>> +  store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
>> +  %ptr1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
>> +  store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
>> +  %ptr2 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 2
>> +  store <4 x float> zeroinitializer, <4 x float>* %ptr2, align 16, !nontemporal !0
>> +  %ptr3 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 3
>> +  store <4 x float> zeroinitializer, <4 x float>* %ptr3, align 16, !nontemporal !0
>> +  %ptr4 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 4
>> +  store <4 x float> zeroinitializer, <4 x float>* %ptr4, align 16, !nontemporal !0
>> +  %ptr5 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 5
>> +  store <4 x float> zeroinitializer, <4 x float>* %ptr5, align 16, !nontemporal !0
>> +  %ptr6 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 6
>> +  store <4 x float> zeroinitializer, <4 x float>* %ptr6, align 16, !nontemporal !0
>> +  %ptr7 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 7
>> +  store <4 x float> zeroinitializer, <4 x float>* %ptr7, align 16, !nontemporal !0
>> +  ret void
>> +}
>> +
>> +define void @nontemporal_stores_2(<4 x float>* nocapture %dst) {
>> +; CHECK-LABEL: @nontemporal_stores_2
>> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
>> +; CHECK: store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
>> +; CHECK-NEXT: ret void
>> +entry:
>> +  store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
>> +  %ptr1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
>> +  store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
>> +  ret void
>> +}
>> +
>> +!0 = !{i32 1}
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits



More information about the llvm-commits mailing list