<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
I don't have a strong enough opinion here to push the issue, but I
think I still disagree with the chosen approach. However, I will
defer to you two since you seem to have discussed the tradeoffs and
are better positioned to make the decision.<br>
<br>
p.s. If the patch to support non-temporal memsets was simple, it
might be interesting in it's own right. Even if we don't
canonicalize to it, having the ability to chose it in the frontend
could be interesting. Note the "could"; I have no concrete use for
this in the near future.<br>
<br>
Philip<br>
<br>
<br>
<br>
<div class="moz-cite-prefix">On 10/09/2015 11:15 AM, Quentin
Colombet wrote:<br>
</div>
<blockquote
cite="mid:24AE2798-B369-4AB6-A34F-583BB97C59BD@apple.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Hi Andrea,
<div class=""><br class="">
<div>
<blockquote type="cite" class="">
<div class="">On Oct 9, 2015, at 10:58 AM, Andrea Di Biagio
<<a moz-do-not-send="true"
href="mailto:andrea.dibiagio@gmail.com" class="">andrea.dibiagio@gmail.com</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class=""><br class="Apple-interchange-newline">
<br style="font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant: normal; font-weight:
normal; letter-spacing: normal; orphans: auto;
text-align: start; text-indent: 0px; text-transform:
none; white-space: normal; widows: auto; word-spacing:
0px; -webkit-text-stroke-width: 0px;" class="">
<div class="gmail_quote" style="font-family: Helvetica;
font-size: 12px; font-style: normal; font-variant:
normal; font-weight: normal; letter-spacing: normal;
orphans: auto; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows: auto;
word-spacing: 0px; -webkit-text-stroke-width: 0px;">On
Fri, Oct 9, 2015 at 6:09 PM, Quentin Colombet via
llvm-commits<span class="Apple-converted-space"> </span><span
dir="ltr" class=""><<a moz-do-not-send="true"
href="mailto:llvm-commits@lists.llvm.org"
target="_blank" class="">llvm-commits@lists.llvm.org</a>></span><span
class="Apple-converted-space"> </span>wrote:<br
class="">
<blockquote class="gmail_quote" style="margin: 0px 0px
0px 0.8ex; border-left-width: 1px; border-left-style:
solid; border-left-color: rgb(204, 204, 204);
padding-left: 1ex;">Hi Philip,<br class="">
<span class=""></span><span class=""><br class="">
><br class="">
> One of my worries is about how disabling this
will effect other optimizations around non-temporal
stores. In particular, if we had a memset following
the stores, we'd no longer have the chance to
combine them into a single memset. Replacing two
non-temporal stores and a large memset with a larger
memset seems highly likely to be profitable.<br
class="">
<br class="">
</span>I am not sure about that. I would expect any
memset implementation to touch the cache and thus, to
have an impact on the performance of the surrounding
code.<br class="">
</blockquote>
<div class=""><br class="">
</div>
<div class="">I agree with Quentin here. I don't think
it would be better since you would end up polluting
the caches.<br class="">
<br class="">
</div>
<blockquote class="gmail_quote" style="margin: 0px 0px
0px 0.8ex; border-left-width: 1px; border-left-style:
solid; border-left-color: rgb(204, 204, 204);
padding-left: 1ex;"><span class=""><br class="">
><br class="">
> Analogously, replacing 8 adjacent non-temporal
byte stores with a single memset (and then i64
store) also seems profitable and possibly disabled
by this transform.<br class="">
><br class="">
> Also, if we keep this change, the comment needs
to be extended to highlight this is a profitability
decision, not a legality one.<br class="">
<br class="">
</span>Agree with that part.<br class="">
</blockquote>
<div class=""><br class="">
</div>
<div class="">Adjacent stores like the ones you
described should be easily handled by
'MergeConsecutiveStores' in the DAGCombiner. So, I
don't think this transformation is pessimizing the
codegen for that case. MergeConsecutiveStores should
be able to merge those small stores and still preserve
the nontemporal hint.<br class="">
</div>
<div class=""><br class="">
</div>
<div class="">Overall I am not completely against the
idea of propagating the !nontemporal metadata to
memset/memcpy (I actually have a patch that implements
that for memset - if people really prefer that
approach I can upload it for review). However, at the
moment I don't see the advantage in doing it (I'd like
to see an example where that transformation would be
profitable).<br class="">
</div>
</div>
</div>
</blockquote>
<div><br class="">
</div>
<div>The thing I was agreeing with was that the comment should
say that we block the transformation not because we couldn’t
fix memset, but because it is not worth it given we would
need to split the memset later. That was how I read the
profitability part, though I guess Philip was more into what
you understood :).</div>
<div><br class="">
</div>
<div>Cheers,</div>
<div>Q.</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div class="gmail_quote" style="font-family: Helvetica;
font-size: 12px; font-style: normal; font-variant:
normal; font-weight: normal; letter-spacing: normal;
orphans: auto; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows: auto;
word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<div class=""><br class="">
-Andrea<br class="">
</div>
<div class=""><br class="">
Thanks,<br class="">
</div>
<blockquote class="gmail_quote" style="margin: 0px 0px
0px 0.8ex; border-left-width: 1px; border-left-style:
solid; border-left-color: rgb(204, 204, 204);
padding-left: 1ex;">-Quentin<br class="">
<div class="">
<div class="h5"><br class="">
><br class="">
> Philip<br class="">
><br class="">
> On 10/09/2015 03:53 AM, Andrea Di Biagio via
llvm-commits wrote:<br class="">
>> Author: adibiagio<br class="">
>> Date: Fri Oct 9 05:53:41 2015<br
class="">
>> New Revision: 249820<br class="">
>><br class="">
>> URL:<span class="Apple-converted-space"> </span><a
moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project?rev=249820&view=rev"
rel="noreferrer" target="_blank" class=""><a class="moz-txt-link-freetext" href="http://llvm.org/viewvc/llvm-project?rev=249820&view=rev">http://llvm.org/viewvc/llvm-project?rev=249820&view=rev</a></a><br
class="">
>> Log:<br class="">
>> [MemCpyOpt] Fix wrong merging adjacent
nontemporal stores into memset calls.<br class="">
>><br class="">
>> Pass MemCpyOpt doesn't check if a store
instruction is nontemporal.<br class="">
>> As a consequence, adjacent nontemporal
stores are always merged into a<br class="">
>> memset call.<br class="">
>><br class="">
>> Example:<br class="">
>><br class="">
>> ;;;<br class="">
>> define void @foo(<4 x float>*
nocapture %p) {<br class="">
>> entry:<br class="">
>> store <4 x float>
zeroinitializer, <4 x float>* %p, align 16,
!nontemporal !0<br class="">
>> %p1 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 1<br
class="">
>> store <4 x float>
zeroinitializer, <4 x float>* %p1, align 16,
!nontemporal !0<br class="">
>> ret void<br class="">
>> }<br class="">
>><br class="">
>> !0 = !{i32 1}<br class="">
>> ;;;<br class="">
>><br class="">
>> In this example, the two nontemporal
stores are combined to a memset of zero<br
class="">
>> which does not preserve the nontemporal
hint. Later on the backend (tested on a<br
class="">
>> x86-64 corei7) expands that memset call
into a sequence of two normal 16-byte<br class="">
>> aligned vector stores.<br class="">
>><br class="">
>> opt -memcpyopt example.ll -S -o - | llc
-mcpu=corei7 -o -<br class="">
>><br class="">
>> Before:<br class="">
>> xorps %xmm0, %xmm0<br class="">
>> movaps %xmm0, 16(%rdi)<br class="">
>> movaps %xmm0, (%rdi)<br class="">
>><br class="">
>> With this patch, we no longer merge
nontemporal stores into calls to memset.<br
class="">
>> In this example, llc correctly expands
the two stores into two movntps:<br class="">
>> xorps %xmm0, %xmm0<br class="">
>> movntps %xmm0, 16(%rdi)<br class="">
>> movntps %xmm0, (%rdi)<br class="">
>><br class="">
>> In theory, we could extend the usage of
!nontemporal metadata to memcpy/memset<br class="">
>> calls. However a change like that would
only have the effect of forcing the<br class="">
>> backend to expand !nontemporal memsets
back to sequences of store instructions.<br
class="">
>> A memset library call would not have
exactly the same semantic of a builtin<br class="">
>> !nontemporal memset call. So,
SelectionDAG will have to conservatively expand<br
class="">
>> it back to a sequence of !nontemporal
stores (effectively undoing the merging).<br
class="">
>><br class="">
>> Differential Revision:<span
class="Apple-converted-space"> </span><a
moz-do-not-send="true"
href="http://reviews.llvm.org/D13519"
rel="noreferrer" target="_blank" class=""><a class="moz-txt-link-freetext" href="http://reviews.llvm.org/D13519">http://reviews.llvm.org/D13519</a></a><br
class="">
>><br class="">
>> Added:<br class="">
>>
llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll<br
class="">
>> Modified:<br class="">
>>
llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp<br
class="">
>><br class="">
>> Modified:
llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp<br
class="">
>> URL:<span class="Apple-converted-space"> </span><a
moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff"
rel="noreferrer" target="_blank" class=""><a class="moz-txt-link-freetext" href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff</a></a><br
class="">
>>
==============================================================================<br
class="">
>> ---
llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
(original)<br class="">
>> +++
llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
Fri Oct 9 05:53:41 2015<br class="">
>> @@ -488,6 +488,16 @@ Instruction
*MemCpyOpt::tryMergingIntoMe<br class="">
>> bool MemCpyOpt::processStore(StoreInst
*SI, BasicBlock::iterator &BBI) {<br class="">
>> if (!SI->isSimple()) return false;<br
class="">
>> +<br class="">
>> + // Avoid merging nontemporal stores
since the resulting<br class="">
>> + // memcpy/memset would not be able to
preserve the nontemporal hint.<br class="">
>> + // In theory we could teach how to
propagate the !nontemporal metadata to<br class="">
>> + // memset calls. However, that change
would force the backend to<br class="">
>> + // conservatively expand !nontemporal
memset calls back to sequences of<br class="">
>> + // store instructions (effectively
undoing the merging).<br class="">
>> + if
(SI->getMetadata(LLVMContext::MD_nontemporal))<br
class="">
>> + return false;<br class="">
>> +<br class="">
>> const DataLayout &DL =
SI->getModule()->getDataLayout();<br
class="">
>> // Detect cases where we're
performing call slot forwarding, but<br class="">
>><br class="">
>> Added:
llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll<br
class="">
>> URL:<span class="Apple-converted-space"> </span><a
moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto"
rel="noreferrer" target="_blank" class=""><a class="moz-txt-link-freetext" href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto</a></a><br
class="">
>>
==============================================================================<br
class="">
>> ---
llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
(added)<br class="">
>> +++
llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
Fri Oct 9 05:53:41 2015<br class="">
>> @@ -0,0 +1,49 @@<br class="">
>> +; RUN: opt < %s -memcpyopt -S |
FileCheck %s<br class="">
>> +<br class="">
>> +target datalayout =
"e-m:e-i64:64-f80:128-n8:16:32:64-S128"<br
class="">
>> +<br class="">
>> +; Verify that we don't combine
nontemporal stores into memset calls.<br class="">
>> +<br class="">
>> +define void @nontemporal_stores_1(<4
x float>* nocapture %dst) {<br class="">
>> +; CHECK-LABEL: @nontemporal_stores_1<br
class="">
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %dst, align
16, !nontemporal !0<br class="">
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr1, align
16, !nontemporal !0<br class="">
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr2, align
16, !nontemporal !0<br class="">
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr3, align
16, !nontemporal !0<br class="">
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr4, align
16, !nontemporal !0<br class="">
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr5, align
16, !nontemporal !0<br class="">
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr6, align
16, !nontemporal !0<br class="">
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr7, align
16, !nontemporal !0<br class="">
>> +; CHECK-NEXT: ret void<br class="">
>> +entry:<br class="">
>> + store <4 x float>
zeroinitializer, <4 x float>* %dst, align
16, !nontemporal !0<br class="">
>> + %ptr1 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 1<br
class="">
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr1, align
16, !nontemporal !0<br class="">
>> + %ptr2 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 2<br
class="">
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr2, align
16, !nontemporal !0<br class="">
>> + %ptr3 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 3<br
class="">
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr3, align
16, !nontemporal !0<br class="">
>> + %ptr4 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 4<br
class="">
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr4, align
16, !nontemporal !0<br class="">
>> + %ptr5 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 5<br
class="">
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr5, align
16, !nontemporal !0<br class="">
>> + %ptr6 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 6<br
class="">
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr6, align
16, !nontemporal !0<br class="">
>> + %ptr7 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 7<br
class="">
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr7, align
16, !nontemporal !0<br class="">
>> + ret void<br class="">
>> +}<br class="">
>> +<br class="">
>> +define void @nontemporal_stores_2(<4
x float>* nocapture %dst) {<br class="">
>> +; CHECK-LABEL: @nontemporal_stores_2<br
class="">
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %dst, align
16, !nontemporal !0<br class="">
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr1, align
16, !nontemporal !0<br class="">
>> +; CHECK-NEXT: ret void<br class="">
>> +entry:<br class="">
>> + store <4 x float>
zeroinitializer, <4 x float>* %dst, align
16, !nontemporal !0<br class="">
>> + %ptr1 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 1<br
class="">
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr1, align
16, !nontemporal !0<br class="">
>> + ret void<br class="">
>> +}<br class="">
>> +<br class="">
>> +!0 = !{i32 1}<br class="">
>><br class="">
>><br class="">
>>
_______________________________________________<br
class="">
>> llvm-commits mailing list<br class="">
>><span class="Apple-converted-space"> </span><a
moz-do-not-send="true"
href="mailto:llvm-commits@lists.llvm.org"
class=""><a class="moz-txt-link-abbreviated" href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a></a><br
class="">
>><span class="Apple-converted-space"> </span><a
moz-do-not-send="true"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits"
rel="noreferrer" target="_blank" class=""><a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a></a><br
class="">
><br class="">
>
_______________________________________________<br
class="">
> llvm-commits mailing list<br class="">
><span class="Apple-converted-space"> </span><a
moz-do-not-send="true"
href="mailto:llvm-commits@lists.llvm.org"
class=""><a class="moz-txt-link-abbreviated" href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a></a><br
class="">
><span class="Apple-converted-space"> </span><a
moz-do-not-send="true"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits"
rel="noreferrer" target="_blank" class=""><a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a></a><br
class="">
<br class="">
_______________________________________________<br
class="">
llvm-commits mailing list<br class="">
<a moz-do-not-send="true"
href="mailto:llvm-commits@lists.llvm.org"
class="">llvm-commits@lists.llvm.org</a><br
class="">
<a moz-do-not-send="true"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits"
rel="noreferrer" target="_blank" class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a></div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</blockquote>
<br>
</body>
</html>