<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
I definitely like this approach better, but again, I'll leave the
actual choice to you two. I think you're better positioned to
understand all the implications and use cases. <br>
<br>
If you do chose to go with this approach, separating it into two
patches would be helpful. Patch one could add the backend support
for a memset with the !nontemporal metadata. Patch two would create
them. I can't really help with patch 1, but I can help review patch
2 if desired. (Figure I should volunteer some help if I'm asking
you to do extra work.) <br>
<br>
Philip<br>
<br>
<div class="moz-cite-prefix">On 10/09/2015 02:24 PM, Andrea Di
Biagio wrote:<br>
</div>
<blockquote
cite="mid:CAMw=4mKDjT_E46M033XDk6FFaqRQqpqMX=B_1DxR0Ce7_c_feQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>Hi Philip,<br>
<br>
</div>
I have attached the prototype memset patch that I mentioned
before.<br>
</div>
<div><br>
The patch is not complicated and would allows to merge
nontemporal stores together.<br>
There are two important implications:<br>
1) It will _not_ try to mix temporal stores with nontemporal
stores. If people really really want the other behavior then
we can think of adding a flag as a future extension.<br>
</div>
<div>2) Large memsets would still generate library calls. On
X86, we would follow the usual rules (i.e.: memset is unrolled
up to 16 stores; otherwise we generate a libcall). </div>
<div><br>
Point 2. is sub-optimal. However, I don't think it would occur
often in practice to have to expand very big nontemporal
memsets. Also, for very large memsets, the library
implementation should be (hopefully!) smart enough to use a
specialized loop of nontemporal stores.<br>
<br>
</div>
<div>As a side note: the patch would require a change in the
nontemporal.ll test.<br>
</div>
<div>I think it will also require extra x86 tests to make sure
that the nontemporal memset is expanded to a sequence of movnt
instructions.<br>
</div>
<div><br>
</div>
<div>In conclusion:<br>
</div>
<div>do you (Philip and Quentin) think this approach would be a
good compromise?<br>
</div>
<div>For what is worth, this approach would still address most
of our customer concerns (i.e. it is probably as good as the
previous patch).<br>
If you prefer this approach more then I am okay with sending
it for review (especially if this approach makes everybody
happy :-) ).<br>
<br>
</div>
<div>Cheers,<br>
</div>
Andrea
<div class="">
<div id=":20h" class="" tabindex="0"><img
moz-do-not-send="true" class=""
src="https://ssl.gstatic.com/ui/v1/icons/mail/images/cleardot.gif"></div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Fri, Oct 9, 2015 at 8:51 PM, Philip
Reames <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> I don't have a strong
enough opinion here to push the issue, but I think I still
disagree with the chosen approach. However, I will defer
to you two since you seem to have discussed the tradeoffs
and are better positioned to make the decision.<br>
<br>
p.s. If the patch to support non-temporal memsets was
simple, it might be interesting in it's own right. Even
if we don't canonicalize to it, having the ability to
chose it in the frontend could be interesting. Note the
"could"; I have no concrete use for this in the near
future.<br>
<br>
Philip
<div>
<div class="h5"><br>
<br>
<br>
<br>
<div>On 10/09/2015 11:15 AM, Quentin Colombet wrote:<br>
</div>
<blockquote type="cite"> Hi Andrea,
<div><br>
<div>
<blockquote type="cite">
<div>On Oct 9, 2015, at 10:58 AM, Andrea Di
Biagio <<a moz-do-not-send="true"
href="mailto:andrea.dibiagio@gmail.com"
target="_blank">andrea.dibiagio@gmail.com</a>>
wrote:</div>
<br>
<div><br>
<br
style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<div class="gmail_quote"
style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">On
Fri, Oct 9, 2015 at 6:09 PM, Quentin
Colombet via llvm-commits<span> </span><span
dir="ltr"><<a moz-do-not-send="true"
href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span><span> </span>wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Hi
Philip,<br>
<span></span><span><br>
><br>
> One of my worries is about how
disabling this will effect other
optimizations around non-temporal
stores. In particular, if we had a
memset following the stores, we'd no
longer have the chance to combine them
into a single memset. Replacing two
non-temporal stores and a large memset
with a larger memset seems highly
likely to be profitable.<br>
<br>
</span>I am not sure about that. I would
expect any memset implementation to
touch the cache and thus, to have an
impact on the performance of the
surrounding code.<br>
</blockquote>
<div><br>
</div>
<div>I agree with Quentin here. I don't
think it would be better since you would
end up polluting the caches.<br>
<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span><br>
><br>
> Analogously, replacing 8 adjacent
non-temporal byte stores with a single
memset (and then i64 store) also seems
profitable and possibly disabled by
this transform.<br>
><br>
> Also, if we keep this change, the
comment needs to be extended to
highlight this is a profitability
decision, not a legality one.<br>
<br>
</span>Agree with that part.<br>
</blockquote>
<div><br>
</div>
<div>Adjacent stores like the ones you
described should be easily handled by
'MergeConsecutiveStores' in the
DAGCombiner. So, I don't think this
transformation is pessimizing the
codegen for that case.
MergeConsecutiveStores should be able to
merge those small stores and still
preserve the nontemporal hint.<br>
</div>
<div><br>
</div>
<div>Overall I am not completely against
the idea of propagating the !nontemporal
metadata to memset/memcpy (I actually
have a patch that implements that for
memset - if people really prefer that
approach I can upload it for review).
However, at the moment I don't see the
advantage in doing it (I'd like to see
an example where that transformation
would be profitable).<br>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>The thing I was agreeing with was that the
comment should say that we block the
transformation not because we couldn’t fix
memset, but because it is not worth it given
we would need to split the memset later. That
was how I read the profitability part, though
I guess Philip was more into what you
understood :).</div>
<div><br>
</div>
<div>Cheers,</div>
<div>Q.</div>
<br>
<blockquote type="cite">
<div>
<div class="gmail_quote"
style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<div><br>
-Andrea<br>
</div>
<div><br>
Thanks,<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">-Quentin<br>
<div>
<div><br>
><br>
> Philip<br>
><br>
> On 10/09/2015 03:53 AM, Andrea
Di Biagio via llvm-commits wrote:<br>
>> Author: adibiagio<br>
>> Date: Fri Oct 9 05:53:41
2015<br>
>> New Revision: 249820<br>
>><br>
>> URL:<span> </span><a
moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project?rev=249820&view=rev"
target="_blank"><a class="moz-txt-link-freetext" href="http://llvm.org/viewvc/llvm-project?rev=249820&view=rev">http://llvm.org/viewvc/llvm-project?rev=249820&view=rev</a></a><br>
>> Log:<br>
>> [MemCpyOpt] Fix wrong
merging adjacent nontemporal stores
into memset calls.<br>
>><br>
>> Pass MemCpyOpt doesn't
check if a store instruction is
nontemporal.<br>
>> As a consequence, adjacent
nontemporal stores are always merged
into a<br>
>> memset call.<br>
>><br>
>> Example:<br>
>><br>
>> ;;;<br>
>> define void @foo(<4 x
float>* nocapture %p) {<br>
>> entry:<br>
>> store <4 x float>
zeroinitializer, <4 x float>*
%p, align 16, !nontemporal !0<br>
>> %p1 = getelementptr
inbounds <4 x float>, <4 x
float>* %dst, i64 1<br>
>> store <4 x float>
zeroinitializer, <4 x float>*
%p1, align 16, !nontemporal !0<br>
>> ret void<br>
>> }<br>
>><br>
>> !0 = !{i32 1}<br>
>> ;;;<br>
>><br>
>> In this example, the two
nontemporal stores are combined to a
memset of zero<br>
>> which does not preserve the
nontemporal hint. Later on the
backend (tested on a<br>
>> x86-64 corei7) expands that
memset call into a sequence of two
normal 16-byte<br>
>> aligned vector stores.<br>
>><br>
>> opt -memcpyopt example.ll
-S -o - | llc -mcpu=corei7 -o -<br>
>><br>
>> Before:<br>
>> xorps %xmm0, %xmm0<br>
>> movaps %xmm0, 16(%rdi)<br>
>> movaps %xmm0, (%rdi)<br>
>><br>
>> With this patch, we no
longer merge nontemporal stores into
calls to memset.<br>
>> In this example, llc
correctly expands the two stores
into two movntps:<br>
>> xorps %xmm0, %xmm0<br>
>> movntps %xmm0, 16(%rdi)<br>
>> movntps %xmm0, (%rdi)<br>
>><br>
>> In theory, we could extend
the usage of !nontemporal metadata
to memcpy/memset<br>
>> calls. However a change
like that would only have the effect
of forcing the<br>
>> backend to expand
!nontemporal memsets back to
sequences of store instructions.<br>
>> A memset library call would
not have exactly the same semantic
of a builtin<br>
>> !nontemporal memset call.
So, SelectionDAG will have to
conservatively expand<br>
>> it back to a sequence of
!nontemporal stores (effectively
undoing the merging).<br>
>><br>
>> Differential Revision:<span> </span><a
moz-do-not-send="true"
href="http://reviews.llvm.org/D13519"
target="_blank"><a class="moz-txt-link-freetext" href="http://reviews.llvm.org/D13519">http://reviews.llvm.org/D13519</a></a><br>
>><br>
>> Added:<br>
>>
llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll<br>
>> Modified:<br>
>>
llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp<br>
>><br>
>> Modified:
llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp<br>
>> URL:<span> </span><a
moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff"
target="_blank"><a class="moz-txt-link-freetext" href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff</a></a><br>
>>
==============================================================================<br>
>> ---
llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
(original)<br>
>> +++
llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
Fri Oct 9 05:53:41 2015<br>
>> @@ -488,6 +488,16 @@
Instruction
*MemCpyOpt::tryMergingIntoMe<br>
>> bool
MemCpyOpt::processStore(StoreInst
*SI, BasicBlock::iterator &BBI)
{<br>
>> if (!SI->isSimple())
return false;<br>
>> +<br>
>> + // Avoid merging
nontemporal stores since the
resulting<br>
>> + // memcpy/memset would
not be able to preserve the
nontemporal hint.<br>
>> + // In theory we could
teach how to propagate the
!nontemporal metadata to<br>
>> + // memset calls.
However, that change would force the
backend to<br>
>> + // conservatively expand
!nontemporal memset calls back to
sequences of<br>
>> + // store instructions
(effectively undoing the merging).<br>
>> + if
(SI->getMetadata(LLVMContext::MD_nontemporal))<br>
>> + return false;<br>
>> +<br>
>> const DataLayout &DL
=
SI->getModule()->getDataLayout();<br>
>> // Detect cases where
we're performing call slot
forwarding, but<br>
>><br>
>> Added:
llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll<br>
>> URL:<span> </span><a
moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto"
target="_blank"><a class="moz-txt-link-freetext" href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto</a></a><br>
>>
==============================================================================<br>
>> ---
llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
(added)<br>
>> +++
llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
Fri Oct 9 05:53:41 2015<br>
>> @@ -0,0 +1,49 @@<br>
>> +; RUN: opt < %s
-memcpyopt -S | FileCheck %s<br>
>> +<br>
>> +target datalayout =
"e-m:e-i64:64-f80:128-n8:16:32:64-S128"<br>
>> +<br>
>> +; Verify that we don't
combine nontemporal stores into
memset calls.<br>
>> +<br>
>> +define void
@nontemporal_stores_1(<4 x
float>* nocapture %dst) {<br>
>> +; CHECK-LABEL:
@nontemporal_stores_1<br>
>> +; CHECK: store <4 x
float> zeroinitializer, <4 x
float>* %dst, align 16,
!nontemporal !0<br>
>> +; CHECK: store <4 x
float> zeroinitializer, <4 x
float>* %ptr1, align 16,
!nontemporal !0<br>
>> +; CHECK: store <4 x
float> zeroinitializer, <4 x
float>* %ptr2, align 16,
!nontemporal !0<br>
>> +; CHECK: store <4 x
float> zeroinitializer, <4 x
float>* %ptr3, align 16,
!nontemporal !0<br>
>> +; CHECK: store <4 x
float> zeroinitializer, <4 x
float>* %ptr4, align 16,
!nontemporal !0<br>
>> +; CHECK: store <4 x
float> zeroinitializer, <4 x
float>* %ptr5, align 16,
!nontemporal !0<br>
>> +; CHECK: store <4 x
float> zeroinitializer, <4 x
float>* %ptr6, align 16,
!nontemporal !0<br>
>> +; CHECK: store <4 x
float> zeroinitializer, <4 x
float>* %ptr7, align 16,
!nontemporal !0<br>
>> +; CHECK-NEXT: ret void<br>
>> +entry:<br>
>> + store <4 x float>
zeroinitializer, <4 x float>*
%dst, align 16, !nontemporal !0<br>
>> + %ptr1 = getelementptr
inbounds <4 x float>, <4 x
float>* %dst, i64 1<br>
>> + store <4 x float>
zeroinitializer, <4 x float>*
%ptr1, align 16, !nontemporal !0<br>
>> + %ptr2 = getelementptr
inbounds <4 x float>, <4 x
float>* %dst, i64 2<br>
>> + store <4 x float>
zeroinitializer, <4 x float>*
%ptr2, align 16, !nontemporal !0<br>
>> + %ptr3 = getelementptr
inbounds <4 x float>, <4 x
float>* %dst, i64 3<br>
>> + store <4 x float>
zeroinitializer, <4 x float>*
%ptr3, align 16, !nontemporal !0<br>
>> + %ptr4 = getelementptr
inbounds <4 x float>, <4 x
float>* %dst, i64 4<br>
>> + store <4 x float>
zeroinitializer, <4 x float>*
%ptr4, align 16, !nontemporal !0<br>
>> + %ptr5 = getelementptr
inbounds <4 x float>, <4 x
float>* %dst, i64 5<br>
>> + store <4 x float>
zeroinitializer, <4 x float>*
%ptr5, align 16, !nontemporal !0<br>
>> + %ptr6 = getelementptr
inbounds <4 x float>, <4 x
float>* %dst, i64 6<br>
>> + store <4 x float>
zeroinitializer, <4 x float>*
%ptr6, align 16, !nontemporal !0<br>
>> + %ptr7 = getelementptr
inbounds <4 x float>, <4 x
float>* %dst, i64 7<br>
>> + store <4 x float>
zeroinitializer, <4 x float>*
%ptr7, align 16, !nontemporal !0<br>
>> + ret void<br>
>> +}<br>
>> +<br>
>> +define void
@nontemporal_stores_2(<4 x
float>* nocapture %dst) {<br>
>> +; CHECK-LABEL:
@nontemporal_stores_2<br>
>> +; CHECK: store <4 x
float> zeroinitializer, <4 x
float>* %dst, align 16,
!nontemporal !0<br>
>> +; CHECK: store <4 x
float> zeroinitializer, <4 x
float>* %ptr1, align 16,
!nontemporal !0<br>
>> +; CHECK-NEXT: ret void<br>
>> +entry:<br>
>> + store <4 x float>
zeroinitializer, <4 x float>*
%dst, align 16, !nontemporal !0<br>
>> + %ptr1 = getelementptr
inbounds <4 x float>, <4 x
float>* %dst, i64 1<br>
>> + store <4 x float>
zeroinitializer, <4 x float>*
%ptr1, align 16, !nontemporal !0<br>
>> + ret void<br>
>> +}<br>
>> +<br>
>> +!0 = !{i32 1}<br>
>><br>
>><br>
>>
_______________________________________________<br>
>> llvm-commits mailing list<br>
>><span> </span><a
moz-do-not-send="true"
href="mailto:llvm-commits@lists.llvm.org"
target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a></a><br>
>><span> </span><a
moz-do-not-send="true"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits"
target="_blank"><a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a></a><br>
><br>
>
_______________________________________________<br>
> llvm-commits mailing list<br>
><span> </span><a
moz-do-not-send="true"
href="mailto:llvm-commits@lists.llvm.org"
target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a></a><br>
><span> </span><a
moz-do-not-send="true"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits"
target="_blank"><a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a></a><br>
<br>
_______________________________________________<br>
llvm-commits mailing list<br>
<a moz-do-not-send="true"
href="mailto:llvm-commits@lists.llvm.org"
target="_blank">llvm-commits@lists.llvm.org</a><br>
<a moz-do-not-send="true"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits"
rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a></div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>