<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Oct 9, 2015 at 10:40 PM, Quentin Colombet <span dir="ltr"><<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><br><div><span class=""><blockquote type="cite"><div>On Oct 9, 2015, at 2:24 PM, Andrea Di Biagio <<a href="mailto:andrea.dibiagio@gmail.com" target="_blank">andrea.dibiagio@gmail.com</a>> wrote:</div><br><div><div dir="ltr"><div><div>Hi Philip,<br><br></div>I have attached the prototype memset patch that I mentioned before.<br></div><div><br>The patch is not complicated and would allows to merge nontemporal stores together.<br>There are two important implications:<br>1)
It will _not_ try to mix temporal stores with nontemporal stores. If
people really really want the other behavior then we can think of adding
a flag as a future extension.<br></div><div>2) Large memsets would
still generate library calls. On X86, we would follow the usual rules
(i.e.: memset is unrolled up to 16 stores; otherwise we generate a
libcall). </div><div><br>Point 2. is sub-optimal. However, I don't think
it would occur often in practice to have to expand very big nontemporal
memsets. Also, for very large memsets, the library implementation
should be (hopefully!) smart enough to use a specialized loop of
nontemporal stores.<br></div></div></div></blockquote><div><br></div></span><div>How do you envision something like that to be possible?</div><div>AFAICT you cannot pass the non temporal information to the library.</div></div></div></blockquote><div><br><div>You cannot pass that information to the library.<br></div><div>What I meant is that library memcpy/memset functions tend to be heavily optimized for the target. For example, you could have a smart specialized variant of memset that uses nontemporal stores for very large memsets.<br>That said, I don't like when the compiler speculates too much on what a library call would do. I prefer a more conservative approach like my original patch.<br><br></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div><span class=""><br><blockquote type="cite"><div><div dir="ltr"><div><br></div><div>As a side note: the patch would require a change in the nontemporal.ll test.<br></div><div>I
think it will also require extra x86 tests to make sure that the
nontemporal memset is expanded to a sequence of movnt instructions.<br></div><div><br></div><div>In conclusion:<br></div><div>do you (Philip and Quentin) think this approach would be a good compromise?<br></div></div></div></blockquote><div><br></div></span><div>That would work for me, though I was already (and am still :)) happy!</div></div></div></blockquote><div><br></div><div>Don't get me wrong, I am very happy as well!<br></div><div>Since I mentioned that I had an alternative patch, I thought it would have been bad not to share it.<br><br></div><div>-Andrea<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div><div><br></div><div>Cheers,</div><div>-Quentin</div><br><blockquote type="cite"><div><div><div class="h5"><div dir="ltr"><div>For
what is worth, this approach would still address most of our customer
concerns (i.e. it is probably as good as the previous patch).<br>If you
prefer this approach more then I am okay with sending it for review
(especially if this approach makes everybody happy :-) ).<br><br></div><div>Cheers,<br></div>Andrea<div><div><img src="https://ssl.gstatic.com/ui/v1/icons/mail/images/cleardot.gif"></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Oct 9, 2015 at 8:51 PM, Philip Reames <span dir="ltr"><<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
I don't have a strong enough opinion here to push the issue, but I
think I still disagree with the chosen approach. However, I will
defer to you two since you seem to have discussed the tradeoffs and
are better positioned to make the decision.<br>
<br>
p.s. If the patch to support non-temporal memsets was simple, it
might be interesting in it's own right. Even if we don't
canonicalize to it, having the ability to chose it in the frontend
could be interesting. Note the "could"; I have no concrete use for
this in the near future.<br>
<br>
Philip<div><div><br>
<br>
<br>
<br>
<div>On 10/09/2015 11:15 AM, Quentin
Colombet wrote:<br>
</div>
<blockquote type="cite">
Hi Andrea,
<div><br>
<div>
<blockquote type="cite">
<div>On Oct 9, 2015, at 10:58 AM, Andrea Di Biagio
<<a href="mailto:andrea.dibiagio@gmail.com" target="_blank">andrea.dibiagio@gmail.com</a>>
wrote:</div>
<br>
<div><br>
<br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<div class="gmail_quote" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">On
Fri, Oct 9, 2015 at 6:09 PM, Quentin Colombet via
llvm-commits<span> </span><span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span><span> </span>wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Philip,<br>
<span></span><span><br>
><br>
> One of my worries is about how disabling this
will effect other optimizations around non-temporal
stores. In particular, if we had a memset following
the stores, we'd no longer have the chance to
combine them into a single memset. Replacing two
non-temporal stores and a large memset with a larger
memset seems highly likely to be profitable.<br>
<br>
</span>I am not sure about that. I would expect any
memset implementation to touch the cache and thus, to
have an impact on the performance of the surrounding
code.<br>
</blockquote>
<div><br>
</div>
<div>I agree with Quentin here. I don't think
it would be better since you would end up polluting
the caches.<br>
<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span><br>
><br>
> Analogously, replacing 8 adjacent non-temporal
byte stores with a single memset (and then i64
store) also seems profitable and possibly disabled
by this transform.<br>
><br>
> Also, if we keep this change, the comment needs
to be extended to highlight this is a profitability
decision, not a legality one.<br>
<br>
</span>Agree with that part.<br>
</blockquote>
<div><br>
</div>
<div>Adjacent stores like the ones you
described should be easily handled by
'MergeConsecutiveStores' in the DAGCombiner. So, I
don't think this transformation is pessimizing the
codegen for that case. MergeConsecutiveStores should
be able to merge those small stores and still preserve
the nontemporal hint.<br>
</div>
<div><br>
</div>
<div>Overall I am not completely against the
idea of propagating the !nontemporal metadata to
memset/memcpy (I actually have a patch that implements
that for memset - if people really prefer that
approach I can upload it for review). However, at the
moment I don't see the advantage in doing it (I'd like
to see an example where that transformation would be
profitable).<br>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>The thing I was agreeing with was that the comment should
say that we block the transformation not because we couldn’t
fix memset, but because it is not worth it given we would
need to split the memset later. That was how I read the
profitability part, though I guess Philip was more into what
you understood :).</div>
<div><br>
</div>
<div>Cheers,</div>
<div>Q.</div>
<br>
<blockquote type="cite">
<div>
<div class="gmail_quote" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<div><br>
-Andrea<br>
</div>
<div><br>
Thanks,<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">-Quentin<br>
<div>
<div><br>
><br>
> Philip<br>
><br>
> On 10/09/2015 03:53 AM, Andrea Di Biagio via
llvm-commits wrote:<br>
>> Author: adibiagio<br>
>> Date: Fri Oct 9 05:53:41 2015<br>
>> New Revision: 249820<br>
>><br>
>> URL:<span> </span><a href="http://llvm.org/viewvc/llvm-project?rev=249820&view=rev" rel="noreferrer" target="_blank"></a><a href="http://llvm.org/viewvc/llvm-project?rev=249820&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=249820&view=rev</a><br>
>> Log:<br>
>> [MemCpyOpt] Fix wrong merging adjacent
nontemporal stores into memset calls.<br>
>><br>
>> Pass MemCpyOpt doesn't check if a store
instruction is nontemporal.<br>
>> As a consequence, adjacent nontemporal
stores are always merged into a<br>
>> memset call.<br>
>><br>
>> Example:<br>
>><br>
>> ;;;<br>
>> define void @foo(<4 x float>*
nocapture %p) {<br>
>> entry:<br>
>> store <4 x float>
zeroinitializer, <4 x float>* %p, align 16,
!nontemporal !0<br>
>> %p1 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 1<br>
>> store <4 x float>
zeroinitializer, <4 x float>* %p1, align 16,
!nontemporal !0<br>
>> ret void<br>
>> }<br>
>><br>
>> !0 = !{i32 1}<br>
>> ;;;<br>
>><br>
>> In this example, the two nontemporal
stores are combined to a memset of zero<br>
>> which does not preserve the nontemporal
hint. Later on the backend (tested on a<br>
>> x86-64 corei7) expands that memset call
into a sequence of two normal 16-byte<br>
>> aligned vector stores.<br>
>><br>
>> opt -memcpyopt example.ll -S -o - | llc
-mcpu=corei7 -o -<br>
>><br>
>> Before:<br>
>> xorps %xmm0, %xmm0<br>
>> movaps %xmm0, 16(%rdi)<br>
>> movaps %xmm0, (%rdi)<br>
>><br>
>> With this patch, we no longer merge
nontemporal stores into calls to memset.<br>
>> In this example, llc correctly expands
the two stores into two movntps:<br>
>> xorps %xmm0, %xmm0<br>
>> movntps %xmm0, 16(%rdi)<br>
>> movntps %xmm0, (%rdi)<br>
>><br>
>> In theory, we could extend the usage of
!nontemporal metadata to memcpy/memset<br>
>> calls. However a change like that would
only have the effect of forcing the<br>
>> backend to expand !nontemporal memsets
back to sequences of store instructions.<br>
>> A memset library call would not have
exactly the same semantic of a builtin<br>
>> !nontemporal memset call. So,
SelectionDAG will have to conservatively expand<br>
>> it back to a sequence of !nontemporal
stores (effectively undoing the merging).<br>
>><br>
>> Differential Revision:<span> </span><a href="http://reviews.llvm.org/D13519" rel="noreferrer" target="_blank"></a><a href="http://reviews.llvm.org/D13519" target="_blank">http://reviews.llvm.org/D13519</a><br>
>><br>
>> Added:<br>
>>
llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll<br>
>> Modified:<br>
>>
llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp<br>
>><br>
>> Modified:
llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp<br>
>> URL:<span> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff" rel="noreferrer" target="_blank"></a><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp?rev=249820&r1=249819&r2=249820&view=diff</a><br>
>>
==============================================================================<br>
>> ---
llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
(original)<br>
>> +++
llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp
Fri Oct 9 05:53:41 2015<br>
>> @@ -488,6 +488,16 @@ Instruction
*MemCpyOpt::tryMergingIntoMe<br>
>> bool MemCpyOpt::processStore(StoreInst
*SI, BasicBlock::iterator &BBI) {<br>
>> if (!SI->isSimple()) return false;<br>
>> +<br>
>> + // Avoid merging nontemporal stores
since the resulting<br>
>> + // memcpy/memset would not be able to
preserve the nontemporal hint.<br>
>> + // In theory we could teach how to
propagate the !nontemporal metadata to<br>
>> + // memset calls. However, that change
would force the backend to<br>
>> + // conservatively expand !nontemporal
memset calls back to sequences of<br>
>> + // store instructions (effectively
undoing the merging).<br>
>> + if
(SI->getMetadata(LLVMContext::MD_nontemporal))<br>
>> + return false;<br>
>> +<br>
>> const DataLayout &DL =
SI->getModule()->getDataLayout();<br>
>> // Detect cases where we're
performing call slot forwarding, but<br>
>><br>
>> Added:
llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll<br>
>> URL:<span> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto" rel="noreferrer" target="_blank"></a><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll?rev=249820&view=auto</a><br>
>>
==============================================================================<br>
>> ---
llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
(added)<br>
>> +++
llvm/trunk/test/Transforms/MemCpyOpt/nontemporal.ll
Fri Oct 9 05:53:41 2015<br>
>> @@ -0,0 +1,49 @@<br>
>> +; RUN: opt < %s -memcpyopt -S |
FileCheck %s<br>
>> +<br>
>> +target datalayout =
"e-m:e-i64:64-f80:128-n8:16:32:64-S128"<br>
>> +<br>
>> +; Verify that we don't combine
nontemporal stores into memset calls.<br>
>> +<br>
>> +define void @nontemporal_stores_1(<4
x float>* nocapture %dst) {<br>
>> +; CHECK-LABEL: @nontemporal_stores_1<br>
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %dst, align
16, !nontemporal !0<br>
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr1, align
16, !nontemporal !0<br>
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr2, align
16, !nontemporal !0<br>
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr3, align
16, !nontemporal !0<br>
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr4, align
16, !nontemporal !0<br>
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr5, align
16, !nontemporal !0<br>
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr6, align
16, !nontemporal !0<br>
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr7, align
16, !nontemporal !0<br>
>> +; CHECK-NEXT: ret void<br>
>> +entry:<br>
>> + store <4 x float>
zeroinitializer, <4 x float>* %dst, align
16, !nontemporal !0<br>
>> + %ptr1 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 1<br>
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr1, align
16, !nontemporal !0<br>
>> + %ptr2 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 2<br>
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr2, align
16, !nontemporal !0<br>
>> + %ptr3 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 3<br>
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr3, align
16, !nontemporal !0<br>
>> + %ptr4 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 4<br>
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr4, align
16, !nontemporal !0<br>
>> + %ptr5 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 5<br>
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr5, align
16, !nontemporal !0<br>
>> + %ptr6 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 6<br>
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr6, align
16, !nontemporal !0<br>
>> + %ptr7 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 7<br>
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr7, align
16, !nontemporal !0<br>
>> + ret void<br>
>> +}<br>
>> +<br>
>> +define void @nontemporal_stores_2(<4
x float>* nocapture %dst) {<br>
>> +; CHECK-LABEL: @nontemporal_stores_2<br>
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %dst, align
16, !nontemporal !0<br>
>> +; CHECK: store <4 x float>
zeroinitializer, <4 x float>* %ptr1, align
16, !nontemporal !0<br>
>> +; CHECK-NEXT: ret void<br>
>> +entry:<br>
>> + store <4 x float>
zeroinitializer, <4 x float>* %dst, align
16, !nontemporal !0<br>
>> + %ptr1 = getelementptr inbounds <4 x
float>, <4 x float>* %dst, i64 1<br>
>> + store <4 x float>
zeroinitializer, <4 x float>* %ptr1, align
16, !nontemporal !0<br>
>> + ret void<br>
>> +}<br>
>> +<br>
>> +!0 = !{i32 1}<br>
>><br>
>><br>
>>
_______________________________________________<br>
>> llvm-commits mailing list<br>
>><span> </span><a href="mailto:llvm-commits@lists.llvm.org" target="_blank"></a><a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br>
>><span> </span><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank"></a><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><br>
><br>
>
_______________________________________________<br>
> llvm-commits mailing list<br>
><span> </span><a href="mailto:llvm-commits@lists.llvm.org" target="_blank"></a><a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br>
><span> </span><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank"></a><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><br>
<br>
_______________________________________________<br>
llvm-commits mailing list<br>
<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a></div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br></div>
</div></div><span><patch-alternative-solution-nontemporal.diff></span></div></blockquote></div><br></div></blockquote></div><br></div></div>