<div dir="ltr"><div>Hopefully, fixed with:<br><a href="https://reviews.llvm.org/rL329429">https://reviews.llvm.org/rL329429</a><br><br></div>Let me know if you see other problems - thanks!<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Apr 6, 2018 at 11:08 AM, Sanjay Patel <span dir="ltr"><<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Fri, Apr 6, 2018 at 10:56 AM, <span dir="ltr"><<a href="mailto:escha@apple.com" target="_blank">escha@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><br><div><span><br><blockquote type="cite"><div>On Apr 6, 2018, at 9:48 AM, Sanjay Patel <<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>> wrote:</div><br class="m_1790994278551162237m_5623741990204593743Apple-interchange-newline"><div><div dir="ltr"><div><div>Ah, thanks for letting me know. I did consider that case, but I was in an x86 mindset (and IIRC, your target has free or close-to-free fneg?), so the asm diff looks something like this:<br><br>define float @neg_sub_nsz_extra_use(float %x, float %y) {<br> %t1 = fsub float %x, %y<br> %t2 = fsub nsz float -0.0, %t1<br> %t3 = fdiv float %t1, %t2 ; just for illustration of extra use of the 1st fsub...<br> ret float %t3<br>}<br><br></div>With fneg:<br> vsubss %xmm1, %xmm0, %xmm0<br> vxorps LCPI0_0(%rip), %xmm0, %xmm1 ; load a constant to fneg...sigh<br> vdivss %xmm1, %xmm0, %xmm0<br><br></div>Replace with fsub:<br><div><div> vsubss %xmm1, %xmm0, %xmm2<br> vsubss %xmm0, %xmm1, %xmm0<br> vdivss %xmm0, %xmm2, %xmm0<br></div></div></div></div></blockquote><div><br></div></span>Yeah, we have free fneg (as do all GPUs that I know of, including AMDGPU).</div><div><br></div><div>Even on x86, I’m nervous about avoiding a xorps by that method in general; i’d expect xorps to be faster because it uses a different execution pipe, at least in some cases.</div><div><br></div><div>This transform, if done at all, should probably be done in target-specific DAG code, since it’s only profitable on a fairly specific subset of machines.</div><div><br></div><div>I also haven’t looked, but can you double check that your <span style="font-family:Menlo;font-size:11px;background-color:rgb(255,255,255)">Z - (X - Y) --> Z + (Y - X) </span>transform also checks oneuse?</div><div><br></div></div></blockquote></span><div><br>Yes - that one has m_OneUse:<br><a href="https://reviews.llvm.org/rL329362" rel="noreferrer" target="_blank">https://reviews.llvm.org/rL329<wbr>362</a><br><br></div><div>(also note that we already do that transform in instcombine, but requiring the full set of FMF)<br></div><div><br></div><div>I can merge these to reduce the code.<br></div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><div><br> </div></div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><div><span><blockquote type="cite"><div><div dir="ltr"><div><div><br></div><div>So let me add the use check (I added the test already at <a href="https://reviews.llvm.org/rL329418" rel="noreferrer" target="_blank">https://reviews.llvm.org/rL329<wbr>418</a> ) to avoid the problem immediately. But maybe I should look at later target-specific (DAG) transforms that can change one to the other based on prefs?<br></div></div></div></div></blockquote><div><br></div></span><div>I’m not sure. The category of CPUs where “fsub is faster than xor” seems like a fairly Weird, Special Category. maybe just put it in the x86 backend alone?</div></div></div></blockquote><div><br></div></span><div>Yeah, it's likely a uarch- and code-specific corner-case.<br></div><div><div class="h5"><div><br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><div><span class="m_1790994278551162237HOEnZb"><font color="#888888"><div><br></div><div>—escha</div></font></span><div><div class="m_1790994278551162237h5"><br><blockquote type="cite"><div><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Apr 6, 2018 at 9:33 AM, <span dir="ltr"><<a href="mailto:escha@apple.com" target="_blank">escha@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This patch is giving us fairly significant instruction count regressions on a number of shaders. The regressions go away if we gate it with hasOneUse, with makes sense; the patch as committed is duplicating arithmetic! Is it possible to add hasOneUse checks to this? I don’t feel like it’s reasonable to duplicate fsubs just because they have a negated use, especially since the compiler as a whole is very bad at deduplicating them later.<br>
<span class="m_1790994278551162237m_5623741990204593743m_-197465863396471569HOEnZb"><font color="#888888"><br>
—escha<br>
</font></span><div class="m_1790994278551162237m_5623741990204593743m_-197465863396471569HOEnZb"><div class="m_1790994278551162237m_5623741990204593743m_-197465863396471569h5"><br>
> On Apr 5, 2018, at 2:37 PM, Sanjay Patel via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>> wrote:<br>
><br>
> Author: spatel<br>
> Date: Thu Apr 5 14:37:17 2018<br>
> New Revision: 329350<br>
><br>
> URL: <a href="http://llvm.org/viewvc/llvm-project?rev=329350&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject?rev=329350&view=rev</a><br>
> Log:<br>
> [InstCombine] nsz: -(X - Y) --> Y - X<br>
><br>
> This restores part of the fold that was removed with rL73243 (PR4374).<br>
><br>
> Modified:<br>
> llvm/trunk/lib/Transforms/Inst<wbr>Combine/InstCombineAddSub.cpp<br>
> llvm/trunk/test/Transforms/Ins<wbr>tCombine/fsub.ll<br>
><br>
> Modified: llvm/trunk/lib/Transforms/Inst<wbr>Combine/InstCombineAddSub.cpp<br>
> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineAddSub.cpp?rev=329350&r1=329349&r2=329350&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/lib/Transform<wbr>s/InstCombine/InstCombineAddSu<wbr>b.cpp?rev=329350&r1=329349&r2=<wbr>329350&view=diff</a><br>
> ==============================<wbr>==============================<wbr>==================<br>
> --- llvm/trunk/lib/Transforms/Inst<wbr>Combine/InstCombineAddSub.cpp (original)<br>
> +++ llvm/trunk/lib/Transforms/Inst<wbr>Combine/InstCombineAddSub.cpp Thu Apr 5 14:37:17 2018<br>
> @@ -1698,10 +1698,17 @@ Instruction *InstCombiner::visitFSub(Bin<br>
> SQ.getWithInstruction(&I)))<br>
> return replaceInstUsesWith(I, V);<br>
><br>
> - // Subtraction from -0.0 is the canonical form of fneg.<br>
> - // fsub nsz 0, X ==> fsub nsz -0.0, X<br>
> - if (I.getFastMathFlags().noSigned<wbr>Zeros() && match(Op0, m_PosZeroFP()))<br>
> - return BinaryOperator::CreateFNegFMF(<wbr>Op1, &I);<br>
> + if (I.hasNoSignedZeros()) {<br>
> + // Subtraction from -0.0 is the canonical form of fneg.<br>
> + // fsub nsz 0, X ==> fsub nsz -0.0, X<br>
> + if (match(Op0, m_PosZeroFP()))<br>
> + return BinaryOperator::CreateFNegFMF(<wbr>Op1, &I);<br>
> +<br>
> + // With no-signed-zeros: -(X - Y) --> Y - X<br>
> + Value *X, *Y;<br>
> + if (match(Op0, m_NegZeroFP()) && match(Op1, m_FSub(m_Value(X), m_Value(Y))))<br>
> + return BinaryOperator::CreateFSubFMF(<wbr>Y, X, &I);<br>
> + }<br>
><br>
> if (isa<Constant>(Op0))<br>
> if (SelectInst *SI = dyn_cast<SelectInst>(Op1))<br>
><br>
> Modified: llvm/trunk/test/Transforms/Ins<wbr>tCombine/fsub.ll<br>
> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/fsub.ll?rev=329350&r1=329349&r2=329350&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-pr<wbr>oject/llvm/trunk/test/Transfor<wbr>ms/InstCombine/fsub.ll?rev=329<wbr>350&r1=329349&r2=329350&view=d<wbr>iff</a><br>
> ==============================<wbr>==============================<wbr>==================<br>
> --- llvm/trunk/test/Transforms/Ins<wbr>tCombine/fsub.ll (original)<br>
> +++ llvm/trunk/test/Transforms/Ins<wbr>tCombine/fsub.ll Thu Apr 5 14:37:17 2018<br>
> @@ -19,8 +19,7 @@ define float @test1(float %x, float %y)<br>
><br>
> define float @neg_sub(float %x, float %y) {<br>
> ; CHECK-LABEL: @neg_sub(<br>
> -; CHECK-NEXT: [[T1:%.*]] = fsub float [[X:%.*]], [[Y:%.*]]<br>
> -; CHECK-NEXT: [[T2:%.*]] = fsub nsz float -0.000000e+00, [[T1]]<br>
> +; CHECK-NEXT: [[T2:%.*]] = fsub nsz float [[Y:%.*]], [[X:%.*]]<br>
> ; CHECK-NEXT: ret float [[T2]]<br>
> ;<br>
> %t1 = fsub float %x, %y<br>
><br>
><br>
> ______________________________<wbr>_________________<br>
> llvm-commits mailing list<br>
> <a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br>
> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-commits</a><br>
<br>
</div></div></blockquote></div><br></div></div>
</div></blockquote></div></div></div><br></div></blockquote></div></div></div><br></div></div>
</blockquote></div><br></div>