[llvm] r329350 - [InstCombine] nsz: -(X - Y) --> Y - X

Fri Apr 6 10:31:16 PDT 2018

Hopefully, fixed with:
https://reviews.llvm.org/rL329429

Let me know if you see other problems - thanks!

On Fri, Apr 6, 2018 at 11:08 AM, Sanjay Patel <spatel at rotateright.com>
wrote:

>
>
> On Fri, Apr 6, 2018 at 10:56 AM, <escha at apple.com> wrote:
>
>>
>>
>> On Apr 6, 2018, at 9:48 AM, Sanjay Patel <spatel at rotateright.com> wrote:
>>
>> Ah, thanks for letting me know. I did consider that case, but I was in an
>> x86 mindset (and IIRC, your target has free or close-to-free fneg?), so the
>> asm diff looks something like this:
>>
>> define float @neg_sub_nsz_extra_use(float %x, float %y) {
>>   %t1 = fsub float %x, %y
>>   %t2 = fsub nsz float -0.0, %t1
>>   %t3 = fdiv float %t1, %t2 ; just for illustration of extra use of the
>> 1st fsub...
>>   ret float %t3
>> }
>>
>> With fneg:
>>     vsubss    %xmm1, %xmm0, %xmm0
>>     vxorps    LCPI0_0(%rip), %xmm0, %xmm1 ; load a constant to fneg...sigh
>>     vdivss    %xmm1, %xmm0, %xmm0
>>
>> Replace with fsub:
>>     vsubss    %xmm1, %xmm0, %xmm2
>>     vsubss    %xmm0, %xmm1, %xmm0
>>     vdivss    %xmm0, %xmm2, %xmm0
>>
>>
>> Yeah, we have free fneg (as do all GPUs that I know of, including AMDGPU).
>>
>> Even on x86, I’m nervous about avoiding a xorps by that method in
>> general; i’d expect xorps to be faster because it uses a different
>> execution pipe, at least in some cases.
>>
>> This transform, if done at all, should probably be done in
>> target-specific DAG code, since it’s only profitable on a fairly specific
>> subset of machines.
>>
>> I also haven’t looked, but can you double check that your Z - (X - Y)
>> --> Z + (Y - X) transform also checks oneuse?
>>
>>
> Yes - that one has m_OneUse:
> https://reviews.llvm.org/rL329362
>
> (also note that we already do that transform in instcombine, but requiring
> the full set of FMF)
>
> I can merge these to reduce the code.
>
>
>>
>>
>>
>
>> So let me add the use check (I added the test already at
>> https://reviews.llvm.org/rL329418 ) to avoid the problem immediately.
>> But maybe I should look at later target-specific (DAG) transforms that can
>> change one to the other based on prefs?
>>
>>
>> I’m not sure. The category of CPUs where “fsub is faster than xor” seems
>> like a fairly Weird, Special Category. maybe just put it in the x86 backend
>> alone?
>>
>
> Yeah, it's likely a uarch- and code-specific corner-case.
>
>
>
>>
>> —escha
>>
>>
>> On Fri, Apr 6, 2018 at 9:33 AM, <escha at apple.com> wrote:
>>
>>> This patch is giving us fairly significant instruction count regressions
>>> on a number of shaders. The regressions go away if we gate it with
>>> hasOneUse, with makes sense; the patch as committed is duplicating
>>> arithmetic!  Is it possible to add hasOneUse checks to this? I don’t feel
>>> like it’s reasonable to duplicate fsubs just because they have a negated
>>> use, especially since the compiler as a whole is very bad at deduplicating
>>> them later.
>>>
>>> —escha
>>>
>>> > On Apr 5, 2018, at 2:37 PM, Sanjay Patel via llvm-commits <
>>> llvm-commits at lists.llvm.org> wrote:
>>> >
>>> > Author: spatel
>>> > Date: Thu Apr  5 14:37:17 2018
>>> > New Revision: 329350
>>> >
>>> > URL: http://llvm.org/viewvc/llvm-project?rev=329350&view=rev
>>> > Log:
>>> > [InstCombine] nsz: -(X - Y) --> Y - X
>>> >
>>> > This restores part of the fold that was removed with rL73243 (PR4374).
>>> >
>>> > Modified:
>>> >    llvm/trunk/lib/Transforms/InstCombine/InstCombineAddSub.cpp
>>> >    llvm/trunk/test/Transforms/InstCombine/fsub.ll
>>> >
>>> > Modified: llvm/trunk/lib/Transforms/InstCombine/InstCombineAddSub.cpp
>>> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transform
>>> s/InstCombine/InstCombineAddSub.cpp?rev=329350&r1=329349&r2=
>>> 329350&view=diff
>>> > ============================================================
>>> ==================
>>> > --- llvm/trunk/lib/Transforms/InstCombine/InstCombineAddSub.cpp
>>> (original)
>>> > +++ llvm/trunk/lib/Transforms/InstCombine/InstCombineAddSub.cpp Thu
>>> Apr  5 14:37:17 2018
>>> > @@ -1698,10 +1698,17 @@ Instruction *InstCombiner::visitFSub(Bin
>>> >                                   SQ.getWithInstruction(&I)))
>>> >     return replaceInstUsesWith(I, V);
>>> >
>>> > -  // Subtraction from -0.0 is the canonical form of fneg.
>>> > -  // fsub nsz 0, X ==> fsub nsz -0.0, X
>>> > -  if (I.getFastMathFlags().noSignedZeros() && match(Op0,
>>> m_PosZeroFP()))
>>> > -    return BinaryOperator::CreateFNegFMF(Op1, &I);
>>> > +  if (I.hasNoSignedZeros()) {
>>> > +    // Subtraction from -0.0 is the canonical form of fneg.
>>> > +    // fsub nsz 0, X ==> fsub nsz -0.0, X
>>> > +    if (match(Op0, m_PosZeroFP()))
>>> > +      return BinaryOperator::CreateFNegFMF(Op1, &I);
>>> > +
>>> > +    // With no-signed-zeros: -(X - Y) --> Y - X
>>> > +    Value *X, *Y;
>>> > +    if (match(Op0, m_NegZeroFP()) && match(Op1, m_FSub(m_Value(X),
>>> m_Value(Y))))
>>> > +      return BinaryOperator::CreateFSubFMF(Y, X, &I);
>>> > +  }
>>> >
>>> >   if (isa<Constant>(Op0))
>>> >     if (SelectInst *SI = dyn_cast<SelectInst>(Op1))
>>> >
>>> > Modified: llvm/trunk/test/Transforms/InstCombine/fsub.ll
>>> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transfor
>>> ms/InstCombine/fsub.ll?rev=329350&r1=329349&r2=329350&view=diff
>>> > ============================================================
>>> ==================
>>> > --- llvm/trunk/test/Transforms/InstCombine/fsub.ll (original)
>>> > +++ llvm/trunk/test/Transforms/InstCombine/fsub.ll Thu Apr  5
>>> 14:37:17 2018
>>> > @@ -19,8 +19,7 @@ define float @test1(float %x, float %y)
>>> >
>>> > define float @neg_sub(float %x, float %y) {
>>> > ; CHECK-LABEL: @neg_sub(
>>> > -; CHECK-NEXT:    [[T1:%.*]] = fsub float [[X:%.*]], [[Y:%.*]]
>>> > -; CHECK-NEXT:    [[T2:%.*]] = fsub nsz float -0.000000e+00, [[T1]]
>>> > +; CHECK-NEXT:    [[T2:%.*]] = fsub nsz float [[Y:%.*]], [[X:%.*]]
>>> > ; CHECK-NEXT:    ret float [[T2]]
>>> > ;
>>> >   %t1 = fsub float %x, %y
>>> >
>>> >
>>> > _______________________________________________
>>> > llvm-commits mailing list
>>> > llvm-commits at lists.llvm.org
>>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180406/e8b98630/attachment.html>