[llvm] r329350 - [InstCombine] nsz: -(X - Y) --> Y - X

via llvm-commits llvm-commits at lists.llvm.org
Sat Apr 7 07:55:59 PDT 2018


Thanks so much!

—escha

> On Apr 6, 2018, at 10:31 AM, Sanjay Patel <spatel at rotateright.com> wrote:
> 
> Hopefully, fixed with:
> https://reviews.llvm.org/rL329429
> 
> Let me know if you see other problems - thanks!
> 
> On Fri, Apr 6, 2018 at 11:08 AM, Sanjay Patel <spatel at rotateright.com> wrote:
> 
> 
> On Fri, Apr 6, 2018 at 10:56 AM, <escha at apple.com> wrote:
> 
> 
>> On Apr 6, 2018, at 9:48 AM, Sanjay Patel <spatel at rotateright.com> wrote:
>> 
>> Ah, thanks for letting me know. I did consider that case, but I was in an x86 mindset (and IIRC, your target has free or close-to-free fneg?), so the asm diff looks something like this:
>> 
>> define float @neg_sub_nsz_extra_use(float %x, float %y) {
>>   %t1 = fsub float %x, %y
>>   %t2 = fsub nsz float -0.0, %t1
>>   %t3 = fdiv float %t1, %t2 ; just for illustration of extra use of the 1st fsub...
>>   ret float %t3
>> }
>> 
>> With fneg:
>>     vsubss    %xmm1, %xmm0, %xmm0
>>     vxorps    LCPI0_0(%rip), %xmm0, %xmm1 ; load a constant to fneg...sigh
>>     vdivss    %xmm1, %xmm0, %xmm0
>> 
>> Replace with fsub:
>>     vsubss    %xmm1, %xmm0, %xmm2
>>     vsubss    %xmm0, %xmm1, %xmm0
>>     vdivss    %xmm0, %xmm2, %xmm0
> 
> Yeah, we have free fneg (as do all GPUs that I know of, including AMDGPU).
> 
> Even on x86, I’m nervous about avoiding an xorps by that method in general; I’d expect xorps to be faster because it uses a different execution pipe, at least in some cases.
> 
> This transform, if done at all, should probably be done in target-specific DAG code, since it’s only profitable on a fairly specific subset of machines.
> 
> I also haven’t looked, but can you double check that your Z - (X - Y) --> Z + (Y - X) transform also checks oneuse?
> 
> 
> Yes - that one has m_OneUse:
> https://reviews.llvm.org/rL329362
> 
> (also note that we already do that transform in instcombine, but requiring the full set of FMF)
> 
> I can merge these to reduce the code.
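> 
> As a rough IR sketch of that fold (hypothetical function name; assumes the inner fsub has a single use and the outer fsub carries nsz):
> 
>   define float @sub_of_sub(float %x, float %y, float %z) {
>     %t1 = fsub float %x, %y          ; X - Y, single use
>     %t2 = fsub nsz float %z, %t1     ; Z - (X - Y)
>     ret float %t2                    ; should become Z + (Y - X), i.e. an fsub %y, %x feeding an fadd nsz with %z
>   }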
>  
> 
>  
>> 
>> So let me add the use check (I added the test already at https://reviews.llvm.org/rL329418) to avoid the problem immediately. But maybe I should look at later target-specific (DAG) transforms that can change one to the other based on prefs?
> 
> I’m not sure. The category of CPUs where “fsub is faster than xor” seems like a fairly Weird, Special one. Maybe just put it in the x86 backend alone?
> 
> Yeah, it's likely a uarch- and code-specific corner-case.
> 
>  
> 
> —escha
> 
>> 
>> On Fri, Apr 6, 2018 at 9:33 AM, <escha at apple.com> wrote:
>> This patch is giving us fairly significant instruction count regressions on a number of shaders. The regressions go away if we gate it with hasOneUse, which makes sense; the patch as committed is duplicating arithmetic! Is it possible to add hasOneUse checks to this? I don’t feel like it’s reasonable to duplicate fsubs just because they have a negated use, especially since the compiler as a whole is very bad at deduplicating them later.
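>> 
>> As a rough sketch of the pattern (made-up values, not from an actual shader): when the first fsub has another use, the nsz fold leaves two subtractions where we previously had one subtraction plus an fneg:
>> 
>>   %t1 = fsub float %x, %y        ; kept alive by its other use
>>   %t2 = fsub nsz float %y, %x    ; was -(x - y); now a second, duplicated subtraction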
>> 
>> —escha
>> 
>> > On Apr 5, 2018, at 2:37 PM, Sanjay Patel via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>> >
>> > Author: spatel
>> > Date: Thu Apr  5 14:37:17 2018
>> > New Revision: 329350
>> >
>> > URL: http://llvm.org/viewvc/llvm-project?rev=329350&view=rev
>> > Log:
>> > [InstCombine] nsz: -(X - Y) --> Y - X
>> >
>> > This restores part of the fold that was removed with rL73243 (PR4374).
>> >
>> > Modified:
>> >    llvm/trunk/lib/Transforms/InstCombine/InstCombineAddSub.cpp
>> >    llvm/trunk/test/Transforms/InstCombine/fsub.ll
>> >
>> > Modified: llvm/trunk/lib/Transforms/InstCombine/InstCombineAddSub.cpp
>> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineAddSub.cpp?rev=329350&r1=329349&r2=329350&view=diff
>> > ==============================================================================
>> > --- llvm/trunk/lib/Transforms/InstCombine/InstCombineAddSub.cpp (original)
>> > +++ llvm/trunk/lib/Transforms/InstCombine/InstCombineAddSub.cpp Thu Apr  5 14:37:17 2018
>> > @@ -1698,10 +1698,17 @@ Instruction *InstCombiner::visitFSub(Bin
>> >                                   SQ.getWithInstruction(&I)))
>> >     return replaceInstUsesWith(I, V);
>> >
>> > -  // Subtraction from -0.0 is the canonical form of fneg.
>> > -  // fsub nsz 0, X ==> fsub nsz -0.0, X
>> > -  if (I.getFastMathFlags().noSignedZeros() && match(Op0, m_PosZeroFP()))
>> > -    return BinaryOperator::CreateFNegFMF(Op1, &I);
>> > +  if (I.hasNoSignedZeros()) {
>> > +    // Subtraction from -0.0 is the canonical form of fneg.
>> > +    // fsub nsz 0, X ==> fsub nsz -0.0, X
>> > +    if (match(Op0, m_PosZeroFP()))
>> > +      return BinaryOperator::CreateFNegFMF(Op1, &I);
>> > +
>> > +    // With no-signed-zeros: -(X - Y) --> Y - X
>> > +    Value *X, *Y;
>> > +    if (match(Op0, m_NegZeroFP()) && match(Op1, m_FSub(m_Value(X), m_Value(Y))))
>> > +      return BinaryOperator::CreateFSubFMF(Y, X, &I);
>> > +  }
>> >
>> >   if (isa<Constant>(Op0))
>> >     if (SelectInst *SI = dyn_cast<SelectInst>(Op1))
>> >
>> > Modified: llvm/trunk/test/Transforms/InstCombine/fsub.ll
>> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/fsub.ll?rev=329350&r1=329349&r2=329350&view=diff
>> > ==============================================================================
>> > --- llvm/trunk/test/Transforms/InstCombine/fsub.ll (original)
>> > +++ llvm/trunk/test/Transforms/InstCombine/fsub.ll Thu Apr  5 14:37:17 2018
>> > @@ -19,8 +19,7 @@ define float @test1(float %x, float %y)
>> >
>> > define float @neg_sub(float %x, float %y) {
>> > ; CHECK-LABEL: @neg_sub(
>> > -; CHECK-NEXT:    [[T1:%.*]] = fsub float [[X:%.*]], [[Y:%.*]]
>> > -; CHECK-NEXT:    [[T2:%.*]] = fsub nsz float -0.000000e+00, [[T1]]
>> > +; CHECK-NEXT:    [[T2:%.*]] = fsub nsz float [[Y:%.*]], [[X:%.*]]
>> > ; CHECK-NEXT:    ret float [[T2]]
>> > ;
>> >   %t1 = fsub float %x, %y
>> >
>> >
>> > _______________________________________________
>> > llvm-commits mailing list
>> > llvm-commits at lists.llvm.org
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>> 
>> 
> 
> 
> 
