[llvm] r236031 - transform fadd chains to increase parallelism

Wed Apr 29 20:32:19 PDT 2015

Oh wow, sorry. I knew there was a register pressure risk, but I guessed
wrongly that it would be contained by the limited length of the transform.

If it's easier, please feel free to revert. I won't be at my dev machine
until tomorrow AM.

I'm not seeing how in-order vs. OOO is a factor?

On Wednesday, April 29, 2015, Owen Anderson <resistor at mac.com> wrote:

> Woah!  This change is very not OK for in-order targets!  Additionally,
> we’re observing massive regressions, including catastrophic spilling and
> catastrophic compile time, resulting from it.  We’re still
> reducing/sanitizing a testcase we can share, but *at least* this needs to
> move under some kind of target hook so that in-order targets can opt-out.
>
> —Owen
>
> > On Apr 28, 2015, at 2:03 PM, Sanjay Patel <spatel at rotateright.com
> <javascript:;>> wrote:
> >
> > Author: spatel
> > Date: Tue Apr 28 16:03:22 2015
> > New Revision: 236031
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=236031&view=rev
> > Log:
> > transform fadd chains to increase parallelism
> >
> > This is a compromise: with this simple patch, we should always handle a
> chain of exactly 3
> > operations optimally, but we're not generating the optimal balanced
> binary tree for a longer
> > sequence.
> >
> > In general, this transform will reduce the dependency chain for a
> sequence of instructions
> > using N operands from a worst case N-1 dependent operations to N/2
> dependent operations.
> > The optimal balanced binary tree would reduce the chain to log2(N).
> >
> > The trade-off for not dealing with longer sequences is: (1) we have less
> complexity in the
> > compiler, (2) we avoid unknown compile-time blowup calculating a
> balanced tree, and (3) we
> > don't need to worry about the increased register pressure required to
> parallelize longer
> > sequences. It also seems unlikely that we would ever encounter really
> long strings of
> > dependent ops like that in the wild, but I'm not sure how to verify that
> speculation.
> > FWIW, I see no perf difference for test-suite running on btver2 (x86-64)
> with -ffast-math
> > and this patch.
> >
> > We can extend this patch to cover other associative operations such as
> fmul, fmax, fmin,
> > integer add, integer mul.
> >
> > This is a partial fix for:
> > https://llvm.org/bugs/show_bug.cgi?id=17305
> >
> > and if extended:
> > https://llvm.org/bugs/show_bug.cgi?id=21768
> > https://llvm.org/bugs/show_bug.cgi?id=23116
> >
> > The issue also came up in:
> > http://reviews.llvm.org/D8941
> >
> > Differential Revision: http://reviews.llvm.org/D9232
> >
> >
> > Modified:
> >    llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> >    llvm/trunk/test/CodeGen/X86/fp-fast.ll
> >
> > Modified: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> > URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=236031&r1=236030&r2=236031&view=diff
> >
> ==============================================================================
> > --- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (original)
> > +++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Tue Apr 28
> 16:03:22 2015
> > @@ -7801,6 +7801,24 @@ SDValue DAGCombiner::visitFADD(SDNode *N
> >                            N0.getOperand(0), DAG.getConstantFP(4.0, DL,
> VT));
> >       }
> >     }
> > +
> > +    // Canonicalize chains of adds to LHS to simplify the following
> transform.
> > +    if (N0.getOpcode() != ISD::FADD && N1.getOpcode() == ISD::FADD)
> > +      return DAG.getNode(ISD::FADD, SDLoc(N), VT, N1, N0);
> > +
> > +    // Convert a chain of 3 dependent operations into 2 independent
> operations
> > +    // and 1 dependent operation:
> > +    //  (fadd N0: (fadd N00: (fadd z, w), N01: y), N1: x) ->
> > +    //  (fadd N00: (fadd z, w), (fadd N1: x, N01: y))
> > +    if (N0.getOpcode() == ISD::FADD &&  N0.hasOneUse() &&
> > +        N1.getOpcode() != ISD::FADD) {
> > +      SDValue N00 = N0.getOperand(0);
> > +      if (N00.getOpcode() == ISD::FADD) {
> > +        SDValue N01 = N0.getOperand(1);
> > +        SDValue NewAdd = DAG.getNode(ISD::FADD, SDLoc(N), VT, N1, N01);
> > +        return DAG.getNode(ISD::FADD, SDLoc(N), VT, N00, NewAdd);
> > +      }
> > +    }
> >   } // enable-unsafe-fp-math
> >
> >   // FADD -> FMA combines:
> >
> > Modified: llvm/trunk/test/CodeGen/X86/fp-fast.ll
> > URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/fp-fast.ll?rev=236031&r1=236030&r2=236031&view=diff
> >
> ==============================================================================
> > --- llvm/trunk/test/CodeGen/X86/fp-fast.ll (original)
> > +++ llvm/trunk/test/CodeGen/X86/fp-fast.ll Tue Apr 28 16:03:22 2015
> > @@ -113,3 +113,46 @@ define float @test11(float %a) {
> >   %t2 = fadd float %a, %t1
> >   ret float %t2
> > }
> > +
> > +; Verify that the first two adds are independent; the destination
> registers
> > +; are used as source registers for the third add.
> > +
> > +define float @reassociate_adds1(float %a, float %b, float %c, float %d)
> {
> > +; CHECK-LABEL: reassociate_adds1:
> > +; CHECK:       # BB#0:
> > +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> > +; CHECK-NEXT:    vaddss %xmm2, %xmm3, %xmm1
> > +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> > +; CHECK-NEXT:    retq
> > +  %add0 = fadd float %a, %b
> > +  %add1 = fadd float %add0, %c
> > +  %add2 = fadd float %add1, %d
> > +  ret float %add2
> > +}
> > +
> > +define float @reassociate_adds2(float %a, float %b, float %c, float %d)
> {
> > +; CHECK-LABEL: reassociate_adds2:
> > +; CHECK:       # BB#0:
> > +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> > +; CHECK-NEXT:    vaddss %xmm2, %xmm3, %xmm1
> > +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> > +; CHECK-NEXT:    retq
> > +  %add0 = fadd float %a, %b
> > +  %add1 = fadd float %c, %add0
> > +  %add2 = fadd float %add1, %d
> > +  ret float %add2
> > +}
> > +
> > +define float @reassociate_adds3(float %a, float %b, float %c, float %d)
> {
> > +; CHECK-LABEL: reassociate_adds3:
> > +; CHECK:       # BB#0:
> > +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> > +; CHECK-NEXT:    vaddss %xmm2, %xmm3, %xmm1
> > +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> > +; CHECK-NEXT:    retq
> > +  %add0 = fadd float %a, %b
> > +  %add1 = fadd float %add0, %c
> > +  %add2 = fadd float %d, %add1
> > +  ret float %add2
> > +}
> > +
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu <javascript:;>
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150429/3dd745f0/attachment.html>