[llvm] r236031 - transform fadd chains to increase parallelism

Wed Apr 29 20:07:57 PDT 2015

Woah!  This change is very not OK for in-order targets!  Additionally, we’re observing massive regressions, including catastrophic spilling and catastrophic compile time, resulting from it.  We’re still reducing/sanitizing a testcase we can share, but *at least* this needs to move under some kind of target hook so that in-order targets can opt-out.

—Owen

> On Apr 28, 2015, at 2:03 PM, Sanjay Patel <spatel at rotateright.com> wrote:
> 
> Author: spatel
> Date: Tue Apr 28 16:03:22 2015
> New Revision: 236031
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=236031&view=rev
> Log:
> transform fadd chains to increase parallelism
> 
> This is a compromise: with this simple patch, we should always handle a chain of exactly 3
> operations optimally, but we're not generating the optimal balanced binary tree for a longer
> sequence.
> 
> In general, this transform will reduce the dependency chain for a sequence of instructions
> using N operands from a worst case N-1 dependent operations to N/2 dependent operations. 
> The optimal balanced binary tree would reduce the chain to log2(N).
> 
> The trade-off for not dealing with longer sequences is: (1) we have less complexity in the
> compiler, (2) we avoid unknown compile-time blowup calculating a balanced tree, and (3) we
> don't need to worry about the increased register pressure required to parallelize longer
> sequences. It also seems unlikely that we would ever encounter really long strings of
> dependent ops like that in the wild, but I'm not sure how to verify that speculation.
> FWIW, I see no perf difference for test-suite running on btver2 (x86-64) with -ffast-math
> and this patch.
> 
> We can extend this patch to cover other associative operations such as fmul, fmax, fmin, 
> integer add, integer mul.
> 
> This is a partial fix for:
> https://llvm.org/bugs/show_bug.cgi?id=17305
> 
> and if extended:
> https://llvm.org/bugs/show_bug.cgi?id=21768
> https://llvm.org/bugs/show_bug.cgi?id=23116
> 
> The issue also came up in:
> http://reviews.llvm.org/D8941
> 
> Differential Revision: http://reviews.llvm.org/D9232
> 
> 
> Modified:
>    llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>    llvm/trunk/test/CodeGen/X86/fp-fast.ll
> 
> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=236031&r1=236030&r2=236031&view=diff
> ==============================================================================
> --- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (original)
> +++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Tue Apr 28 16:03:22 2015
> @@ -7801,6 +7801,24 @@ SDValue DAGCombiner::visitFADD(SDNode *N
>                            N0.getOperand(0), DAG.getConstantFP(4.0, DL, VT));
>       }
>     }
> +
> +    // Canonicalize chains of adds to LHS to simplify the following transform.
> +    if (N0.getOpcode() != ISD::FADD && N1.getOpcode() == ISD::FADD)
> +      return DAG.getNode(ISD::FADD, SDLoc(N), VT, N1, N0);
> +    
> +    // Convert a chain of 3 dependent operations into 2 independent operations
> +    // and 1 dependent operation:
> +    //  (fadd N0: (fadd N00: (fadd z, w), N01: y), N1: x) ->
> +    //  (fadd N00: (fadd z, w), (fadd N1: x, N01: y))
> +    if (N0.getOpcode() == ISD::FADD &&  N0.hasOneUse() &&
> +        N1.getOpcode() != ISD::FADD) {
> +      SDValue N00 = N0.getOperand(0);
> +      if (N00.getOpcode() == ISD::FADD) {
> +        SDValue N01 = N0.getOperand(1);
> +        SDValue NewAdd = DAG.getNode(ISD::FADD, SDLoc(N), VT, N1, N01);
> +        return DAG.getNode(ISD::FADD, SDLoc(N), VT, N00, NewAdd);
> +      }
> +    }
>   } // enable-unsafe-fp-math
> 
>   // FADD -> FMA combines:
> 
> Modified: llvm/trunk/test/CodeGen/X86/fp-fast.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/fp-fast.ll?rev=236031&r1=236030&r2=236031&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/fp-fast.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/fp-fast.ll Tue Apr 28 16:03:22 2015
> @@ -113,3 +113,46 @@ define float @test11(float %a) {
>   %t2 = fadd float %a, %t1
>   ret float %t2
> }
> +
> +; Verify that the first two adds are independent; the destination registers
> +; are used as source registers for the third add.
> +
> +define float @reassociate_adds1(float %a, float %b, float %c, float %d) {
> +; CHECK-LABEL: reassociate_adds1:
> +; CHECK:       # BB#0:
> +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> +; CHECK-NEXT:    vaddss %xmm2, %xmm3, %xmm1
> +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> +; CHECK-NEXT:    retq
> +  %add0 = fadd float %a, %b
> +  %add1 = fadd float %add0, %c
> +  %add2 = fadd float %add1, %d
> +  ret float %add2
> +}
> +
> +define float @reassociate_adds2(float %a, float %b, float %c, float %d) {
> +; CHECK-LABEL: reassociate_adds2:
> +; CHECK:       # BB#0:
> +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> +; CHECK-NEXT:    vaddss %xmm2, %xmm3, %xmm1
> +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> +; CHECK-NEXT:    retq
> +  %add0 = fadd float %a, %b
> +  %add1 = fadd float %c, %add0
> +  %add2 = fadd float %add1, %d
> +  ret float %add2
> +}
> +
> +define float @reassociate_adds3(float %a, float %b, float %c, float %d) {
> +; CHECK-LABEL: reassociate_adds3:
> +; CHECK:       # BB#0:
> +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> +; CHECK-NEXT:    vaddss %xmm2, %xmm3, %xmm1
> +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> +; CHECK-NEXT:    retq
> +  %add0 = fadd float %a, %b
> +  %add1 = fadd float %add0, %c
> +  %add2 = fadd float %d, %add1
> +  ret float %add2
> +}
> +
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits