[llvm] r236031 - transform fadd chains to increase parallelism

Wed Apr 29 19:25:34 PDT 2015

Hello Sanjay,

In this commit you introduce a new canonicalization in the DAG:

 (fadd (otherop, fadd)) -> (fadd (fadd, otherop))

It is not clear to me why this canonicalization is desirable in general. It eases the implementation of your transformation, but that alone does not seem to justify it, IMO.

Also, if such a canonicalization is needed, my feeling is that it shouldn’t be buried this deep in visitFADD, and that it should probably apply to other binops as well.

As a side question: I’m curious why this isn’t being performed in instcombine?
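
To make sure I am reading the patch correctly, here is a minimal toy model of the two rewrites. Everything in it (Expr, leaf, fadd, depth, show, combineRoot, combine) is hypothetical and stands in for the DAG; it is not the SelectionDAG API. It also ignores the hasOneUse() check, which a plain tree satisfies trivially, and it simply combines bottom-up rather than following the real worklist order.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression node: either a named leaf or a binary fadd.
// Hypothetical stand-in for SDValue/SDNode, for illustration only.
struct Expr {
  std::string name;               // operand name, used for leaves only
  std::shared_ptr<Expr> lhs, rhs; // both set when the node is an fadd
  bool isAdd() const { return lhs && rhs; }
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(const std::string &n) {
  return std::make_shared<Expr>(Expr{n, nullptr, nullptr});
}

ExprPtr fadd(ExprPtr a, ExprPtr b) {
  assert(a && b && "fadd needs two operands");
  return std::make_shared<Expr>(Expr{"", std::move(a), std::move(b)});
}

// Length of the longest chain of dependent adds.
int depth(const ExprPtr &e) {
  return e->isAdd() ? 1 + std::max(depth(e->lhs), depth(e->rhs)) : 0;
}

std::string show(const ExprPtr &e) {
  return e->isAdd() ? "(" + show(e->lhs) + " + " + show(e->rhs) + ")"
                    : e->name;
}

// Apply the two rewrites from the patch at the root of n.
ExprPtr combineRoot(const ExprPtr &n) {
  if (!n->isAdd())
    return n;
  ExprPtr n0 = n->lhs, n1 = n->rhs;
  // Canonicalize: (fadd otherop, fadd) -> (fadd fadd, otherop),
  // then revisit the new node, as the combiner would.
  if (!n0->isAdd() && n1->isAdd())
    return combineRoot(fadd(n1, n0));
  // (fadd (fadd (fadd z, w), y), x) -> (fadd (fadd z, w), (fadd x, y))
  if (n0->isAdd() && !n1->isAdd() && n0->lhs->isAdd())
    return fadd(n0->lhs, fadd(n1, n0->rhs));
  return n;
}

// Simplified driver: combine the operands first, then the node itself.
ExprPtr combine(const ExprPtr &n) {
  if (!n->isAdd())
    return n;
  return combineRoot(fadd(combine(n->lhs), combine(n->rhs)));
}
```

On the chain from reassociate_adds1, fadd(fadd(fadd(a, b), c), d), combine() in this model gives ((a + b) + (d + c)), so the depth drops from 3 to 2, which seems consistent with the CHECK lines in the test. The adds2 and adds3 variants only reach the same shape because of the canonicalization step, which is the part I am questioning.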

Thanks,

Mehdi

> On Apr 28, 2015, at 2:03 PM, Sanjay Patel <spatel at rotateright.com> wrote:
> 
> Author: spatel
> Date: Tue Apr 28 16:03:22 2015
> New Revision: 236031
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=236031&view=rev
> Log:
> transform fadd chains to increase parallelism
> 
> This is a compromise: with this simple patch, we should always handle a chain of exactly 3
> operations optimally, but we're not generating the optimal balanced binary tree for a longer
> sequence.
> 
> In general, this transform will reduce the dependency chain for a sequence of instructions
> using N operands from a worst case N-1 dependent operations to N/2 dependent operations. 
> The optimal balanced binary tree would reduce the chain to log2(N).
> 
> The trade-off for not dealing with longer sequences is: (1) we have less complexity in the
> compiler, (2) we avoid unknown compile-time blowup calculating a balanced tree, and (3) we
> don't need to worry about the increased register pressure required to parallelize longer
> sequences. It also seems unlikely that we would ever encounter really long strings of
> dependent ops like that in the wild, but I'm not sure how to verify that speculation.
> FWIW, I see no perf difference for test-suite running on btver2 (x86-64) with -ffast-math
> and this patch.
> 
> We can extend this patch to cover other associative operations such as fmul, fmax, fmin, 
> integer add, integer mul.
> 
> This is a partial fix for:
> https://llvm.org/bugs/show_bug.cgi?id=17305
> 
> and if extended:
> https://llvm.org/bugs/show_bug.cgi?id=21768
> https://llvm.org/bugs/show_bug.cgi?id=23116
> 
> The issue also came up in:
> http://reviews.llvm.org/D8941
> 
> Differential Revision: http://reviews.llvm.org/D9232
> 
> 
> Modified:
>    llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>    llvm/trunk/test/CodeGen/X86/fp-fast.ll
> 
> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=236031&r1=236030&r2=236031&view=diff
> ==============================================================================
> --- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (original)
> +++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Tue Apr 28 16:03:22 2015
> @@ -7801,6 +7801,24 @@ SDValue DAGCombiner::visitFADD(SDNode *N
>                            N0.getOperand(0), DAG.getConstantFP(4.0, DL, VT));
>       }
>     }
> +
> +    // Canonicalize chains of adds to LHS to simplify the following transform.
> +    if (N0.getOpcode() != ISD::FADD && N1.getOpcode() == ISD::FADD)
> +      return DAG.getNode(ISD::FADD, SDLoc(N), VT, N1, N0);
> +    
> +    // Convert a chain of 3 dependent operations into 2 independent operations
> +    // and 1 dependent operation:
> +    //  (fadd N0: (fadd N00: (fadd z, w), N01: y), N1: x) ->
> +    //  (fadd N00: (fadd z, w), (fadd N1: x, N01: y))
> +    if (N0.getOpcode() == ISD::FADD &&  N0.hasOneUse() &&
> +        N1.getOpcode() != ISD::FADD) {
> +      SDValue N00 = N0.getOperand(0);
> +      if (N00.getOpcode() == ISD::FADD) {
> +        SDValue N01 = N0.getOperand(1);
> +        SDValue NewAdd = DAG.getNode(ISD::FADD, SDLoc(N), VT, N1, N01);
> +        return DAG.getNode(ISD::FADD, SDLoc(N), VT, N00, NewAdd);
> +      }
> +    }
>   } // enable-unsafe-fp-math
> 
>   // FADD -> FMA combines:
> 
> Modified: llvm/trunk/test/CodeGen/X86/fp-fast.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/fp-fast.ll?rev=236031&r1=236030&r2=236031&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/fp-fast.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/fp-fast.ll Tue Apr 28 16:03:22 2015
> @@ -113,3 +113,46 @@ define float @test11(float %a) {
>   %t2 = fadd float %a, %t1
>   ret float %t2
> }
> +
> +; Verify that the first two adds are independent; the destination registers
> +; are used as source registers for the third add.
> +
> +define float @reassociate_adds1(float %a, float %b, float %c, float %d) {
> +; CHECK-LABEL: reassociate_adds1:
> +; CHECK:       # BB#0:
> +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> +; CHECK-NEXT:    vaddss %xmm2, %xmm3, %xmm1
> +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> +; CHECK-NEXT:    retq
> +  %add0 = fadd float %a, %b
> +  %add1 = fadd float %add0, %c
> +  %add2 = fadd float %add1, %d
> +  ret float %add2
> +}
> +
> +define float @reassociate_adds2(float %a, float %b, float %c, float %d) {
> +; CHECK-LABEL: reassociate_adds2:
> +; CHECK:       # BB#0:
> +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> +; CHECK-NEXT:    vaddss %xmm2, %xmm3, %xmm1
> +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> +; CHECK-NEXT:    retq
> +  %add0 = fadd float %a, %b
> +  %add1 = fadd float %c, %add0
> +  %add2 = fadd float %add1, %d
> +  ret float %add2
> +}
> +
> +define float @reassociate_adds3(float %a, float %b, float %c, float %d) {
> +; CHECK-LABEL: reassociate_adds3:
> +; CHECK:       # BB#0:
> +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> +; CHECK-NEXT:    vaddss %xmm2, %xmm3, %xmm1
> +; CHECK-NEXT:    vaddss %xmm1, %xmm0, %xmm0
> +; CHECK-NEXT:    retq
> +  %add0 = fadd float %a, %b
> +  %add1 = fadd float %add0, %c
> +  %add2 = fadd float %d, %add1
> +  ret float %add2
> +}
> +
> 