[llvm] r211771 - [X86] Improve the selection of SSE3/AVX addsub instructions.

Chandler Carruth chandlerc at google.com
Tue Jul 22 23:18:06 PDT 2014


Sorry for arriving late to the party here, but I've been hacking on the DAG
combiner and it has uncovered what is (I think) a fundamental flaw in the
approach taken in this patch.

You are creating a target DAG combine that will actually reverse (and be
reversed by) the target-independent canonicalizing DAG combine. This
essentially creates the potential for a combine-cycle if the newly adjusted
nodes ever get reliably added to the combiner's worklist. Today, this
doesn't happen because we don't do a good job of rigorously getting newly
formed nodes onto the worklist. My work is changing that and exposing this
issue, but it was bound to happen eventually. In short, I don't think you
can use a combine to form a target-specific canonical form just to trigger
an instruction selector pattern.
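
To make the hazard concrete, here's a schematic, self-contained sketch
(deliberately not real LLVM code; the two rewrites below merely stand in
for the generic canonicalization and this target combine):

  #include <cstdio>
  #include <string>

  int main() {
    // Two rewrite rules that undo each other: once rewritten nodes
    // reliably land back on the worklist, the combiner ping-pongs
    // between the two forms instead of reaching a fixed point.
    std::string Node = "shuffle(FADD, FSUB)";
    for (int Iter = 0; Iter != 6; ++Iter) {
      if (Node == "shuffle(FADD, FSUB)")
        Node = "shuffle(FSUB, FADD)"; // target combine: swap + negate
      else
        Node = "shuffle(FADD, FSUB)"; // generic combine: canonicalize back
      std::printf("iteration %d: %s\n", Iter, Node.c_str());
    }
    // A real combiner driven to a fixed point would never terminate here.
  }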

I also don't think that's the right basic approach. Instead, this should be
implemented during the lowering of these patterns. I could imagine doing
this as part of custom lowering of addsub nodes, or as part of lowering
shuffles which consume addsub nodes to bypass actual shuffle instructions
by reversing the math operation. I think I like the latter best.
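
For concreteness, here's a rough sketch of that second option. This is not
in-tree code: the X86ISD::ADDSUB node and the helper name are hypothetical,
and the commuted-operand cases are elided; it only shows the shape of the
match during shuffle lowering:

  static SDValue lowerShuffleAsAddSub(SDValue Op, SelectionDAG &DAG) {
    ShuffleVectorSDNode *SV = cast<ShuffleVectorSDNode>(Op);
    SDValue N0 = Op.getOperand(0), N1 = Op.getOperand(1);
    EVT VT = Op.getValueType();

    // Match a blend of even lanes from (FSUB A, B) with odd lanes from
    // (FADD A, B), where both nodes share the same operands.
    if (N0.getOpcode() != ISD::FSUB || N1.getOpcode() != ISD::FADD ||
        N0.getOperand(0) != N1.getOperand(0) ||
        N0.getOperand(1) != N1.getOperand(1))
      return SDValue();

    int NumElts = VT.getVectorNumElements();
    ArrayRef<int> Mask = SV->getMask();
    for (int i = 0; i != NumElts; ++i)
      if (Mask[i] != ((i & 1) ? i + NumElts : i))
        return SDValue();

    // No shuffle and no negated operand are emitted: the blend *is* the
    // ADDSUB, so we lower straight to it.
    return DAG.getNode(X86ISD::ADDSUB, SDLoc(Op), VT,
                       N0.getOperand(0), N0.getOperand(1));
  }

The nice property of lowering this way is that no canonicalizing rewrite
ever fires, so there is nothing for the generic combiner to fight with.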

Can you switch to such an approach promptly? I'm testing my DAG combine
improvements locally without this commit, but want to get unblocked...


Also (and much longer term), I wonder if there is a more fundamental
problem with how you're approaching addsub matching: it will *only* work
when the user "unshuffles" the result somehow. I think this is extremely
limiting. Really, we should *form* shuffles of add and sub operands such
that we can use the addsub instructions. Then, when the user has the
shuffles in their code directly (or when one of our vectorizers
specifically forms them) the shuffles will combine away as redundant. Does
that make sense?
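
If it helps, here's a hedged sketch of what "forming" the blend might look
like, given an fadd/fsub pair over the same operands (the helper name and
the place it would be called from are made up for illustration):

  static SDValue formAddSubShuffle(SDValue Sub, SDValue Add,
                                   SelectionDAG &DAG) {
    assert(Sub.getOpcode() == ISD::FSUB && Add.getOpcode() == ISD::FADD &&
           "expected an fsub/fadd pair over the same operands");
    EVT VT = Sub.getValueType();
    unsigned NumElts = VT.getVectorNumElements();

    // Build the ADDSUB blend mask: even lanes from Sub, odd lanes from Add.
    SmallVector<int, 8> Mask;
    for (unsigned i = 0; i != NumElts; ++i)
      Mask.push_back((i & 1) ? int(i + NumElts) : int(i));

    // If the user (or a vectorizer) already wrote this exact shuffle, the
    // generic shuffle combines will fold the duplicate away, so forming it
    // eagerly costs nothing in that case.
    return DAG.getVectorShuffle(VT, SDLoc(Sub), Sub, Add, Mask);
  }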


On Thu, Jun 26, 2014 at 3:45 AM, Andrea Di Biagio <Andrea_DiBiagio at sn.scee.net> wrote:

> Author: adibiagio
> Date: Thu Jun 26 05:45:21 2014
> New Revision: 211771
>
> URL: http://llvm.org/viewvc/llvm-project?rev=211771&view=rev
> Log:
> [X86] Improve the selection of SSE3/AVX addsub instructions.
>
> This patch teaches the backend how to canonicalize a vector shuffle
> according to the rule:
>
>  - (shuffle (FADD A, B), (FSUB A, B), Mask) ->
>        (shuffle (FSUB A, -B), (FADD A, -B), Mask)
>
> Where 'Mask' is:
>   <0,5,2,7>            ;; for v4f32 and v4f64 shuffles.
>   <0,3>                ;; for v2f64 shuffles.
>   <0,9,2,11,4,13,6,15> ;; for v8f32 shuffles.
>
> In general, ISel only knows how to pattern-match a canonical
> 'fadd + fsub + blendi' DAG node sequence into an ADDSUB instruction.
>
> This new rule allows converting a non-canonical DAG sequence into a
> canonical one that can be matched by a single ADDSUB at the ISel stage.
>
> The idea of converting a non-canonical ADDSUB into a canonical one by
> swapping the first two operands of the shuffle, and then negating the
> second operand of the FADD and FSUB, was originally proposed by Hal Finkel.
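>
> (Lane-by-lane: even lanes compute A - (-B) = A + B, matching the
> original FADD, and odd lanes compute A + (-B) = A - B, matching the
> original FSUB, so the rewrite preserves the computed values.)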
>
>
> Modified:
>     llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>     llvm/trunk/test/CodeGen/X86/sse3-avx-addsub.ll
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=211771&r1=211770&r2=211771&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Thu Jun 26 05:45:21 2014
> @@ -18014,6 +18014,49 @@ static SDValue PerformShuffleCombine(SDN
>    SDValue N1 = N->getOperand(1);
>    EVT VT = N->getValueType(0);
>
> +  // Canonicalize shuffles that perform 'addsub' on packed float vectors
> +  // according to the rule:
> +  //  (shuffle (FADD A, B), (FSUB A, B), Mask) ->
> +  //  (shuffle (FSUB A, -B), (FADD A, -B), Mask)
> +  //
> +  // Where 'Mask' is:
> +  //  <0,5,2,7>             -- for v4f32 and v4f64 shuffles;
> +  //  <0,3>                 -- for v2f64 shuffles;
> +  //  <0,9,2,11,4,13,6,15>  -- for v8f32 shuffles.
> +  //
> +  // This helps pattern-matching more SSE3/AVX ADDSUB instructions
> +  // during ISel stage.
> +  if (N->getOpcode() == ISD::VECTOR_SHUFFLE &&
> +      ((Subtarget->hasSSE3() && (VT == MVT::v4f32 || VT == MVT::v2f64)) ||
> +       (Subtarget->hasAVX() && (VT == MVT::v8f32 || VT == MVT::v4f64))) &&
> +      N0->getOpcode() == ISD::FADD && N1->getOpcode() == ISD::FSUB &&
> +      // Operands to the FADD and FSUB must be the same.
> +      ((N0->getOperand(0) == N1->getOperand(0) &&
> +        N0->getOperand(1) == N1->getOperand(1)) ||
> +       // FADD is commutable. See if by commuting the operands of the FADD
> +       // we would still be able to match the operands of the FSUB dag node.
> +       (N0->getOperand(1) == N1->getOperand(0) &&
> +        N0->getOperand(0) == N1->getOperand(1))) &&
> +      N0->getOperand(0)->getOpcode() != ISD::UNDEF &&
> +      N0->getOperand(1)->getOpcode() != ISD::UNDEF) {
> +
> +    ShuffleVectorSDNode *SV = cast<ShuffleVectorSDNode>(N);
> +    unsigned NumElts = VT.getVectorNumElements();
> +    ArrayRef<int> Mask = SV->getMask();
> +    bool CanFold = true;
> +
> +    for (unsigned i = 0, e = NumElts; i != e && CanFold; ++i)
> +      CanFold = Mask[i] == (int)((i & 1) ? i + NumElts : i);
> +
> +    if (CanFold) {
> +      SDValue Op0 = N1->getOperand(0);
> +      SDValue Op1 = DAG.getNode(ISD::FNEG, dl, VT, N1->getOperand(1));
> +      SDValue Sub = DAG.getNode(ISD::FSUB, dl, VT, Op0, Op1);
> +      SDValue Add = DAG.getNode(ISD::FADD, dl, VT, Op0, Op1);
> +      return DAG.getVectorShuffle(VT, dl, Sub, Add, Mask);
> +    }
> +  }
> +
>    // Don't create instructions with illegal types after legalize types has run.
>    const TargetLowering &TLI = DAG.getTargetLoweringInfo();
>    if (!DCI.isBeforeLegalize() && !TLI.isTypeLegal(VT.getVectorElementType()))
>
> Modified: llvm/trunk/test/CodeGen/X86/sse3-avx-addsub.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse3-avx-addsub.ll?rev=211771&r1=211770&r2=211771&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/sse3-avx-addsub.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/sse3-avx-addsub.ll Thu Jun 26 05:45:21 2014
> @@ -3,7 +3,7 @@
>
>  ; Test ADDSUB ISel patterns.
>
> -; All the functions below are obtained from the following source:
> +; Functions below are obtained from the following source:
>  ;
>  ; typedef double double2 __attribute__((ext_vector_type(2)));
>  ; typedef double double4 __attribute__((ext_vector_type(4)));
> @@ -19,7 +19,7 @@
>  ; float8 test2(float8 A, float8 B) {
>  ;   float8 X = A - B;
>  ;   float8 Y = A + B;
> -;   return (float8){X[0], Y[1], X[2], Y[3], X[4], Y[5], X[6], [7]};
> +;   return (float8){X[0], Y[1], X[2], Y[3], X[4], Y[5], X[6], Y[7]};
>  ; }
>  ;
>  ; double4 test3(double4 A, double4 B) {
> @@ -141,3 +141,156 @@ define <2 x double> @test4b(<2 x double>
>  ; AVX: vaddsubpd
>  ; CHECK-NEXT: ret
>
> +; Functions below are obtained from the following source:
> +;
> +; float4 test1(float4 A, float4 B) {
> +;   float4 X = A + B;
> +;   float4 Y = A - B;
> +;   return (float4){X[0], Y[1], X[2], Y[3]};
> +; }
> +;
> +; float8 test2(float8 A, float8 B) {
> +;   float8 X = A + B;
> +;   float8 Y = A - B;
> +;   return (float8){X[0], Y[1], X[2], Y[3], X[4], Y[5], X[6], Y[7]};
> +; }
> +;
> +; double4 test3(double4 A, double4 B) {
> +;   double4 X = A + B;
> +;   double4 Y = A - B;
> +;   return (double4){X[0], Y[1], X[2], Y[3]};
> +; }
> +;
> +; double2 test4(double2 A, double2 B) {
> +;   double2 X = A + B;
> +;   double2 Y = A - B;
> +;   return (double2){X[0], Y[1]};
> +; }
> +
> +define <4 x float> @test5(<4 x float> %A, <4 x float> %B) {
> +  %sub = fsub <4 x float> %A, %B
> +  %add = fadd <4 x float> %A, %B
> +  %vecinit6 = shufflevector <4 x float> %add, <4 x float> %sub, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
> +  ret <4 x float> %vecinit6
> +}
> +; CHECK-LABEL: test5
> +; SSE: xorps
> +; SSE-NEXT: addsubps
> +; AVX: vxorps
> +; AVX-NEXT: vaddsubps
> +; CHECK: ret
> +
> +
> +define <8 x float> @test6(<8 x float> %A, <8 x float> %B) {
> +  %sub = fsub <8 x float> %A, %B
> +  %add = fadd <8 x float> %A, %B
> +  %vecinit14 = shufflevector <8 x float> %add, <8 x float> %sub, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>
> +  ret <8 x float> %vecinit14
> +}
> +; CHECK-LABEL: test6
> +; SSE: xorps
> +; SSE-NEXT: addsubps
> +; SSE: xorps
> +; SSE-NEXT: addsubps
> +; AVX: vxorps
> +; AVX-NEXT: vaddsubps
> +; AVX-NOT: vxorps
> +; AVX-NOT: vaddsubps
> +; CHECK: ret
> +
> +
> +define <4 x double> @test7(<4 x double> %A, <4 x double> %B) {
> +  %sub = fsub <4 x double> %A, %B
> +  %add = fadd <4 x double> %A, %B
> +  %vecinit6 = shufflevector <4 x double> %add, <4 x double> %sub, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
> +  ret <4 x double> %vecinit6
> +}
> +; CHECK-LABEL: test7
> +; SSE: xorpd
> +; SSE-NEXT: addsubpd
> +; SSE: xorpd
> +; SSE-NEXT: addsubpd
> +; AVX: vxorpd
> +; AVX-NEXT: vaddsubpd
> +; AVX-NOT: vxorpd
> +; AVX-NOT: vaddsubpd
> +; CHECK: ret
> +
> +
> +define <2 x double> @test8(<2 x double> %A, <2 x double> %B) #0 {
> +  %add = fadd <2 x double> %A, %B
> +  %sub = fsub <2 x double> %A, %B
> +  %vecinit2 = shufflevector <2 x double> %add, <2 x double> %sub, <2 x i32> <i32 0, i32 3>
> +  ret <2 x double> %vecinit2
> +}
> +; CHECK-LABEL: test8
> +; SSE: xorpd
> +; SSE-NEXT: addsubpd
> +; AVX: vxorpd
> +; AVX-NEXT: vaddsubpd
> +; CHECK: ret
> +
> +
> +define <4 x float> @test5b(<4 x float> %A, <4 x float> %B) {
> +  %sub = fsub <4 x float> %A, %B
> +  %add = fadd <4 x float> %B, %A
> +  %vecinit6 = shufflevector <4 x float> %add, <4 x float> %sub, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
> +  ret <4 x float> %vecinit6
> +}
> +; CHECK-LABEL: test5b
> +; SSE: xorps
> +; SSE-NEXT: addsubps
> +; AVX: vxorps
> +; AVX-NEXT: vaddsubps
> +; CHECK: ret
> +
> +
> +define <8 x float> @test6b(<8 x float> %A, <8 x float> %B) {
> +  %sub = fsub <8 x float> %A, %B
> +  %add = fadd <8 x float> %B, %A
> +  %vecinit14 = shufflevector <8 x float> %add, <8 x float> %sub, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>
> +  ret <8 x float> %vecinit14
> +}
> +; CHECK-LABEL: test6b
> +; SSE: xorps
> +; SSE-NEXT: addsubps
> +; SSE: xorps
> +; SSE-NEXT: addsubps
> +; AVX: vxorps
> +; AVX-NEXT: vaddsubps
> +; AVX-NOT: vxorps
> +; AVX-NOT: vaddsubps
> +; CHECK: ret
> +
> +
> +define <4 x double> @test7b(<4 x double> %A, <4 x double> %B) {
> +  %sub = fsub <4 x double> %A, %B
> +  %add = fadd <4 x double> %B, %A
> +  %vecinit6 = shufflevector <4 x double> %add, <4 x double> %sub, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
> +  ret <4 x double> %vecinit6
> +}
> +; CHECK-LABEL: test7b
> +; SSE: xorpd
> +; SSE-NEXT: addsubpd
> +; SSE: xorpd
> +; SSE-NEXT: addsubpd
> +; AVX: vxorpd
> +; AVX-NEXT: vaddsubpd
> +; AVX-NOT: vxorpd
> +; AVX-NOT: vaddsubpd
> +; CHECK: ret
> +
> +
> +define <2 x double> @test8b(<2 x double> %A, <2 x double> %B) #0 {
> +  %add = fadd <2 x double> %B, %A
> +  %sub = fsub <2 x double> %A, %B
> +  %vecinit2 = shufflevector <2 x double> %add, <2 x double> %sub, <2 x i32> <i32 0, i32 3>
> +  ret <2 x double> %vecinit2
> +}
> +; CHECK-LABEL: test8b
> +; SSE: xorpd
> +; SSE-NEXT: addsubpd
> +; AVX: vxorpd
> +; AVX-NEXT: vaddsubpd
> +; CHECK: ret
> +
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>