Sorry for arriving late to the party here, but I've been hacking on the DAG combiner and it has uncovered what I think is a fundamental flaw in the current approach.

You are creating a target DAG combine that will actually reverse (and be reversed by) the target-independent canonicalizing DAG combine. This creates the potential for a combine cycle if the newly adjusted nodes ever get reliably added to the combiner's worklist. Today, this doesn't happen because we don't do a good job of rigorously getting newly formed nodes onto the worklist. My work is changing that and exposing this issue, but it was bound to happen eventually. Essentially, I don't think you can use a combine to form a target-specific canonical form just to trigger an instruction selector pattern.
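
To make the failure mode concrete, here is a minimal standalone C++ model (not LLVM code; FormA and FormB just stand in for the target-independent and the target-specific canonical forms) of what happens once rewritten nodes reliably go back on the worklist:

  // Standalone model: two combines that invert each other never reach a
  // fixed point once their results are re-added to the worklist.
  #include <cstdio>

  enum Form { FormA, FormB };

  // The target-independent combine rewrites FormB back into the canonical
  // FormA; the target combine rewrites FormA into FormB to trigger an isel
  // pattern. Composed, they are the identity, so the worklist never drains.
  static Form combine(Form F) { return F == FormA ? FormB : FormA; }

  int main() {
    Form Node = FormA;
    for (int Iter = 0; Iter < 6; ++Iter) { // capped so the model terminates
      Node = combine(Node);                // each visit undoes the previous rewrite
      std::printf("iteration %d: form %c\n", Iter, Node == FormA ? 'A' : 'B');
    }
  }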
I also don't think that's the right basic approach. Instead, this should be implemented during the lowering of these patterns. I could imagine doing it as part of custom lowering of addsub nodes, or as part of lowering shuffles which consume addsub nodes, bypassing actual shuffle instructions by reversing the math operation. I like the latter best, I think.
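
For concreteness, a rough sketch of the shape I have in mind for the shuffle-lowering variant. This is hypothetical and untested; in particular X86ISD::ADDSUB is a placeholder for whatever target node would match addsubps/addsubpd, and the helper name is made up:

  // Hypothetical sketch: during shuffle lowering, recognize a blend taking
  // even lanes from an FSUB and odd lanes from an FADD over the same
  // operands, and emit the addsub node directly, bypassing the shuffle.
  static SDValue lowerShuffleAsAddSub(ShuffleVectorSDNode *SV,
                                      SelectionDAG &DAG, SDLoc dl) {
    EVT VT = SV->getValueType(0);
    SDValue N0 = SV->getOperand(0), N1 = SV->getOperand(1);
    if (N0.getOpcode() != ISD::FSUB || N1.getOpcode() != ISD::FADD ||
        N0.getOperand(0) != N1.getOperand(0) ||
        N0.getOperand(1) != N1.getOperand(1))
      return SDValue();
    // Check for the interleaving mask <0, N+1, 2, N+3, ...>.
    ArrayRef<int> Mask = SV->getMask();
    for (unsigned i = 0, e = VT.getVectorNumElements(); i != e; ++i)
      if (Mask[i] != (int)((i & 1) ? i + e : i))
        return SDValue();
    // No FNEG/canonicalization dance: just produce the node isel wants.
    return DAG.getNode(X86ISD::ADDSUB, dl, VT,
                       N0.getOperand(0), N0.getOperand(1));
  }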
Can you switch to such an approach promptly? I'm testing my DAG combine improvements locally without this commit, but want to get un-blocked...

Also (and much more long term), I wonder if there is a more fundamental problem with how you're approaching addsub matching: it will *only* work when the user "unshuffles" the result somehow. I think this is extremely limiting. Really, we should *form* shuffles of add and sub operands such that we can use the addsub instructions. Then, when the user has the shuffles in their code directly (or when one of our vectorizers specifically forms them) the shuffles will combine away as redundant. Does that make sense?
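
Roughly this shape, as a hypothetical and untested sketch (the helper name is made up; the DAG calls themselves are the same ones your patch already uses):

  // Hypothetical sketch: *form* the interleaving shuffle that the addsub
  // pattern wants, rather than waiting for the user's code to contain it.
  static SDValue formAddSubShuffle(SDValue A, SDValue B, SelectionDAG &DAG,
                                   SDLoc dl) {
    EVT VT = A.getValueType();
    unsigned NumElts = VT.getVectorNumElements();
    SDValue Sub = DAG.getNode(ISD::FSUB, dl, VT, A, B);
    SDValue Add = DAG.getNode(ISD::FADD, dl, VT, A, B);
    // Mask <0, N+1, 2, N+3, ...>: even lanes from Sub, odd lanes from Add,
    // which is exactly the lane pattern addsubps/addsubpd compute.
    SmallVector<int, 8> Mask(NumElts);
    for (unsigned i = 0; i != NumElts; ++i)
      Mask[i] = (i & 1) ? (int)(i + NumElts) : (int)i;
    // If the surrounding code shuffles this result back, the two shuffles
    // are inverses and the generic combines delete them as redundant.
    return DAG.getVectorShuffle(VT, dl, Sub, Add, Mask);
  }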
On Thu, Jun 26, 2014 at 3:45 AM, Andrea Di Biagio <Andrea_DiBiagio@sn.scee.net> wrote:

Author: adibiagio
Date: Thu Jun 26 05:45:21 2014
New Revision: 211771

URL: http://llvm.org/viewvc/llvm-project?rev=211771&view=rev
Log:
[X86] Improve the selection of SSE3/AVX addsub instructions.

This patch teaches the backend how to canonicalize a vector shuffle
according to the rule:

 - (shuffle (FADD A, B), (FSUB A, B), Mask) ->
       (shuffle (FSUB A, -B), (FADD A, -B), Mask)

Where 'Mask' is:
  <0,5,2,7>             ;; for v4f32 and v4f64 shuffles.
  <0,3>                 ;; for v2f64 shuffles.
  <0,9,2,11,4,13,6,15>  ;; for v8f32 shuffles.

In general, ISel only knows how to pattern-match a canonical
'fadd + fsub + blendi' dag node sequence into an ADDSUB instruction.

This new rule makes it possible to convert a non-canonical dag sequence
into a canonical one that will be matched by a single ADDSUB at the ISel
stage.

The idea of converting a non-canonical ADDSUB into a canonical one by
swapping the first two operands of the shuffle, and then negating the
second operand of the FADD and FSUB, was originally proposed by Hal Finkel.

Modified:
    llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
    llvm/trunk/test/CodeGen/X86/sse3-avx-addsub.ll

Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=211771&r1=211770&r2=211771&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Thu Jun 26 05:45:21 2014
@@ -18014,6 +18014,49 @@ static SDValue PerformShuffleCombine(SDN
   SDValue N1 = N->getOperand(1);
   EVT VT = N->getValueType(0);
 
+  // Canonicalize shuffles that perform 'addsub' on packed float vectors
+  // according to the rule:
+  //  (shuffle (FADD A, B), (FSUB A, B), Mask) ->
+  //  (shuffle (FSUB A, -B), (FADD A, -B), Mask)
+  //
+  // Where 'Mask' is:
+  //  <0,5,2,7>             -- for v4f32 and v4f64 shuffles;
+  //  <0,3>                 -- for v2f64 shuffles;
+  //  <0,9,2,11,4,13,6,15>  -- for v8f32 shuffles.
+  //
+  // This helps pattern-matching more SSE3/AVX ADDSUB instructions
+  // during the ISel stage.
+  if (N->getOpcode() == ISD::VECTOR_SHUFFLE &&
+      ((Subtarget->hasSSE3() && (VT == MVT::v4f32 || VT == MVT::v2f64)) ||
+       (Subtarget->hasAVX() && (VT == MVT::v8f32 || VT == MVT::v4f64))) &&
+      N0->getOpcode() == ISD::FADD && N1->getOpcode() == ISD::FSUB &&
+      // Operands to the FADD and FSUB must be the same.
+      ((N0->getOperand(0) == N1->getOperand(0) &&
+        N0->getOperand(1) == N1->getOperand(1)) ||
+       // FADD is commutable. See if by commuting the operands of the FADD
+       // we would still be able to match the operands of the FSUB dag node.
+       (N0->getOperand(1) == N1->getOperand(0) &&
+        N0->getOperand(0) == N1->getOperand(1))) &&
+      N0->getOperand(0)->getOpcode() != ISD::UNDEF &&
+      N0->getOperand(1)->getOpcode() != ISD::UNDEF) {
+
+    ShuffleVectorSDNode *SV = cast<ShuffleVectorSDNode>(N);
+    unsigned NumElts = VT.getVectorNumElements();
+    ArrayRef<int> Mask = SV->getMask();
+    bool CanFold = true;
+
+    // The mask must be <0, N+1, 2, N+3, ...>: even lanes from the FADD,
+    // odd lanes from the FSUB.
+    for (unsigned i = 0, e = NumElts; i != e && CanFold; ++i)
+      CanFold = Mask[i] == (int)((i & 1) ? i + NumElts : i);
+
+    if (CanFold) {
+      SDValue Op0 = N1->getOperand(0);
+      SDValue Op1 = DAG.getNode(ISD::FNEG, dl, VT, N1->getOperand(1));
+      SDValue Sub = DAG.getNode(ISD::FSUB, dl, VT, Op0, Op1);
+      SDValue Add = DAG.getNode(ISD::FADD, dl, VT, Op0, Op1);
+      return DAG.getVectorShuffle(VT, dl, Sub, Add, Mask);
+    }
+  }
+
   // Don't create instructions with illegal types after legalize types has run.
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
   if (!DCI.isBeforeLegalize() && !TLI.isTypeLegal(VT.getVectorElementType()))

Modified: llvm/trunk/test/CodeGen/X86/sse3-avx-addsub.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse3-avx-addsub.ll?rev=211771&r1=211770&r2=211771&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/sse3-avx-addsub.ll (original)
+++ llvm/trunk/test/CodeGen/X86/sse3-avx-addsub.ll Thu Jun 26 05:45:21 2014
@@ -3,7 +3,7 @@
 
 ; Test ADDSUB ISel patterns.
 
-; All the functions below are obtained from the following source:
+; Functions below are obtained from the following source:
 ;
 ; typedef double double2 __attribute__((ext_vector_type(2)));
 ; typedef double double4 __attribute__((ext_vector_type(4)));
@@ -19,7 +19,7 @@
 ; float8 test2(float8 A, float8 B) {
 ;   float8 X = A - B;
 ;   float8 Y = A + B;
-;   return (float8){X[0], Y[1], X[2], Y[3], X[4], Y[5], X[6], [7]};
+;   return (float8){X[0], Y[1], X[2], Y[3], X[4], Y[5], X[6], Y[7]};
 ; }
 ;
 ; double4 test3(double4 A, double4 B) {
@@ -141,3 +141,156 @@ define <2 x double> @test4b(<2 x double>
 ; AVX: vaddsubpd
 ; CHECK-NEXT: ret
 
+; Functions below are obtained from the following source:
+;
+; float4 test1(float4 A, float4 B) {
+;   float4 X = A + B;
+;   float4 Y = A - B;
+;   return (float4){X[0], Y[1], X[2], Y[3]};
+; }
+;
+; float8 test2(float8 A, float8 B) {
+;   float8 X = A + B;
+;   float8 Y = A - B;
+;   return (float8){X[0], Y[1], X[2], Y[3], X[4], Y[5], X[6], Y[7]};
+; }
+;
+; double4 test3(double4 A, double4 B) {
+;   double4 X = A + B;
+;   double4 Y = A - B;
+;   return (double4){X[0], Y[1], X[2], Y[3]};
+; }
+;
+; double2 test4(double2 A, double2 B) {
+;   double2 X = A + B;
+;   double2 Y = A - B;
+;   return (double2){X[0], Y[1]};
+; }
+
+define <4 x float> @test5(<4 x float> %A, <4 x float> %B) {
+  %sub = fsub <4 x float> %A, %B
+  %add = fadd <4 x float> %A, %B
+  %vecinit6 = shufflevector <4 x float> %add, <4 x float> %sub, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
+  ret <4 x float> %vecinit6
+}
+; CHECK-LABEL: test5
+; SSE: xorps
+; SSE-NEXT: addsubps
+; AVX: vxorps
+; AVX-NEXT: vaddsubps
+; CHECK: ret
+
+
+define <8 x float> @test6(<8 x float> %A, <8 x float> %B) {
+  %sub = fsub <8 x float> %A, %B
+  %add = fadd <8 x float> %A, %B
+  %vecinit14 = shufflevector <8 x float> %add, <8 x float> %sub, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>
+  ret <8 x float> %vecinit14
+}
+; CHECK-LABEL: test6
+; SSE: xorps
+; SSE-NEXT: addsubps
+; SSE: xorps
+; SSE-NEXT: addsubps
+; AVX: vxorps
+; AVX-NEXT: vaddsubps
+; AVX-NOT: vxorps
+; AVX-NOT: vaddsubps
+; CHECK: ret
+
+
+define <4 x double> @test7(<4 x double> %A, <4 x double> %B) {
+  %sub = fsub <4 x double> %A, %B
+  %add = fadd <4 x double> %A, %B
+  %vecinit6 = shufflevector <4 x double> %add, <4 x double> %sub, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
+  ret <4 x double> %vecinit6
+}
+; CHECK-LABEL: test7
+; SSE: xorpd
+; SSE-NEXT: addsubpd
+; SSE: xorpd
+; SSE-NEXT: addsubpd
+; AVX: vxorpd
+; AVX-NEXT: vaddsubpd
+; AVX-NOT: vxorpd
+; AVX-NOT: vaddsubpd
+; CHECK: ret
+
+
+define <2 x double> @test8(<2 x double> %A, <2 x double> %B) #0 {
+  %add = fadd <2 x double> %A, %B
+  %sub = fsub <2 x double> %A, %B
+  %vecinit2 = shufflevector <2 x double> %add, <2 x double> %sub, <2 x i32> <i32 0, i32 3>
+  ret <2 x double> %vecinit2
+}
+; CHECK-LABEL: test8
+; SSE: xorpd
+; SSE-NEXT: addsubpd
+; AVX: vxorpd
+; AVX-NEXT: vaddsubpd
+; CHECK: ret
+
+
+define <4 x float> @test5b(<4 x float> %A, <4 x float> %B) {
+  %sub = fsub <4 x float> %A, %B
+  %add = fadd <4 x float> %B, %A
+  %vecinit6 = shufflevector <4 x float> %add, <4 x float> %sub, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
+  ret <4 x float> %vecinit6
+}
+; CHECK-LABEL: test5b
+; SSE: xorps
+; SSE-NEXT: addsubps
+; AVX: vxorps
+; AVX-NEXT: vaddsubps
+; CHECK: ret
+
+
+define <8 x float> @test6b(<8 x float> %A, <8 x float> %B) {
+  %sub = fsub <8 x float> %A, %B
+  %add = fadd <8 x float> %B, %A
+  %vecinit14 = shufflevector <8 x float> %add, <8 x float> %sub, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>
+  ret <8 x float> %vecinit14
+}
+; CHECK-LABEL: test6b
+; SSE: xorps
+; SSE-NEXT: addsubps
+; SSE: xorps
+; SSE-NEXT: addsubps
+; AVX: vxorps
+; AVX-NEXT: vaddsubps
+; AVX-NOT: vxorps
+; AVX-NOT: vaddsubps
+; CHECK: ret
+
+
+define <4 x double> @test7b(<4 x double> %A, <4 x double> %B) {
+  %sub = fsub <4 x double> %A, %B
+  %add = fadd <4 x double> %B, %A
+  %vecinit6 = shufflevector <4 x double> %add, <4 x double> %sub, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
+  ret <4 x double> %vecinit6
+}
+; CHECK-LABEL: test7b
+; SSE: xorpd
+; SSE-NEXT: addsubpd
+; SSE: xorpd
+; SSE-NEXT: addsubpd
+; AVX: vxorpd
+; AVX-NEXT: vaddsubpd
+; AVX-NOT: vxorpd
+; AVX-NOT: vaddsubpd
+; CHECK: ret
+
+
+define <2 x double> @test8b(<2 x double> %A, <2 x double> %B) #0 {
+  %add = fadd <2 x double> %B, %A
+  %sub = fsub <2 x double> %A, %B
+  %vecinit2 = shufflevector <2 x double> %add, <2 x double> %sub, <2 x i32> <i32 0, i32 3>
+  ret <2 x double> %vecinit2
+}
+; CHECK-LABEL: test8b
+; SSE: xorpd
+; SSE-NEXT: addsubpd
+; AVX: vxorpd
+; AVX-NEXT: vaddsubpd
+; CHECK: ret
+


_______________________________________________
llvm-commits mailing list
llvm-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits