[PATCH] D52318: [x86] avoid 256-bit andnp that requires insert/extract with AVX1 (PR37749)

Thu Sep 20 11:48:07 PDT 2018

spatel created this revision.
spatel added reviewers: RKSimon, craig.topper, lebedev.ri.
Herald added a subscriber: mcrosier.

This is the final (I hope!) problem pattern mentioned in PR37749: 
https://bugs.llvm.org/show_bug.cgi?id=37749

We are trying to avoid an AVX1 sinkhole that arises because the bitwise logic ops are the only 256-bit integer ops that AVX1 supports. We've already solved the simple logic ops, but 'andn' is x86-specific, so it needs separate handling. I looked at alternative solutions like extending the generic DAG combine or waiting until the ANDNP node is created, but those are bigger patches that can over-reach: splitting to 128-bit does not look like a win in most cases with more than one 256-bit op.
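
As a rough illustration of why the split is legal (this is a standalone sketch, not code from the patch): 'andn' works lane by lane, so extracting a 128-bit half of the 256-bit result gives the same bits as running a 128-bit 'andn' on the matching halves of the operands. The names below (V4i64, V2i64, andn256, andn128, extractHalf, splitMatchesWide) are hypothetical and only model the vectors as arrays of 64-bit lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model (not LLVM types): a 256-bit vector as four 64-bit lanes
// and a 128-bit vector as two 64-bit lanes.
using V4i64 = std::array<std::uint64_t, 4>;
using V2i64 = std::array<std::uint64_t, 2>;

// x86 'andn' semantics: invert the first operand, then AND with the second.
V4i64 andn256(const V4i64 &A, const V4i64 &B) {
  V4i64 R{};
  for (std::size_t I = 0; I != R.size(); ++I)
    R[I] = ~A[I] & B[I];
  return R;
}

V2i64 andn128(const V2i64 &A, const V2i64 &B) {
  return {{~A[0] & B[0], ~A[1] & B[1]}};
}

// Model of extract_subvector: half 0 is lanes 0-1, half 1 is lanes 2-3.
V2i64 extractHalf(const V4i64 &V, unsigned Half) {
  assert(Half < 2 && "a 256-bit vector has two 128-bit halves");
  return {{V[2 * Half], V[2 * Half + 1]}};
}

// Returns true if extracting from the wide 'andn' matches doing a narrow
// 'andn' on the extracted halves, for both halves.
bool splitMatchesWide(const V4i64 &A, const V4i64 &B) {
  for (unsigned Half = 0; Half != 2; ++Half)
    if (extractHalf(andn256(A, B), Half) !=
        andn128(extractHalf(A, Half), extractHalf(B, Half)))
      return false;
  return true;
}
```

Calling splitMatchesWide on any pair of inputs should return true, since no lane depends on any other lane. The question the patch has to answer is therefore about profitability, not correctness.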

The pattern matching is cluttered with bitcasts because of our i64 element canonicalization. For the affected test, we have this vector-type-legalized sequence:

          t29: v8i32 = concat_vectors t27, t28
        t30: v4i64 = bitcast t29
          t18: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
        t31: v4i64 = bitcast t18
      t32: v4i64 = xor t30, t31
        t9: v8i32 = BUILD_VECTOR Constant:i32<255>, Constant:i32<255>, Constant:i32<255>, Constant:i32<255>, Constant:i32<255>, Constant:i32<255>, Constant:i32<255>, Constant:i32<255>
      t34: v4i64 = bitcast t9
    t35: v4i64 = and t32, t34
  t36: v8i32 = bitcast t35
        t37: v4i32 = extract_subvector t36, Constant:i64<0>
        t38: v4i32 = extract_subvector t36, Constant:i64<4>
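
To make the bitcast clutter a bit more concrete, here is a small standalone sketch (again, not code from the patch) of what reinterpreting a v8i32 splat as v4i64 does to the lane values. An all-ones splat stays all-ones after the bitcast, which is why the 'not' is still recognizable if the constant is inspected through the bitcast, and a splat of 255 becomes 1095216660735 per 64-bit lane, which is the constant that shows up in the updated AVX1 check lines. The helper names (V8i32, V4i64, splat, bitcastToV4i64, isAllOnes) are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical model (not LLVM types): the same 256 bits viewed as eight
// 32-bit lanes or as four 64-bit lanes.
using V8i32 = std::array<std::uint32_t, 8>;
using V4i64 = std::array<std::uint64_t, 4>;

// Build a vector with every 32-bit lane set to X (a splat BUILD_VECTOR).
V8i32 splat(std::uint32_t X) {
  V8i32 V;
  V.fill(X);
  return V;
}

// Model of a vector bitcast: copy the bits unchanged into wider lanes.
V4i64 bitcastToV4i64(const V8i32 &V) {
  static_assert(sizeof(V8i32) == sizeof(V4i64), "both model 256 bits");
  V4i64 R;
  std::memcpy(R.data(), V.data(), sizeof(R));
  return R;
}

// True if every bit of every 64-bit lane is set.
bool isAllOnes(const V4i64 &V) {
  for (std::uint64_t Lane : V)
    if (Lane != ~std::uint64_t(0))
      return false;
  return true;
}
```

Because every 32-bit lane of a splat holds the same value, the 64-bit lane values here do not depend on byte order: bitcastToV4i64(splat(255)) holds 0x000000FF000000FF in each lane.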


https://reviews.llvm.org/D52318

Files:
  lib/CodeGen/SelectionDAG/SelectionDAG.cpp
  lib/Target/X86/X86ISelLowering.cpp
  test/CodeGen/X86/avx-logic.ll


Index: test/CodeGen/X86/avx-logic.ll
===================================================================

--- test/CodeGen/X86/avx-logic.ll
+++ test/CodeGen/X86/avx-logic.ll
@@ -342,9 +342,9 @@
 ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm4
 ; AVX1-NEXT:    vpaddd %xmm3, %xmm4, %xmm3
 ; AVX1-NEXT:    vpaddd %xmm0, %xmm1, %xmm0
-; AVX1-NEXT:    vinsertf128 $1, %xmm3, %ymm0, %ymm0
-; AVX1-NEXT:    vandnps {{.*}}(%rip), %ymm0, %ymm0
-; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm1
+; AVX1-NEXT:    vmovdqa {{.*#+}} xmm1 = [1095216660735,1095216660735]
+; AVX1-NEXT:    vpandn %xmm1, %xmm0, %xmm0
+; AVX1-NEXT:    vpandn %xmm1, %xmm3, %xmm1
 ; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm3
 ; AVX1-NEXT:    vpaddd %xmm3, %xmm1, %xmm1
 ; AVX1-NEXT:    vpaddd %xmm2, %xmm0, %xmm0
Index: lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- lib/Target/X86/X86ISelLowering.cpp
+++ lib/Target/X86/X86ISelLowering.cpp
@@ -40152,6 +40152,31 @@
 static SDValue combineExtractSubvector(SDNode *N, SelectionDAG &DAG,
                                        TargetLowering::DAGCombinerInfo &DCI,
                                        const X86Subtarget &Subtarget) {
+  // For AVX1 only, if we are extracting from a 256-bit and+not (which will
+  // eventually get combined/lowered into ANDNP), split the 'and' into 128-bit
+  // ops to avoid the extraction (and likely concatenation before this). We let
+  // generic combining take over from there to simplify the insert/extract and
+  // 'not'.
+  // This pattern emerges during AVX1 legalization. We handle it before lowering
+  // to avoid complications like splitting constant vector loads.
+
+  // Capture the original wide type in the likely case that we need to bitcast
+  // back to this type.
+  EVT VT = N->getValueType(0);
+  EVT WideVecVT = N->getOperand(0).getValueType();
+  SDValue WideVec = peekThroughBitcasts(N->getOperand(0));
+  if (Subtarget.hasAVX() && !Subtarget.hasAVX2() && WideVecVT.isSimple() &&
+      WideVecVT.getSizeInBits() == 256 && WideVec.getOpcode() == ISD::AND) {
+    SDValue WideOp0 = peekThroughBitcasts(WideVec.getOperand(0));
+    SDValue WideOp1 = peekThroughBitcasts(WideVec.getOperand(1));
+    if (isBitwiseNot(WideOp0) || isBitwiseNot(WideOp1)) {
+      // extract (and v4i64 X, (not Y)), n --> andnp v2i64 X(n), Y(n)
+      SDValue Concat = split256IntArith(WideVec, DAG);
+      return DAG.getNode(ISD::EXTRACT_SUBVECTOR, SDLoc(N), VT,
+                         DAG.getBitcast(WideVecVT, Concat), N->getOperand(1));
+    }
+  }
+
   if (DCI.isBeforeLegalizeOps())
     return SDValue();
 
Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -8205,7 +8205,7 @@
 bool llvm::isBitwiseNot(SDValue V) {
   if (V.getOpcode() != ISD::XOR)
     return false;
-  ConstantSDNode *C = isConstOrConstSplat(V.getOperand(1));
+  ConstantSDNode *C = isConstOrConstSplat(peekThroughBitcasts(V.getOperand(1)));
   return C && C->isAllOnesValue();
 }
 

