[PATCH] D14840: [X86] Detect SAD patterns and emit psadbw instructions on X86.

Thu Feb 25 15:10:56 PST 2016

ab added a subscriber: ab.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:28588-28600
@@ -28587,1 +28587,15 @@
 
+// Check if the given SDValue is a constant vector with all Val.
+static bool isConstVectorOf(SDValue V, int Val) {
+  BuildVectorSDNode *BV = dyn_cast<BuildVectorSDNode>(V);
+  if (!BV || !BV->isConstant())
+    return false;
+  auto NumOperands = V.getNumOperands();
+  for (unsigned i = 0; i < NumOperands; i++) {
+    ConstantSDNode *C = dyn_cast<ConstantSDNode>(V.getOperand(i));
+    if (!C || C->getSExtValue() != Val)
+      return false;
+  }
+  return true;
+}
+
----------------
Perhaps you could replace this with BuildVectorSDNode::getConstantSplatNode()?

..but looking at the uses, you can replace them all with isBuildVectorAllOnes/isBuildVectorAllZeros.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:28613
@@ +28612,3 @@
+
+  if (!(VT.getVectorElementType() == MVT::i32 && isPowerOf2_32(NumElems)))
+    return SDValue();
----------------
All vector MVTs have power-of-2 NumElems, so I think you can drop the second part (and NumElems, and merge the ifs).

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:28616-28623
@@ +28615,10 @@
+
+  unsigned RegSize = 128;
+  if (Subtarget.hasAVX512())
+    RegSize = 512;
+  else if (Subtarget.hasAVX2())
+    RegSize = 256;
+
+  if (VT.getSizeInBits() > RegSize)
+    return SDValue();
+
----------------
What happens on v2i32?

Also, doesn't this require SSE2?

I'd just explicitly check subtarget and type, like we do elsewhere. So, something like:


```
if ((VT == MVT::v16i32 && !Subtarget.hasAVX512()) ||
    (VT == MVT::v8i32 && !Subtarget.hasAVX2()) ||
    (VT == MVT::v4i32 && !Subtarget.hasSSE2()))
  return SDValue();
```

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:28697-28701
@@ +28696,7 @@
+  // Legalize the type of the inputs of PSADBW.
+  EVT InVT = Op0.getOperand(0).getValueType();
+  if (InVT.getSizeInBits() <= 128)
+    RegSize = 128;
+  else if (InVT.getSizeInBits() <= 256)
+    RegSize = 256;
+
----------------
Why is this necessary? It seems to me that this is forcing RegSize to 128 (because even v16i8, which requires AVX-512 as it's added as v16i32 has a size of 128). In turn, that forces XMM-sized PSADs to be generated, even on AVX2. Am I missing something?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:28740-28742
@@ +28739,5 @@
+  if (Flags->hasVectorReduction()) {
+    SDValue Sad = detectSADPattern(N, DAG, Subtarget);
+    if (Sad.getNode())
+      return Sad;
+  }
----------------
```
    if (SDValue Sad = detectSADPattern(N, DAG, Subtarget))
      return Sad;
```

================
Comment at: test/CodeGen/X86/sad.ll:1-6
@@ +1,7 @@
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+sse2 -force-target-max-vector-interleave=1 -unroll-count=1 | llc | FileCheck %s --check-prefix=SSE2
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx2 -force-target-max-vector-interleave=1 -unroll-count=1 | llc | FileCheck %s --check-prefix=AVX2
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx512bw -force-target-max-vector-interleave=1 -unroll-count=1 | llc | FileCheck %s --check-prefix=AVX512BW
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+sse2 -force-target-max-vector-interleave=1 | llc | FileCheck %s --check-prefix=SSE2-UNROLL
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx2 -force-target-max-vector-interleave=1 | llc | FileCheck %s --check-prefix=AVX2-UNROLL
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx512bw -force-target-max-vector-interleave=1 | llc | FileCheck %s --check-prefix=AVX512BW-UNROLL
+
----------------
IMHO you should only test the minimal IR with llc here. I see the argument for testing the whole thing, but it seems like 1) that prevents you from testing unrelated constructs (not all IR comes from the vectorizer), and 2) there should already be relevant opt tests.

If this fires on the test-suite (IIRC paq8p could benefit from this?), then people will notice regressions. If it doesn't, is there something we could add to the test-suite that exposes this?


Also, I don't think you need -mcpu.


http://reviews.llvm.org/D14840