[PATCH] D14840: [X86] Detect SAD patterns and emit psadbw instructions on X86.

Fri Feb 26 16:33:48 PST 2016

congh marked 2 inline comments as done.
congh added a comment.

Thanks for the review! Please check my inline comments.


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:28588-28600
@@ -28587,1 +28587,15 @@
 
+// Check if the given SDValue is a constant vector with all Val.
+static bool isConstVectorOf(SDValue V, int Val) {
+  BuildVectorSDNode *BV = dyn_cast<BuildVectorSDNode>(V);
+  if (!BV || !BV->isConstant())
+    return false;
+  auto NumOperands = V.getNumOperands();
+  for (unsigned i = 0; i < NumOperands; i++) {
+    ConstantSDNode *C = dyn_cast<ConstantSDNode>(V.getOperand(i));
+    if (!C || C->getSExtValue() != Val)
+      return false;
+  }
+  return true;
+}
+
----------------
ab wrote:
> Perhaps you could replace this with BuildVectorSDNode::getConstantSplatNode()?
> 
> ..but looking at the uses, you can replace them all with isBuildVectorAllOnes/isBuildVectorAllZeros.
I also need to check if a vector contains all -1s, which could not be satisfied by isBuildVectorAllOnes/isBuildVectorAllZeros. How to use getConstantSplatNode here?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:28616-28623
@@ +28615,10 @@
+
+  unsigned RegSize = 128;
+  if (Subtarget.hasAVX512())
+    RegSize = 512;
+  else if (Subtarget.hasAVX2())
+    RegSize = 256;
+
+  if (VT.getSizeInBits() > RegSize)
+    return SDValue();
+
----------------
ab wrote:
> What happens on v2i32?
> 
> Also, doesn't this require SSE2?
> 
> I'd just explicitly check subtarget and type, like we do elsewhere. So, something like:
> 
> 
> ```
> if ((VT == MVT::v16i32 && !Subtarget.hasAVX512()) ||
>     (VT == MVT::v8i32 && !Subtarget.hasAVX2()) ||
>     (VT == MVT::v4i32 && !Subtarget.hasSSE2()))
>   return SDValue();
> ```
I think the case of v2i32 may not be generated by the vectorizer but as we are not just handling the results of auto-vectorization, I think it is better to cover this case.

For v16i32 we only need SSE2 as we are actually handling v16i8, and the result will be collected as a v2i64. We need AVX512 to handle v64i32.

Therefore, my previous check here is incorrect, which should be


```
  if (VT.getSizeInBits() / 4 > RegSize)
    return SDValue();
```

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:28697-28701
@@ +28696,7 @@
+  // Legalize the type of the inputs of PSADBW.
+  EVT InVT = Op0.getOperand(0).getValueType();
+  if (InVT.getSizeInBits() <= 128)
+    RegSize = 128;
+  else if (InVT.getSizeInBits() <= 256)
+    RegSize = 256;
+
----------------
ab wrote:
> Why is this necessary? It seems to me that this is forcing RegSize to 128 (because even v16i8, which requires AVX-512 as it's added as v16i32 has a size of 128). In turn, that forces XMM-sized PSADs to be generated, even on AVX2. Am I missing something?
v16i8 is actually handled by SSE2, while AVX-512 is handling v64i8. This means even on AVX512 we should still use SSE2 instructions to handle v16i8.

================
Comment at: test/CodeGen/X86/sad.ll:1-6
@@ +1,7 @@
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+sse2 -force-target-max-vector-interleave=1 -unroll-count=1 | llc | FileCheck %s --check-prefix=SSE2
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx2 -force-target-max-vector-interleave=1 -unroll-count=1 | llc | FileCheck %s --check-prefix=AVX2
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx512bw -force-target-max-vector-interleave=1 -unroll-count=1 | llc | FileCheck %s --check-prefix=AVX512BW
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+sse2 -force-target-max-vector-interleave=1 | llc | FileCheck %s --check-prefix=SSE2-UNROLL
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx2 -force-target-max-vector-interleave=1 | llc | FileCheck %s --check-prefix=AVX2-UNROLL
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx512bw -force-target-max-vector-interleave=1 | llc | FileCheck %s --check-prefix=AVX512BW-UNROLL
+
----------------
ab wrote:
> IMHO you should only test the minimal IR with llc here. I see the argument for testing the whole thing, but it seems like 1) that prevents you from testing unrelated constructs (not all IR comes from the vectorizer), and 2) there should already be relevant opt tests.
> 
> If this fires on the test-suite (IIRC paq8p could benefit from this?), then people will notice regressions. If it doesn't, is there something we could add to the test-suite that exposes this?
> 
> 
> Also, I don't think you need -mcpu.
I just want to avoid writing the test case for different targets and hence the test code size is minimized. But I agree with you on that we should not rely on auto-vectorization for the test of this patch. So the minimal IR here would be several huge functions for different targets?


http://reviews.llvm.org/D14840