[PATCH] D14840: [X86] Detect SAD patterns and emit psadbw instructions on X86.

Cong Hou via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 10 15:31:20 PST 2016


congh added a comment.

In http://reviews.llvm.org/D14840#369663, @chandlerc wrote:

> Simon and Ahmed seem to have a good handle on this, just wanted to say...
>
> In http://reviews.llvm.org/D14840#369592, @RKSimon wrote:
>
> > The checks on the tests are quite poor - I understand that update_llc_test_checks.py output might be too much but please see if you can expand the checks to give a better idea of context.
>
>
> I would strongly encourage folks to use update_llc_test_checks.py. If it's not working here, we should make a version that does work. The resulting test accuracy is just such a huge improvement.


I have updated the test file by running update_llc_test_checks.py. Thanks!


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:28751-28763
@@ -28750,1 +28750,15 @@
 
+static SDValue detectSADPattern(SDNode *N, SelectionDAG &DAG,
+                                const X86Subtarget &Subtarget) {
+  SDLoc DL(N);
+  EVT VT = N->getValueType(0);
+  SDValue Op0 = N->getOperand(0);
+  SDValue Op1 = N->getOperand(1);
+
+  if (!VT.isVector() || !VT.isSimple() ||
+      !(VT.getVectorElementType() == MVT::i32))
+    return SDValue();
+
+  unsigned RegSize = 128;
+  if (Subtarget.hasAVX512())
+    RegSize = 512;
----------------
RKSimon wrote:
> 512-bit PSAD requires AVX512BW (Subtarget.hasBWI()) - plain AVX512 can't handle 512-bit vector elements smaller than 32-bits - please ensure the tests run with avx512f (which will use the AVX2 path) as well as avx512bw.
You are right! I have updated the test to cover both AVX512BW and AVX512F. Thanks!

================
Comment at: test/CodeGen/X86/sad.ll:2-7
@@ +1,8 @@
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+sse2 -force-target-max-vector-interleave=1 -unroll-count=1 | llc | FileCheck %s --check-prefix=SSE2
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx2 -force-target-max-vector-interleave=1 -unroll-count=1 | llc | FileCheck %s --check-prefix=AVX2
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx512bw -force-target-max-vector-interleave=1 -unroll-count=1 | llc | FileCheck %s --check-prefix=AVX512BW
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+sse2 -force-target-max-vector-interleave=1 | llc | FileCheck %s --check-prefix=SSE2-UNROLL
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx2 -force-target-max-vector-interleave=1 | llc | FileCheck %s --check-prefix=AVX2-UNROLL
+; RUN: opt < %s -O2 -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx512bw -force-target-max-vector-interleave=1 | llc | FileCheck %s --check-prefix=AVX512BW-UNROLL
+
+ at a = global [1024 x i8] zeroinitializer, align 16
----------------
RKSimon wrote:
> congh wrote:
> > ab wrote:
> > > Does it need to be huge? After all, this is only testing the PSAD ISel, and the pattern seems pretty minimal; would it work to test variants of your example in detectSADPattern?
> > I need to compose the test so that the reduction vector operations can be detected. But I think I can still write small tests.
> > 
> > To test on different targets, I have to split the test files into threes for testing sse2/avx2/avx512 as I found llc cannot parse opt -mattr=+avx512bw generated IR on AVX2. 
> This sounds like a bug - the test case has no target specific intrinsics so we should be able to handle it on every target.
Yes, this is a bug. When I try to compile sad-avx2.ll with -mattr=+avx512bw, I got an internal error. The bug is from DAGCombiner::visitINSERT_SUBVECTOR which tries to convert a INSERT_SUBVECTOR operation into a 
CONCAT_VECTORS one. We need to check if two operands of CONCAT_VECTORS have the same type there. This is fixed in this patch.

I then could combine those three tests into one file.


http://reviews.llvm.org/D14840





More information about the llvm-commits mailing list