[PATCH] D14588: [X86][SSE] Transform truncation from v8i32/v16i32 to v8i8/v16i8 into bitand and X86ISD::PACKUS operations during DAG combine.

Mon Nov 30 05:36:42 PST 2015

RKSimon added inline comments.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26229
@@ +26228,3 @@
+  // On AVX2, the behavior of X86ISD::PACKUS is different from that on SSE2 and
+  // we could not benefit from this method.
+  // AVX512 provides vpmovdb.
----------------
I understand that packus(ymm) won't do what we want, but won't AVX2 still benefit in cases where packus(xmm) is used? Why not just early out if (!hasSSE2 || VT.getSizeInBits() > 128)?
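
To illustrate the lane issue behind this question, here is a minimal scalar sketch in C++. It is not LLVM code, and the helper names (saturateToU8, packus128, packus256) are hypothetical. It models the word-to-byte pack (PACKUSWB/VPACKUSWB) as an example: the 128-bit form emits all elements of the first operand followed by all elements of the second, whereas the 256-bit AVX2 form packs each 128-bit lane independently, so its output is not in plain truncation order.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Unsigned-saturate one signed 16-bit element to 8 bits, as the pack does
// per element. Values above 255 become 255, which is why the inputs are
// masked with a bitand before packing when a plain truncation is wanted.
static std::uint8_t saturateToU8(std::int16_t v) {
  if (v < 0)
    return 0;
  if (v > 255)
    return 255;
  return static_cast<std::uint8_t>(v);
}

// Scalar model of the 128-bit pack: 8 words of a, then 8 words of b.
static std::array<std::uint8_t, 16>
packus128(const std::array<std::int16_t, 8> &a,
          const std::array<std::int16_t, 8> &b) {
  std::array<std::uint8_t, 16> r{};
  for (std::size_t i = 0; i < 8; ++i) {
    r[i] = saturateToU8(a[i]);
    r[i + 8] = saturateToU8(b[i]);
  }
  return r;
}

// Scalar model of the 256-bit pack: each 128-bit lane is packed on its own,
// so lane 0 holds a[0..7], b[0..7] and lane 1 holds a[8..15], b[8..15].
static std::array<std::uint8_t, 32>
packus256(const std::array<std::int16_t, 16> &a,
          const std::array<std::int16_t, 16> &b) {
  std::array<std::uint8_t, 32> r{};
  for (std::size_t lane = 0; lane < 2; ++lane) {
    for (std::size_t i = 0; i < 8; ++i) {
      r[lane * 16 + i] = saturateToU8(a[lane * 8 + i]);
      r[lane * 16 + 8 + i] = saturateToU8(b[lane * 8 + i]);
    }
  }
  return r;
}
```

With a = 0..15 and b = 16..31, byte 8 of the packus256 result is 16 rather than 8, whereas packus128 on the 128-bit halves keeps elements in order. That is why restricting the transform to 128-bit packs might still be usable on AVX2.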

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26230
@@ +26229,3 @@
+  // we could not benefit from this method.
+  // AVX512 provides vpmovdb.
+  if (!Subtarget->hasSSE2() || Subtarget->hasAVX2())
----------------
Please can you add AVX512 as a test target to prove that it's using vpmovdb etc.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26269
@@ +26268,3 @@
+      SubVec[i / 2] = DAG.getNode(ISD::BITCAST, DL, MVT::v4i32, SubVec[i / 2]);
+    }
+    SubVec.resize(RegNum / 2);
----------------
Please don't use domain switches; they can cause massive stalls on pipes. Why not just use DAG.getVectorShuffle()?

http://reviews.llvm.org/D14588