[PATCH] D22814: [X86][SSE] Optimize the truncation of vector comparison results with PACKSS

Michael Kuperstein via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 27 15:12:57 PDT 2016


mkuper added a comment.

Thanks Simon, nice trick!

I don't know about AVX512 - hopefully Elena can answer, but, in any case, I don't see a reason not to enable this for AVX2 and below for now.


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:14150
@@ +14149,3 @@
+  // No truncation required.
+  if (SrcVT == DstVT)
+    return In;
----------------
Is this valid? If not, maybe assert instead?
For IR-level truncs, we explicitly disallow equal types, and I'd be slightly surprised if we allowed them for the ISD.


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:14185
@@ +14184,3 @@
+
+    // 256-bit PACKSS(ARG0, ARG1) leaves us with ((LO0,LO1),(HI0,HI1)),
+    // so we need to shuffle to get ((LO0,HI0),(LO1,HI1)).
----------------
Sorry, I'm still confused about this.

We just truncated, say, a v8i64 into a v8i32. The high 4 values should have ended up in the high 4 lanes of the ymm, and the low 4 values should have ended up in the 4 low lanes. Why isn't this what we want?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:14200
@@ +14199,3 @@
+  // Recursively pack lower/upper subvectors, concat result and pack again.
+  assert(SrcVT.getSizeInBits() >= 512 && "Expected 512-bit vector or greater");
+  EVT PackedVT = EVT::getVectorVT(*DAG.getContext(), PackedSVT, NumElems / 2);
----------------
I may have missed it, but I didn't see this is checked explicitly anywhere in the callers.
What happens if you hit the combine version of this, pre-legalization, truncating from v32i64 to v32i8, for instance? The destination is a 256-bit type, and the source is simple.

(Also, greater -> smaller)

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:14280
@@ +14279,3 @@
+    unsigned Opcode = V.getOpcode();
+    return (Opcode == X86ISD::PCMPGT || Opcode == X86ISD::PCMPEQ ||
+            Opcode == X86ISD::CMPP);
----------------
How about VPCOM and VPCOMU?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:29861
@@ +29860,3 @@
+                                              const X86Subtarget &Subtarget) {
+  // AVX512 has fast truncate.
+  if (Subtarget.hasAVX512())
----------------
Perhaps add an assert that getBooleanContents() is ZeroOrNegativeOneBooleanContent?
I don't expect this to change anytime soon, but... :-)


Repository:
  rL LLVM

https://reviews.llvm.org/D22814





More information about the llvm-commits mailing list