[PATCH] D22814: [X86][SSE] Optimize the truncation of vector comparison results with PACKSS
Michael Kuperstein via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 27 15:12:57 PDT 2016
mkuper added a comment.
Thanks Simon, nice trick!
I don't know about AVX512 - hopefully Elena can answer, but, in any case, I don't see a reason not to enable this for AVX2 and below for now.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:14150
@@ +14149,3 @@
+ // No truncation required.
+ if (SrcVT == DstVT)
+ return In;
----------------
Is this valid? If not, maybe assert instead?
For IR-level truncs, we explicitly disallow equal types, and I'd be slightly surprised if we allowed them for the ISD.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:14185
@@ +14184,3 @@
+
+ // 256-bit PACKSS(ARG0, ARG1) leaves us with ((LO0,LO1),(HI0,HI1)),
+ // so we need to shuffle to get ((LO0,HI0),(LO1,HI1)).
----------------
Sorry, I'm still confused about this.
We just truncated, say, a v8i64 into a v8i32. The high 4 values should have ended up in the high 4 lanes of the ymm, and the low 4 values should have ended up in the 4 low lanes. Why isn't this what we want?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:14200
@@ +14199,3 @@
+ // Recursively pack lower/upper subvectors, concat result and pack again.
+ assert(SrcVT.getSizeInBits() >= 512 && "Expected 512-bit vector or greater");
+ EVT PackedVT = EVT::getVectorVT(*DAG.getContext(), PackedSVT, NumElems / 2);
----------------
I may have missed it, but I didn't see this is checked explicitly anywhere in the callers.
What happens if you hit the combine version of this, pre-legalization, truncating from v32i64 to v32i8, for instance? The destination is a 256-bit type, and the source is simple.
(Also, greater -> smaller)
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:14280
@@ +14279,3 @@
+ unsigned Opcode = V.getOpcode();
+ return (Opcode == X86ISD::PCMPGT || Opcode == X86ISD::PCMPEQ ||
+ Opcode == X86ISD::CMPP);
----------------
How about VPCOM and VPCOMU?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:29861
@@ +29860,3 @@
+ const X86Subtarget &Subtarget) {
+ // AVX512 has fast truncate.
+ if (Subtarget.hasAVX512())
----------------
Perhaps add an assert that getBooleanContents() is ZeroOrNegativeOneBooleanContent?
I don't expect this to change anytime soon, but... :-)
Repository:
rL LLVM
https://reviews.llvm.org/D22814
More information about the llvm-commits
mailing list