[PATCH] D56784: [X86][SSE] Use PSLLDQ/PSRLDQ to mask out zeroable ends of a shuffle

Thu Jan 17 06:35:13 PST 2019

spatel added a comment.

In D56784#1361280 <https://reviews.llvm.org/D56784#1361280>, @RKSimon wrote:

> In D56784#1360580 <https://reviews.llvm.org/D56784#1360580>, @spatel wrote:
>
> > The double-shift cases look good, but I'm skeptical about the triple-shift. Wouldn't those always be better with an 'and' mask followed by shift? We reduce the dependent chain of vector ops and instruction count for the cost of a speculatable constant pool load.
>
>
> I did consider that but then we contradict the "3 op limit" for older machines (like pre-SSSE3) before using "variable" shuffle masks - which includes AND masks.

Ah, so we would expect an even later transform (combineX86ShufflesRecursively?) to squash that. Worth adding a TODO comment about that? Or maybe nobody cares enough about pre-SSSE3 perf to bother.
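
To make the trade-off above concrete, here is a rough scalar model (not the patch's code, and not real intrinsics) of why a three-shift chain and an 'and' mask followed by one shift produce the same vector. The names `byteShiftLeft`, `byteShiftRight`, `tripleShift`, `maskThenShift` and `iotaVec` are hypothetical helpers invented for this sketch; the two shift helpers only imitate the zero-filling whole-register byte shifts of PSLLDQ/PSRLDQ on a 16-byte array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec = std::array<std::uint8_t, 16>;

// Hypothetical model of PSLLDQ: move bytes toward higher indices,
// filling the low end with zeros.
Vec byteShiftLeft(const Vec &V, std::size_t N) {
  Vec R{};
  for (std::size_t I = N; I < 16; ++I)
    R[I] = V[I - N];
  return R;
}

// Hypothetical model of PSRLDQ: move bytes toward lower indices,
// filling the high end with zeros.
Vec byteShiftRight(const Vec &V, std::size_t N) {
  Vec R{};
  for (std::size_t I = 0; I + N < 16; ++I)
    R[I] = V[I + N];
  return R;
}

// Keep source bytes [First, End), place them starting at ZeroLo, and zero
// everything else, using three dependent shifts and no constant.
Vec tripleShift(const Vec &V, std::size_t First, std::size_t End,
                std::size_t ZeroLo) {
  assert(First < End && End <= 16 && ZeroLo + (End - First) <= 16);
  Vec R = byteShiftLeft(V, 16 - End);           // push bytes >= End out
  R = byteShiftRight(R, (16 - End) + First);    // push bytes < First out
  return byteShiftLeft(R, ZeroLo);              // reposition the stub
}

// Same result with one constant mask (the "constant pool load" in the
// discussion) and a single shift.
Vec maskThenShift(const Vec &V, std::size_t First, std::size_t End,
                  std::size_t ZeroLo) {
  assert(First < End && End <= 16 && ZeroLo + (End - First) <= 16);
  Vec R{};
  for (std::size_t I = First; I < End; ++I)
    R[I] = V[I] & 0xFF;                         // AND with an all-ones lane
  return ZeroLo >= First ? byteShiftLeft(R, ZeroLo - First)
                         : byteShiftRight(R, First - ZeroLo);
}

// Test input: byte I holds the value I + 1, so no source byte is zero.
Vec iotaVec() {
  Vec V{};
  for (std::size_t I = 0; I < 16; ++I)
    V[I] = static_cast<std::uint8_t>(I + 1);
  return V;
}
```

The model only shows that the results match; it says nothing about latency or about which form is preferable on a given subtarget, which is the open question in this thread.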

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:10900-10901
+///
+/// Use a VSHLDQ/VSRLDQ pair to zero the ends of a vector and leave an
+/// inner sequential set of elements, possibly offset.
+static SDValue lowerVectorShuffleAsByteShiftMask(
----------------
The "pair" part of the comment is over-specific for the top-level description; move it below, where we have the example sequences?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:10921-10924
+  ArrayRef<int> StubMask = Mask.slice(ZeroLo, Len);
+  if (!isUndefOrInRange(StubMask, 0, NumElts) &&
+      !isUndefOrInRange(StubMask, NumElts, 2 * NumElts))
+    return SDValue();
----------------
Could we instead do a simpler check (or assert) that V2 has been canonicalized to a zero constant?
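
For reference, a standalone sketch of what I understand the quoted guard to be testing: the non-zeroable stub of the mask must read entirely from one input (indices in [0, NumElts) for V1, or [NumElts, 2 * NumElts) for V2), with undef elements allowed. This is a simplified model on `std::vector<int>`, not the LLVM helpers; `UndefElt`, `allUndefOrInRange` and `stubUsesSingleInput` are names made up for the example, and using -1 for undef is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed stand-in for an "undef" shuffle mask element.
constexpr int UndefElt = -1;

// True if every element is undef or lies in the half-open range [Low, Hi).
bool allUndefOrInRange(const std::vector<int> &Mask, int Low, int Hi) {
  for (int M : Mask)
    if (M != UndefElt && (M < Low || M >= Hi))
      return false;
  return true;
}

// Model of the guard: take the Len elements after the ZeroLo leading
// zeroable elements and require that they all come from a single input.
bool stubUsesSingleInput(const std::vector<int> &Mask, std::size_t ZeroLo,
                         std::size_t Len) {
  assert(ZeroLo + Len <= Mask.size());
  const int NumElts = static_cast<int>(Mask.size());
  const std::vector<int> Stub(Mask.begin() + ZeroLo,
                              Mask.begin() + ZeroLo + Len);
  return allUndefOrInRange(Stub, 0, NumElts) ||
         allUndefOrInRange(Stub, NumElts, 2 * NumElts);
}
```

If V2 is known to be a zero constant at this point, the second range test would presumably be unnecessary, which is the simplification asked about above.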

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56784/new/

https://reviews.llvm.org/D56784