[PATCH] D143786: [X86] Add `TuningPreferShiftShuffle` for when Shifts are preferable to shuffles.
Noah Goldstein via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Feb 12 14:22:00 PST 2023
goldstein.w.n marked an inline comment as done.
goldstein.w.n added a comment.
In D143786#4120206 <https://reviews.llvm.org/D143786#4120206>, @RKSimon wrote:
> Without AVX512 we can't load fold arg0 for bit-shift ops - isn't that likely to be a problem?
I'm not sure what you mean; can you elaborate? In any case, the tuning is only enabled for SKX, which has AVX512.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:18288
- if (V2.isUndef()) {
- // When the shuffle is mirrored between the 128-bit lanes of the unit, we
- // can use lower latency instructions that will operate on both lanes.
- SmallVector<int, 2> RepeatedMask;
- if (is128BitLaneRepeatedShuffleMask(MVT::v4i64, Mask, RepeatedMask)) {
- SmallVector<int, 4> PSHUFDMask;
- narrowShuffleMaskElts(2, RepeatedMask, PSHUFDMask);
- return DAG.getBitcast(
- MVT::v4i64,
- DAG.getNode(X86ISD::PSHUFD, DL, MVT::v8i32,
- DAG.getBitcast(MVT::v8i32, V1),
- getV4X86ShuffleImm8ForMask(PSHUFDMask, DL, DAG)));
- }
+ for (unsigned Order = 0; Order < 2; ++Order) {
+ if (Subtarget.hasFasterShiftThanShuffle() ? (Order == 1) : (Order == 0)) {
----------------
goldstein.w.n wrote:
> RKSimon wrote:
> > This approach isn't particularly easy to grok - why not just add an additional lowerShuffleAsShift check, guarded behind a hasFasterShiftThanShuffle check?
> That was to avoid duplicating ~30 lines of code, but will do for v2.
Refactored everything as you suggested, except the matchunaryshufflepermute helper, where it would cause too much duplication IMO.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D143786/new/
https://reviews.llvm.org/D143786