[PATCH] D143786: [X86] Add `TuningPreferShiftShuffle` for when Shifts are preferable to shuffles.
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Feb 11 03:24:22 PST 2023
RKSimon added a comment.
Without AVX512 we can't load-fold arg0 for bit-shift ops - isn't that likely to be a problem?
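For example (illustrative operands, using an immediate shift such as VPSRLQ):

  # AVX/AVX2 - the shifted source must be a register, so the load stays separate:
  vmovdqa (%rdi), %xmm0
  vpsrlq  $16, %xmm0, %xmm0
  # AVX512VL - the EVEX form can fold the load into the shift:
  vpsrlq  $16, (%rdi), %xmm0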
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:18288
- if (V2.isUndef()) {
- // When the shuffle is mirrored between the 128-bit lanes of the unit, we
- // can use lower latency instructions that will operate on both lanes.
- SmallVector<int, 2> RepeatedMask;
- if (is128BitLaneRepeatedShuffleMask(MVT::v4i64, Mask, RepeatedMask)) {
- SmallVector<int, 4> PSHUFDMask;
- narrowShuffleMaskElts(2, RepeatedMask, PSHUFDMask);
- return DAG.getBitcast(
- MVT::v4i64,
- DAG.getNode(X86ISD::PSHUFD, DL, MVT::v8i32,
- DAG.getBitcast(MVT::v8i32, V1),
- getV4X86ShuffleImm8ForMask(PSHUFDMask, DL, DAG)));
- }
+ for (unsigned Order = 0; Order < 2; ++Order) {
+ if (Subtarget.hasFasterShiftThanShuffle() ? (Order == 1) : (Order == 0)) {
----------------
This approach isn't particularly easy to grok - why not just add an additional lowerShuffleAsShift check up front, behind a hasFasterShiftThanShuffle check?
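i.e. something like this (a rough sketch, assuming the existing lowerShuffleAsShift helper - the exact parameter list at this call site may differ):

  // Try the shift lowering first when the target shifts faster than it shuffles.
  if (Subtarget.hasFasterShiftThanShuffle())
    if (SDValue Shift = lowerShuffleAsShift(DL, MVT::v4i64, V1, V2, Mask,
                                            Zeroable, Subtarget, DAG))
      return Shift;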
================
Comment at: llvm/test/CodeGen/X86/pr57340.ll:272
; CHECK-NEXT: kandw %k1, %k0, %k0
-; CHECK-NEXT: vpshufd {{.*#+}} xmm2 = xmm1[3,3,3,3]
+; CHECK-NEXT: vpsrldq {{.*#+}} xmm2 = xmm1[12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
; CHECK-NEXT: vpextrw $0, %xmm2, %eax
----------------
Are byte shifts faster? I thought they were still Port5 bound.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D143786/new/
https://reviews.llvm.org/D143786