[llvm] [X86] Improve variable 8-bit shifts on AVX512BW (PR #164136)

Sun Oct 19 05:30:37 PDT 2025

================
@@ -30968,6 +30968,76 @@ static SDValue LowerShift(SDValue Op, const X86Subtarget &Subtarget,
     return DAG.getNode(X86ISD::PACKUS, dl, VT, LoR, HiR);
   }
 
+  if (VT == MVT::v64i8 && Subtarget.canExtendTo512BW()) {
+    // On AVX512BW, we can use variable 16-bit shifts to implement variable
+    // 8-bit shifts. For this, we split the input into two vectors, RLo and RHi.
+    // The i-th lane of RLo contains the (2*i)-th lane of R, and the i-th lane
+    // of RHi contains the (2*i+1)-th lane of R. After shifting, these vectors
+    // can efficiently be merged together using a masked move.
+    MVT ExtVT = MVT::v32i16;
+
+    // When used in a vectorshuffle, selects even-index lanes from the first
+    // vector and odd index lanes from the second vector.
+    SmallVector<int, 64> InterleaveIndices;
+    for (unsigned i = 0; i < 64; ++i) {
+      unsigned offset = (i % 2 == 0) ? 0 : 64;
+      InterleaveIndices.push_back(i + offset);
----------------
RKSimon wrote:

Is this an interleave or a select mask?
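
One way to check the question is to model the quoted loop outside of LLVM. The sketch below is only an illustration: `buildPatchMask`, `isSelectMask`, and `isInterleaveLoMask` are hypothetical helper names, not LLVM APIs, and it assumes the usual shuffle-mask convention that indices of `NumElts` or more refer to the second operand.

```cpp
#include <cstddef>
#include <vector>

// Rebuilds the mask from the quoted loop: even lanes come from the first
// operand, odd lanes from the second (index >= NumElts means second operand).
std::vector<int> buildPatchMask(unsigned NumElts) {
  std::vector<int> Mask;
  for (unsigned i = 0; i < NumElts; ++i)
    Mask.push_back(static_cast<int>(i + ((i % 2 == 0) ? 0 : NumElts)));
  return Mask;
}

// A select (blend) mask keeps every lane in place: Mask[i] is either i
// (first operand) or i + NumElts (second operand).
bool isSelectMask(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  for (int i = 0; i < N; ++i)
    if (Mask[i] != i && Mask[i] != i + N)
      return false;
  return true;
}

// A low-half interleave (unpack-style) mask alternates operands while
// walking through them: 0, N, 1, N+1, 2, N+2, ...
bool isInterleaveLoMask(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  for (int i = 0; i < N; ++i)
    if (Mask[i] != i / 2 + (i % 2) * N)
      return false;
  return true;
}
```

Under those assumptions, the 64-element mask begins 0, 65, 2, 67, so every lane stays in its own position and only the source operand alternates. In this model it satisfies the select check and not the interleave check.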

https://github.com/llvm/llvm-project/pull/164136