[llvm] [X86] combineX86ShuffleChain - always prefer VPERMQ/PD for unary subvector shuffles on AVX2+ targets (PR #134849)

Tue Apr 8 07:21:07 PDT 2025

================
@@ -39838,15 +39838,14 @@ static SDValue combineX86ShuffleChain(
       return insertSubVector(Lo, Hi, NumRootElts / 2, DAG, DL, 128);
     }
 
-    if (Depth == 0 && RootOpc == X86ISD::VPERM2X128)
-      return SDValue(); // Nothing to do!
-
     // If we have AVX2, prefer to use VPERMQ/VPERMPD for unary shuffles unless
     // we need to use the zeroing feature.
     // Prefer blends for sequential shuffles unless we are optimizing for size.
     if (UnaryShuffle &&
         !(Subtarget.hasAVX2() && isUndefOrInRange(Mask, 0, 2)) &&
----------------
phoebewang wrote:

My understanding is that `return SDValue()` is used for the VPERMQ/VPERMPD combine. Since it doesn't happen for AVX1, why do we still do it here, given that we have already checked `Subtarget.hasAVX2()`? Or is there some difference coming from `isUndefOrInRange(Mask, 0, 2)`?
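
For reference, a minimal standalone sketch of the AVX2 sub-condition visible in the hunk, `Subtarget.hasAVX2() && isUndefOrInRange(Mask, 0, 2)`. It is not the LLVM implementation: the sentinel values (-1 for undef, -2 for zero) and the range semantics `[Low, Hi)` are assumptions about the backend's mask convention, and `isUndefOrInRangeSketch` and `avx2UnaryNoZeroing` are hypothetical names introduced only for illustration. The rest of the `if` condition, which is cut off in the hunk, is not modelled.

```cpp
#include <array>
#include <cassert>

// Assumed shuffle-mask sentinels: -1 marks an undef element, -2 a zeroed one.
constexpr int SentinelUndef = -1;
constexpr int SentinelZero = -2;

// Hypothetical stand-in for isUndefOrInRange(Mask, Low, Hi):
// true if every element is either undef or lies in [Low, Hi).
bool isUndefOrInRangeSketch(const std::array<int, 2> &Mask, int Low, int Hi) {
  for (int M : Mask)
    if (M != SentinelUndef && (M < Low || M >= Hi))
      return false;
  return true;
}

// Hypothetical helper modelling only the sub-condition shown in the hunk:
// an AVX2 target and a two-element subvector mask that needs no zeroing.
bool avx2UnaryNoZeroing(bool HasAVX2, const std::array<int, 2> &Mask) {
  return HasAVX2 && isUndefOrInRangeSketch(Mask, 0, 2);
}

// A few sample masks, checked with assert.
void checkExamples() {
  // Swap of the two 128-bit halves on AVX2: no zeroing involved.
  assert(avx2UnaryNoZeroing(true, {1, 0}));
  // Same mask without AVX2: the sub-condition is false.
  assert(!avx2UnaryNoZeroing(false, {1, 0}));
  // A zeroed upper half is outside [0, 2), so the sub-condition is false.
  assert(!avx2UnaryNoZeroing(true, {0, SentinelZero}));
}
```

Under these assumptions, the sub-condition differs between AVX2 and AVX1 only for masks without a zeroed element; a mask that zeroes a half gives the same result on both.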

https://github.com/llvm/llvm-project/pull/134849