[llvm] [RISCV] Match deinterleave(4,8) shuffles to SHL/TRUNC when legal (PR #118509)

Philip Reames via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 3 11:32:59 PST 2024


================
@@ -5332,11 +5301,31 @@ static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,
   if (ShuffleVectorInst::isReverseMask(Mask, NumElts) && V2.isUndef())
     return DAG.getNode(ISD::VECTOR_REVERSE, DL, VT, V1);
 
-  // If this is a deinterleave and we can widen the vector, then we can use
-  // vnsrl to deinterleave.
-  if (SDValue Src =
-          isDeinterleaveShuffle(VT, ContainerVT, V1, V2, Mask, Subtarget))
-    return getDeinterleaveViaVNSRL(DL, VT, Src, Mask[0] == 0, DAG);
+  // If this is a deinterleave(2,4,8) and we can widen the vector, then we can
+  // use shift and truncate to perform the shuffle.
+  // TODO: For Factor=6, we can perform the first step of the deinterleave via
+  // shift-and-trunc reducing total cost for everything except an mf8 result.
+  // TODO: For Factor=4,8, we can do the same when the ratio isn't high enough
+  // to do the entire operation.
+  if (VT.getScalarSizeInBits() < Subtarget.getELen()) {
+    const unsigned MaxFactor = Subtarget.getELen() / VT.getScalarSizeInBits();
+    assert(MaxFactor == 2 || MaxFactor == 4 || MaxFactor == 8);
+    for (unsigned Factor = 2; Factor <= MaxFactor; Factor <<= 1) {
+      unsigned Index = 0;
+      if (ShuffleVectorInst::isDeInterleaveMaskOfFactor(Mask, Factor, Index) &&
+          1 < count_if(Mask, [](int Idx) { return Idx != -1; })) {
+        if (SDValue Src = getSingleShuffleSrc(VT, ContainerVT, V1, V2)) {
+          if (Src.getValueType() == VT) {
+            EVT WideVT = VT.getDoubleNumVectorElementsVT();
----------------
preames wrote:

I made this change, and it does work. It also introduces some additional VL and VTYPE toggles, which is why I think we didn't do this before.

Two questions for you.

1) Are we bothered by the toggles? I can probably eliminate them for < m1 by adding back the removed code, but conditional on LMUL. For the > m2 cases, this is probably an improvement. We could potentially treat this as a generic improvement to add to e.g. VLOptimizer too.

2) Do we want to separate this change into its own review? It feels somewhat borderline to me honestly, so I will defer to your preference.
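
For anyone reading along without the full patch, here is a rough scalar model of the shift-and-truncate idea in the hunk above. It is not LLVM code and does not use SelectionDAG. The function name `deinterleaveViaShiftTrunc` is hypothetical, and 8-bit elements with little-endian packing into a 64-bit wide element are assumptions made for illustration. Each group of `Factor` narrow elements is viewed as one wide element, shifted right by `Index` narrow elements, and truncated back to the narrow width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model only (assumed 8-bit elements, wide element of up to 64 bits).
// Selects element Index out of every group of Factor elements.
std::vector<uint8_t> deinterleaveViaShiftTrunc(const std::vector<uint8_t> &Src,
                                               unsigned Factor,
                                               unsigned Index) {
  // With 8-bit elements and a 64-bit wide element, Factor can be at most 8.
  assert(Factor >= 1 && Factor <= 8 && Index < Factor);
  std::vector<uint8_t> Result;
  for (size_t I = 0; I + Factor <= Src.size(); I += Factor) {
    // "Widen": pack Factor adjacent bytes into one little-endian integer.
    uint64_t Wide = 0;
    for (unsigned J = 0; J < Factor; ++J)
      Wide |= static_cast<uint64_t>(Src[I + J]) << (8 * J);
    // Shift the wanted element down to bit 0, then truncate to 8 bits.
    Result.push_back(static_cast<uint8_t>(Wide >> (8 * Index)));
  }
  return Result;
}
```

In this model, the `MaxFactor` bound in the patch corresponds to how many narrow elements fit in one wide element: 64 / 8 = 8 here.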

https://github.com/llvm/llvm-project/pull/118509

