[llvm] [SVE] Wide active lane mask (PR #76514)

Momchil Velikov via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 18 02:49:59 PST 2024


================
@@ -19791,15 +19792,63 @@ static SDValue performIntrinsicCombine(SDNode *N,
           EVT::getVectorVT(*DAG.getContext(), PromVT.getVectorElementType(),
                            VT.getVectorElementCount());
 
-      Res = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, WhileVT, ID,
-                        N->getOperand(1), N->getOperand(2));
+      SDValue Res = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, WhileVT, ID,
+                                N->getOperand(1), N->getOperand(2));
       Res = DAG.getNode(ISD::SIGN_EXTEND, DL, PromVT, Res);
       Res = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ExtVT, Res,
                         DAG.getConstant(0, DL, MVT::i64));
       Res = DAG.getNode(ISD::TRUNCATE, DL, VT, Res);
+
+      return Res;
     }
-    return Res;
+
+    if (!Subtarget->hasSVE2p1() && !Subtarget->hasSME2())
+      return SDValue();
+
+    if (!N->hasNUsesOfValue(2, 0))
+      return SDValue();
+
+    auto It = N->use_begin();
+    SDNode *Lo = *It++;
----------------
momchil-velikov wrote:

Ah, I see what you mean. However:
* there's no guarantee that we get the uses in any particular order, even if we generated them in some order. Hence we need the test and the swap (which is super cheap anyway, something like two conditional moves)
* we do generate the extracts in the "opposite" order, i.e. the last extract is the one for the least significant bits, at offset 0. This is done for another reason (to lower the chance that a flag-setting instruction will appear before the condition), but it has the fortunate side effect that the swap will likely not be needed, cf. https://github.com/momchil-velikov/llvm-project/blob/766c78aebecfacb27dc1a8e221aebca37d93359d/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp#L537
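
For illustration, the test-and-swap described above can be sketched outside of LLVM roughly as follows. This is a minimal standalone model, not the code in the PR: `ExtractUse`, `orderHalves` and `HalfLanes` are hypothetical names, and a plain struct stands in for the two `EXTRACT_SUBVECTOR` users of the wide mask.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical stand-in for one EXTRACT_SUBVECTOR user of the wide mask.
struct ExtractUse {
  uint64_t Offset; // index of the first lane this use extracts
};

using HalfPair = std::pair<const ExtractUse *, const ExtractUse *>;

// Given the two uses in whatever order the use list happened to yield them,
// return {Lo, Hi} with Lo at offset 0 and Hi at offset HalfLanes.
// Returns std::nullopt if the two uses do not form such a pair.
std::optional<HalfPair> orderHalves(const ExtractUse *First,
                                    const ExtractUse *Second,
                                    uint64_t HalfLanes) {
  assert(First && Second && "both uses must be present");
  const ExtractUse *Lo = First;
  const ExtractUse *Hi = Second;

  // The use-list order is not guaranteed, so test and swap if needed.
  if (Lo->Offset != 0)
    std::swap(Lo, Hi);

  // After the optional swap, the pair must be exactly {0, HalfLanes}.
  if (Lo->Offset != 0 || Hi->Offset != HalfLanes)
    return std::nullopt;
  return HalfPair{Lo, Hi};
}
```

The result is the same whichever use is visited first, and the swap is skipped when the offset-0 extract comes first.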

https://github.com/llvm/llvm-project/pull/76514

