[llvm] [AArch64][SVE2] Generate SVE2 BSL instruction in LLVM for bit-twiddling. (PR #83514)

Mon Mar 11 11:42:18 PDT 2024

================
@@ -17594,24 +17594,20 @@ static SDValue tryCombineToBSL(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
   EVT VT = N->getValueType(0);
   SelectionDAG &DAG = DCI.DAG;
   SDLoc DL(N);
+  const auto &Subtarget = DAG.getSubtarget<AArch64Subtarget>();
 
   if (!VT.isVector())
     return SDValue();
 
-  // The combining code currently only works for NEON vectors. In particular,
-  // it does not work for SVE when dealing with vectors wider than 128 bits.
-  // It also doesn't work for streaming mode because it causes generating
-  // bsl instructions that are invalid in streaming mode.
-  if (TLI.useSVEForFixedLengthVectorVT(
-          VT, !DAG.getSubtarget<AArch64Subtarget>().isNeonAvailable()))
+  // The combining code works for NEON, SVE2 and SME.
+  if (TLI.useSVEForFixedLengthVectorVT(VT, !Subtarget.isNeonAvailable()) ||
+      (VT.isScalableVector() && !Subtarget.hasSVE2orSME()))
     return SDValue();
 
   SDValue N0 = N->getOperand(0);
-  if (N0.getOpcode() != ISD::AND)
-    return SDValue();
-
   SDValue N1 = N->getOperand(1);
-  if (N1.getOpcode() != ISD::AND)
+  if (!N->getFlags().hasDisjoint() &&
+      (N0.getOpcode() != ISD::AND || N1.getOpcode() != ISD::AND))
----------------
paulwalker-arm wrote:

This looks weird to me. You're basically saying we can skip the early return when `N` has the `disjoint` flag, but isn't knowing that the operands are `and` instructions fundamental to the algorithm?
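
To make the concern concrete, here is a minimal scalar sketch of the bit pattern in question. It is not the LLVM code: `bitwiseSelect`, `orOfAnds` and `disjointOrNoAnds` are hypothetical names used only for illustration, and the model assumes a bitwise select takes bits from one input where the mask is set and from the other where it is clear.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of a bitwise select (not LLVM code):
// take bits from `a` where `mask` is 1 and from `b` where `mask` is 0.
constexpr uint32_t bitwiseSelect(uint32_t mask, uint32_t a, uint32_t b) {
  return (a & mask) | (b & ~mask);
}

// The OR-of-ANDs shape. Both OR operands are ANDs, and the mask can be
// read directly off those AND operands.
constexpr uint32_t orOfAnds(uint32_t mask, uint32_t a, uint32_t b) {
  uint32_t lhs = a & mask;   // bits of `a` selected by the mask
  uint32_t rhs = b & ~mask;  // bits of `b` selected by the inverted mask
  return lhs | rhs;          // operands share no set bits
}

// An OR whose operands are also disjoint, but neither operand is an AND,
// so there is no mask operand to recover from the expression.
constexpr uint32_t disjointOrNoAnds(uint32_t x) {
  uint32_t lo = x >> 16;  // occupies bits 0..15 only
  uint32_t hi = x << 16;  // occupies bits 16..31 only
  return lo | hi;
}
```

Both `orOfAnds` and `disjointOrNoAnds` end in an OR of disjoint values, but only the first exposes the mask and the two selected inputs as AND operands.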

https://github.com/llvm/llvm-project/pull/83514