[llvm] [AArch64] Add @llvm.experimental.vector.match (PR #101974)

Ricardo Jesus via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 11 09:08:12 PST 2024


================
@@ -5761,6 +5774,84 @@ SDValue LowerSMELdrStr(SDValue N, SelectionDAG &DAG, bool IsLoad) {
                       DAG.getTargetConstant(ImmAddend, DL, MVT::i32)});
 }
 
+SDValue LowerVectorMatch(SDValue Op, SelectionDAG &DAG) {
+  SDLoc dl(Op);
+  SDValue ID =
+      DAG.getTargetConstant(Intrinsic::aarch64_sve_match, dl, MVT::i64);
+
+  auto Op1 = Op.getOperand(1);
+  auto Op2 = Op.getOperand(2);
+  auto Mask = Op.getOperand(3);
+
+  EVT Op1VT = Op1.getValueType();
+  EVT Op2VT = Op2.getValueType();
+  EVT ResVT = Op.getValueType();
+
+  assert((Op1VT.getVectorElementType() == MVT::i8 ||
+          Op1VT.getVectorElementType() == MVT::i16) &&
+         "Expected 8-bit or 16-bit characters.");
+
+  // Scalable vector type used to wrap operands.
+  // A single container is enough for both operands because ultimately the
+  // operands will have to be wrapped to the same type (nxv16i8 or nxv8i16).
+  EVT OpContainerVT = Op1VT.isScalableVector()
+                          ? Op1VT
+                          : getContainerForFixedLengthVector(DAG, Op1VT);
+
+  // Wrap Op2 in a scalable register, and splat it if necessary.
+  if (Op1VT.getVectorMinNumElements() == Op2VT.getVectorNumElements()) {
+    // If Op1 and Op2 have the same number of elements we can trivially wrap
+    // Op2 in an SVE register.
+    Op2 = convertToScalableVector(DAG, OpContainerVT, Op2);
+    // If the result is scalable, we need to broadcast Op2 to a full SVE
+    // register.
+    if (ResVT.isScalableVector())
+      Op2 = DAG.getNode(AArch64ISD::DUPLANE128, dl, OpContainerVT, Op2,
+                        DAG.getTargetConstant(0, dl, MVT::i64));
+  } else {
+    // If Op1 and Op2 have different numbers of elements, we need to
+    // broadcast Op2. Ideally we would use an AArch64ISD::DUPLANE* node for
+    // this, similarly to the above, but unfortunately we seem to be missing
+    // some patterns for this. So, as an alternative, we splat Op2 through a
+    // splat of a scalable vector extract. This idiom, though a bit more
+    // verbose, is supported and gets us the MOV instruction we want.
+    unsigned Op2BitWidth = Op2VT.getFixedSizeInBits();
+    MVT Op2IntVT = MVT::getIntegerVT(Op2BitWidth);
+    MVT Op2PromotedVT = MVT::getVectorVT(Op2IntVT, 128 / Op2BitWidth,
----------------
rj-jesus wrote:

> I think `shouldExpandVectorMatch` enables the case where the search vector is v8i8 with a needle vector of v16i8?

Thanks, it does indeed! I've added the test `@match_v8i8_v16i8` to cover this scenario.
I've replaced the comparison between the number of elements of `Op1` and `Op2` with a check for `Op2VT.is128BitVector()`, and rebased the code with your dup(extract_elt) patch (which worked really well). Please let me know what you think.
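For reference, the lane-wise behaviour being lowered here (as I understand the intrinsic: each active lane of the search vector tests for membership anywhere in the needle vector) can be modeled in a few lines of Python. `vector_match` is a hypothetical helper name for illustration, not part of the patch:

```python
def vector_match(op1, op2, mask):
    """Reference model of @llvm.experimental.vector.match: lane i of the
    result is true iff mask[i] is set and op1[i] appears anywhere in op2."""
    return [bool(m) and (x in op2) for x, m in zip(op1, mask)]

# v8i8 search vector against a v16i8 needle vector, mirroring the
# @match_v8i8_v16i8 scenario mentioned above.
op1 = [1, 2, 3, 4, 5, 6, 7, 8]
op2 = list(range(4, 20))     # 16 needle elements
mask = [True] * 8
print(vector_match(op1, op2, mask))
# -> [False, False, False, True, True, True, True, True]
```

This is only a scalar sketch of the semantics; the actual lowering maps it onto the SVE MATCH instruction, with Op2 broadcast across the scalable register as discussed.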

https://github.com/llvm/llvm-project/pull/101974
