[llvm] [AArch64][SME2] Add FORM_STRIDED_TUPLE pseudo nodes (PR #116399)

Sander de Smalen via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 19 09:57:18 PST 2024


================
@@ -5898,6 +5940,22 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_VOID(SDValue Op,
         Op->getOperand(0), // Chain
         DAG.getTargetConstant((int32_t)(AArch64SVCR::SVCRZA), DL, MVT::i32),
         DAG.getConstant(AArch64SME::Always, DL, MVT::i64));
+  case Intrinsic::aarch64_sme_uvdot_lane_za32_vg1x4:
+  case Intrinsic::aarch64_sme_suvdot_lane_za32_vg1x4:
+  case Intrinsic::aarch64_sme_usvdot_lane_za32_vg1x4:
+  case Intrinsic::aarch64_sme_svdot_lane_za32_vg1x4:
+  case Intrinsic::aarch64_sme_usdot_lane_za32_vg1x4:
+  case Intrinsic::aarch64_sme_udot_lane_za32_vg1x4:
+  case Intrinsic::aarch64_sme_sudot_lane_za32_vg1x4:
+  case Intrinsic::aarch64_sme_sdot_lane_za32_vg1x4:
+    return TryLowerMultiVecSMEDotIntrinsic(Op, DAG, 4);
+  case Intrinsic::aarch64_sme_uvdot_lane_za32_vg1x2:
+  case Intrinsic::aarch64_sme_sdot_lane_za32_vg1x2:
+  case Intrinsic::aarch64_sme_svdot_lane_za32_vg1x2:
+  case Intrinsic::aarch64_sme_usdot_lane_za32_vg1x2:
+  case Intrinsic::aarch64_sme_sudot_lane_za32_vg1x2:
+  case Intrinsic::aarch64_sme_udot_lane_za32_vg1x2:
+    return TryLowerMultiVecSMEDotIntrinsic(Op, DAG, 2);
----------------
sdesmalen-arm wrote:

Perhaps a structurally simpler way to implement this (avoiding the need to do custom isel) is to change the patterns to always use the `FORM_STRIDED_TUPLE_X.._PSEUDO` instruction instead of `REG_SEQUENCE`.

For the multi-vector load case that you're trying to improve, the inputs to the tuple are always COPY nodes of the form:

```
%9:zpr = COPY %7.zsub0:zpr2stridedorcontiguous
```

There are cases where the RegisterCoalescer can make better decisions when using regular COPY nodes rather than the FORM_STRIDED_TUPLE pseudos. We could handle this by setting `hasPostISelHook = 1` on the FORM_STRIDED_TUPLE pseudos, so that directly after isel each one is transformed into a REG_SEQUENCE node when any of its input values is not a COPY node whose source register is in a 'stridedorcontiguous' register class. The REG_SEQUENCE node itself is then lowered later by the TwoAddressInstructionPass into individual COPY nodes.
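
To make the post-isel decision concrete, here is a minimal, self-contained sketch of the check I have in mind. It is a toy model rather than LLVM code: `Opcode`, `TupleInput`, `isStridedOrContiguousClass` and `selectTupleOpcode` are hypothetical names made up for illustration, and the register class is modelled as a plain string rather than a `TargetRegisterClass`. The real hook would inspect the defining `MachineInstr` of each operand instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for the relevant opcodes (hypothetical, not LLVM's enums).
enum class Opcode { Copy, Other, FormStridedTuple, RegSequence };

// One input to the tuple: the opcode of the instruction that defines it and,
// for a COPY, the register class of the COPY's source register.
struct TupleInput {
  Opcode DefOpcode;
  std::string SrcRegClass;
};

// Hypothetical predicate: matches class names such as
// "zpr2stridedorcontiguous" as printed in the MIR above.
bool isStridedOrContiguousClass(const std::string &RC) {
  return RC.find("stridedorcontiguous") != std::string::npos;
}

// Keep the FORM_STRIDED_TUPLE pseudo only if every input is a COPY from a
// 'stridedorcontiguous' register; otherwise fall back to REG_SEQUENCE.
Opcode selectTupleOpcode(const std::vector<TupleInput> &Inputs) {
  assert(!Inputs.empty() && "a tuple needs at least one input");
  for (const TupleInput &In : Inputs) {
    if (In.DefOpcode != Opcode::Copy ||
        !isStridedOrContiguousClass(In.SrcRegClass))
      return Opcode::RegSequence;
  }
  return Opcode::FormStridedTuple;
}
```

With this shape, the multi-vector load case keeps the pseudo, while any tuple with an input from elsewhere takes the ordinary REG_SEQUENCE path and leaves the RegisterCoalescer free to work on plain COPY nodes.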

https://github.com/llvm/llvm-project/pull/116399

