[llvm] [AArch64][SME2] Add FORM_STRIDED_TUPLE pseudo nodes (PR #116399)
Sander de Smalen via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 19 09:57:18 PST 2024
================
@@ -5898,6 +5940,22 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_VOID(SDValue Op,
Op->getOperand(0), // Chain
DAG.getTargetConstant((int32_t)(AArch64SVCR::SVCRZA), DL, MVT::i32),
DAG.getConstant(AArch64SME::Always, DL, MVT::i64));
+ case Intrinsic::aarch64_sme_uvdot_lane_za32_vg1x4:
+ case Intrinsic::aarch64_sme_suvdot_lane_za32_vg1x4:
+ case Intrinsic::aarch64_sme_usvdot_lane_za32_vg1x4:
+ case Intrinsic::aarch64_sme_svdot_lane_za32_vg1x4:
+ case Intrinsic::aarch64_sme_usdot_lane_za32_vg1x4:
+ case Intrinsic::aarch64_sme_udot_lane_za32_vg1x4:
+ case Intrinsic::aarch64_sme_sudot_lane_za32_vg1x4:
+ case Intrinsic::aarch64_sme_sdot_lane_za32_vg1x4:
+ return TryLowerMultiVecSMEDotIntrinsic(Op, DAG, 4);
+ case Intrinsic::aarch64_sme_uvdot_lane_za32_vg1x2:
+ case Intrinsic::aarch64_sme_sdot_lane_za32_vg1x2:
+ case Intrinsic::aarch64_sme_svdot_lane_za32_vg1x2:
+ case Intrinsic::aarch64_sme_usdot_lane_za32_vg1x2:
+ case Intrinsic::aarch64_sme_sudot_lane_za32_vg1x2:
+ case Intrinsic::aarch64_sme_udot_lane_za32_vg1x2:
+ return TryLowerMultiVecSMEDotIntrinsic(Op, DAG, 2);
----------------
sdesmalen-arm wrote:
Perhaps a structurally simpler way to implement this (avoiding the need to do custom isel) is to change the patterns to always use the `FORM_STRIDED_TUPLE_X.._PSEUDO` instruction instead of `REG_SEQUENCE`.
For the multi-vector load case that you're trying to improve, the inputs to the tuple are always COPY nodes of the form:
```
%9:zpr = COPY %7.zsub0:zpr2stridedorcontiguous
```
There are cases where the RegisterCoalescer can make better decisions when using regular COPY nodes rather than the FORM_STRIDED_TUPLE pseudos. We could handle this by setting `hasPostISelHook = 1` on the FORM_STRIDED_TUPLE pseudos, so that directly post-isel a pseudo is transformed into a REG_SEQUENCE node when any of its input values is not a COPY node whose source register is in a 'stridedorcontiguous' register class. The REG_SEQUENCE node itself is then lowered later by the TwoAddressInstructionPass into individual COPY nodes.
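To make the post-isel condition concrete, here is a rough standalone sketch of the decision only. It is a toy model, not LLVM code: `TupleInput`, `TupleForm`, `isStridedOrContiguousClass` and `chooseTupleForm` are hypothetical names invented for illustration, and the substring match on the register class name is an assumption standing in for a real register class query.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for one input value of the tuple (hypothetical, not an LLVM type).
struct TupleInput {
  bool IsCopy;             // true if the value is defined by a COPY
  std::string SrcRegClass; // register class of the COPY's source register
};

// The two forms the tuple could take after the post-isel hook has run.
enum class TupleForm { StridedTuplePseudo, RegSequence };

// Assumption: a class counts as 'stridedorcontiguous' if its name says so,
// e.g. "zpr2stridedorcontiguous" from the COPY shown above.
bool isStridedOrContiguousClass(const std::string &RC) {
  return RC.find("stridedorcontiguous") != std::string::npos;
}

// Keep the pseudo only if every input is a COPY from a 'stridedorcontiguous'
// register; otherwise fall back to a plain REG_SEQUENCE.
TupleForm chooseTupleForm(const std::vector<TupleInput> &Inputs) {
  assert(!Inputs.empty() && "a tuple needs at least one input");
  for (const TupleInput &In : Inputs)
    if (!In.IsCopy || !isStridedOrContiguousClass(In.SrcRegClass))
      return TupleForm::RegSequence;
  return TupleForm::StridedTuplePseudo;
}
```

In the real hook the same check would presumably be made on the defining instruction of each operand, with the pseudo rewritten in place when the check fails.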
https://github.com/llvm/llvm-project/pull/116399