[clang] [llvm] [AArch64][clang][llvm] Add structured sparsity outer product (TMOP) intrinsics (PR #135145)

Thu Apr 10 10:45:26 PDT 2025

================
@@ -3593,6 +3578,25 @@ class sme_tmopa_32b<bits<5> opc, RegisterOperand zn_ty, RegisterOperand zm_ty, s
   let Constraints = "$ZAda = $_ZAda";
 }
 
+multiclass sme_tmopa_16b<bits<5> opc, RegisterOperand zn_ty, RegisterOperand zm_ty, ValueType vt, string mnemonic, string intrinsic> {
+  def NAME : sme_int_sparse_outer_product_i16<opc, zn_ty, zm_ty, mnemonic>, SMEPseudo2Instr<NAME, 1> {
+     let Uses = [FPMR, FPCR];
+  }
+
+  def NAME # _PSEUDO : sme_sparse_outer_product_pseudo<zn_ty, zm_ty, SMEMatrixTileH>, SMEPseudo2Instr<NAME, 0>;
+
+  def _ : SME2_ZA_TMOP_Pat<NAME, !cast<SDPatternOperator>(intrinsic), timm32_0_3, vt>;
----------------
CarolineConcatto wrote:

Can we replace `def _` with `def`?

Also, `TileOp16:$ZAda` and `VectorIndexS32b:$imm` have different ranges,
but in the pattern we are passing the same immediate operand for both:

def _ : SME2_ZA_TMOP_Pat<NAME, !cast<SDPatternOperator>(intrinsic), **timm32_0_3,** vt>;
I think we should add another parameter to the pattern class, so that there is one for the tile (`$ZAda`) and another for the index:
index: timm32_0_3
tile: timm32_0_1
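
As a rough illustration of the split suggested above, the pattern could take the tile and index operands separately. This is only a sketch: every name ending in `_sketch` or `Sketch` is a hypothetical stand-in so that the fragment parses on its own, and the real `SME2_ZA_TMOP_Pat`, `timm32_0_1` and `timm32_0_3` definitions in the AArch64 backend will look different.

```tablegen
// Hypothetical stand-in for an immediate operand with an inclusive range.
// The real timm32_0_1 / timm32_0_3 operands are defined in the AArch64 backend.
class ImmOperandSketch<int lo, int hi> {
  int Lo = lo;
  int Hi = hi;
}

def timm32_0_1_sketch : ImmOperandSketch<0, 1>; // tile selector for 16-bit tiles
def timm32_0_3_sketch : ImmOperandSketch<0, 3>; // index immediate

// Hypothetical pattern class: the tile and the index each get their own
// operand parameter instead of sharing a single one.
class TMOPPatSketch<string name, ImmOperandSketch tile_imm,
                    ImmOperandSketch index_imm> {
  string Inst = name;
  ImmOperandSketch Tile = tile_imm;
  ImmOperandSketch Index = index_imm;
}

multiclass sme_tmopa_16b_sketch<string mnemonic> {
  // Anonymous def: tile limited to 0..1, index limited to 0..3.
  def : TMOPPatSketch<mnemonic, timm32_0_1_sketch, timm32_0_3_sketch>;
}

defm FTMOPA_SKETCH : sme_tmopa_16b_sketch<"ftmopa">;
```

With this shape, a 32-bit multiclass could pass a 0..3 operand for both parameters, while the 16-bit one restricts only the tile.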

https://developer.arm.com/documentation/ddi0602/2025-03/SME-Instructions/FTMOPA--widening--2-way--FP8-to-FP16---8-bit-floating-point-sparse-sum-of-two-outer-products--accumulating-
https://developer.arm.com/documentation/ddi0602/2025-03/SME-Instructions/FTMOPA--non-widening---Floating-point-sparse-outer-product--accumulating-

https://github.com/llvm/llvm-project/pull/135145