[Mlir-commits] [mlir] [mlir][ArmSME] Lower vector.outerproduct to FMOPA/BFMOPA (PR #65621)

Tue Sep 12 04:39:19 PDT 2023

================
@@ -361,6 +361,111 @@ struct MoveVectorToTileSliceToArmSMELowering
   }
 };
 
+/// Lower `vector.outerproduct` to SME MOPA intrinsics.
+///
+/// Example:
+///
+///   %0 = vector.outerproduct %lhs, %rhs, %acc {kind = #vector.kind<add>}
+///     : vector<[4]xf32>, vector<[4]xf32>
+///
+/// is converted to:
+///
+///   "arm_sme.intr.mopa"(%tile_id, %ptrue_s, %ptrue_s, %lhs, %rhs)
----------------
banach-space wrote:

Given the expected complexity of generating correct masks, I am also leaning towards a custom op. Having said that, IMHO this PR is fine as is, and we could iterate in follow-up patches.

> 2. In two steps, we pass the single mask in the masked vector outerproduct operation to both operands and later run a pass that replace this mask with the two masks from the operands, again.

I guess that for this to work, we'd need something like:
```
 %res = arm_sme.op %rhs, %lhs <optional_mask_for_rhs_or_result> <optional_mask_for_lhs>
```
So, we'd allow up to 2 masks, both of which would be optional:
* if only 1 mask is specified then this is a mask for the result (1 x 2D),
* if 2 masks are specified then these are for the input vectors (2 x 1D),
* if no masks are specified, then use `ptrue` (all lanes are active).
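
To make the three cases concrete, here is a rough sketch of how a lowering could classify the mask operands. Everything in it is hypothetical: `MaskMode`, `classifyMasks`, and the use of `std::string` as a stand-in for an SSA value are illustration only and not part of this PR or of any existing ArmSME API.

```cpp
#include <optional>
#include <string>

// Hypothetical classification of the optional mask operands (names are
// assumptions for illustration, not an existing MLIR/ArmSME API).
enum class MaskMode {
  AllActive,      // no masks: use ptrue, all lanes active
  ResultMask2D,   // one mask: a single 2-D mask for the result
  OperandMasks1D  // two masks: one 1-D mask per input vector
};

// `firstMask` and `secondMask` model the two optional operands of the
// proposed op; std::string is only a placeholder for an SSA value.
inline std::optional<MaskMode>
classifyMasks(const std::optional<std::string> &firstMask,
              const std::optional<std::string> &secondMask) {
  if (!firstMask && !secondMask)
    return MaskMode::AllActive;
  if (firstMask && !secondMask)
    return MaskMode::ResultMask2D;
  if (firstMask && secondMask)
    return MaskMode::OperandMasks1D;
  // A second mask without a first one is not covered by the proposal,
  // so this sketch reports it as invalid.
  return std::nullopt;
}
```

A verifier for such an op would presumably also need to check that the mask rank matches the case (2-D for the single mask, 1-D for the pair), which this sketch leaves out.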

WDYT?

https://github.com/llvm/llvm-project/pull/65621