[Mlir-commits] [mlir] [mlir][ArmSME] Lower vector.outerproduct to FMOPA/BFMOPA (PR #65621)

Sun Sep 10 23:14:55 PDT 2023

================
@@ -361,6 +361,111 @@ struct MoveVectorToTileSliceToArmSMELowering
   }
 };
 
+/// Lower `vector.outerproduct` to SME MOPA intrinsics.
+///
+/// Example:
+///
+///   %0 = vector.outerproduct %lhs, %rhs, %acc {kind = #vector.kind<add>}
+///     : vector<[4]xf32>, vector<[4]xf32>
+///
+/// is converted to:
+///
+///   "arm_sme.intr.mopa"(%tile_id, %ptrue_s, %ptrue_s, %lhs, %rhs)
----------------
dcaballe wrote:

This intrinsic takes two operands to mask the inputs in the op itself so to support masking you would have to propagate the mask from the producer ops... That's interesting because it looks like the op knows how to merge both masks without requiring independent mask manipulation operations. 

How do we plan to implement proper support for this? I see two options:
  1. In one shot, we search for the two masks in the use-def chain and use them directly in the intrinsic. If there is any mask manipulation operation in-between, it should become dead, hopefully, and go away.
  2. In two steps, we pass the single mask in the masked vector outerproduct operation to both operands and later run a pass that replace this mask with the two masks from the operands, again.

I think doing all of that as part of the lowering (1) might be too much for a lowering, esp. if finding the masks through the use-def chain is not trivial. (2) seems simpler to me but I wouldn't implement that on top of an llvm intrinsic. I think for that we should have a proper sme op, which should be fine now that we have the sme dialect.

Thoughts?

https://github.com/llvm/llvm-project/pull/65621