[Mlir-commits] [mlir] [mlir][ArmSME] Lower multi-tile stores to a single loop (PR #96187)
Andrzej Warzyński
llvmlistbot at llvm.org
Fri Jun 21 08:23:04 PDT 2024
================
@@ -373,6 +374,130 @@ struct LegalizeTransferWriteOpsByDecomposition
}
};
+/// Legalize a multi-tile transfer_write as a single store loop. This is done as
+/// part of type decomposition as at this level we know each tile write is
+/// disjoint, but that information is lost after decomposition (without
+/// static analysis).
+///
+/// Example (in pseudo-MLIR):
+///
+/// ```
+/// vector.transfer_write vector, dest[x, y], mask
+/// : vector<[16]x[4]xf32>, memref<?x?xf32>
+/// ```
+/// Is rewritten to:
+/// ```
+/// for i in range (0, 4 * vscale) {
+/// let sliceRow = i + tile_n.row * vscale; ─┐
+/// let sliceCol = tile_n.col * vscale; |
+/// slice = vector.extract tile_n[i] |
+/// : vector<[4]xf32> from vector<[16]x[4]xf32> |
+/// slice_mask = vector.extract mask[sliceRow] |- Repeated 4x for
+/// : vector<[4]xi1> from vector<[16]x[4]xi1> | all tiles in
+/// vector.transfer_write | [16]x[4]
+/// slice, dest[x + sliceRow, y + sliceCol], slice_mask |
+/// : vector<[4]xf32>, memref<?x?xf32> ┘
+/// }
----------------
banach-space wrote:
I'm finding this rather tricky to follow 😅 I think that it would be easier if you:
* used e.g. `i16` instead of `f32` (so that there are 2 tiles)
* presented a full example rather than typing `Repeated 4x for all tiles`
Let me also share some more specific suggestions:
* `for i in range ()` -> `for %row_idx in range()` (i.e. avoid enigmatic `i`)
* `tile_n` -> `src_tile`? (what's `_n` meant to represent?)
* what are `tile_n.col` and `tile_n.row`?
IIUC, for `[16] x [4]` there are 4 vertically stacked tiles, so `tile_n.col` would always be 0?
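
To make the tile-count question concrete, here is a minimal sketch of the arithmetic. It is not code from the PR: `TileOffset`, `baseTileDim`, and `enumerateTiles` are hypothetical names used only for illustration. It assumes SME tiles are square with a base dimension of 128 bits divided by the element width, that all sizes are base sizes (to be multiplied by `vscale` at runtime), and that tiles are enumerated in row-major order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not from the PR): the base offset of one SME
// tile inside a larger scalable vector type. Multiply by vscale at runtime.
struct TileOffset {
  int64_t row;
  int64_t col;
};

// Base dimension of a square SME tile, assuming 128 bits / element width:
// 4 for 32-bit elements, 8 for 16-bit elements.
int64_t baseTileDim(int64_t elementBits) { return 128 / elementBits; }

// Enumerates the row-major base offsets of the tiles that make up
// vector<[rows]x[cols]xT>, where T is elementBits wide.
// Assumption: rows and cols are multiples of the tile dimension.
std::vector<TileOffset> enumerateTiles(int64_t rows, int64_t cols,
                                       int64_t elementBits) {
  const int64_t dim = baseTileDim(elementBits);
  assert(rows % dim == 0 && cols % dim == 0 && "shape must tile evenly");
  std::vector<TileOffset> tiles;
  for (int64_t r = 0; r < rows; r += dim)
    for (int64_t c = 0; c < cols; c += dim)
      tiles.push_back({r, c});
  return tiles;
}
```

Under these assumptions, `enumerateTiles(16, 4, 32)` (the `vector<[16]x[4]xf32>` case) gives four tiles at base offsets (0, 0), (4, 0), (8, 0), and (12, 0), so the column offset is 0 for every tile. With 16-bit elements, `enumerateTiles(16, 8, 16)` gives just two tiles, at (0, 0) and (8, 0), which is small enough to write out in full in a comment.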
https://github.com/llvm/llvm-project/pull/96187