[Mlir-commits] [mlir] [mlir][vector] Document `ConvertVectorStore` + unify var names (nfc) (PR #126422)
Andrzej Warzyński
llvmlistbot at llvm.org
Fri Feb 14 01:34:16 PST 2025
================
@@ -432,7 +432,86 @@ namespace {
// ConvertVectorStore
//===----------------------------------------------------------------------===//
-// TODO: Document-me
+// Emulate vector.store using a multi-byte container type
+//
+// The container type is obtained through the Op adaptor and would normally be
+// generated via `NarrowTypeEmulationConverter`.
+//
+// EXAMPLE 1
+// (aligned store of i4, emulated using i8)
+//
+// vector.store %src, %dest[%idx_1, %idx_2] : memref<4x8xi4>, vector<8xi4>
+//
+// is rewritten as:
+//
+// %src_bitcast = vector.bitcast %src : vector<8xi4> to vector<4xi8>
+// vector.store %src_bitcast, %dest_bitcast[%idx]
+// : memref<16xi8>, vector<4xi8>
+//
+// EXAMPLE 2
+// (unaligned store of i2, emulated using i8, non-atomic)
+//
+// vector.store %src, %dest[%c2, %c0] : memref<3x3xi2>, vector<3xi2>
+//
+// The i2 store is emulated through 2 x RMW sequences. The destination i2 memref
+// is modelled using 3 bytes:
+//
+// Byte 0 Byte 1 Byte 2
+// +----------+----------+----------+
+// | oooooooo | ooooNNNN | NNoooooo |
+// +----------+----------+----------+
+//
+// N - (N)ew entries (i.e. to be overwritten by vector.store)
+// o - (o)ld entries (to be preserved)
+//
+// The following 2 RMW sequences will be generated:
+//
+// %init = arith.constant dense<0> : vector<4xi2>
+//
+// (RMW sequence for Byte 1)
+// (Mask for 4 x i2 elements, i.e. a byte)
+// %mask_1 = arith.constant dense<[false, false, true, true]>
+// %src_slice_1 = vector.extract_strided_slice %src
+// {offsets = [0], sizes = [2], strides = [1]}
+// : vector<3xi2> to vector<2xi2>
+// %init_with_slice_1 = vector.insert_strided_slice %src_slice_1, %init
+// {offsets = [2], strides = [1]}
+// : vector<2xi2> into vector<4xi2>
+// %dest_byte_1 = vector.load %dest[%c1]
+// %dest_byte_1_as_i2 = vector.bitcast %dest_byte_1
+// : vector<1xi8> to vector<4xi2>
+// %res_byte_1 = arith.select %mask_1, %init_with_slice_1, %dest_byte_1_as_i2
+// %res_byte_1_as_i8 = vector.bitcast %res_byte_1
+// vector.store %res_byte_1_as_i8, %dest[%c1]
+//
+// (RMW sequence for Byte 2)
+// (Mask for 4 x i2 elements, i.e. a byte)
+// %mask_2 = arith.constant dense<[true, false, false, false]>
+// %src_slice_2 = vector.extract_strided_slice %src
+// {offsets = [2], sizes = [1], strides = [1]}
+// : vector<3xi2> to vector<1xi2>
+// %init_with_slice_2 = vector.insert_strided_slice %src_slice_2, %init
+// {offsets = [0], strides = [1]}
+// : vector<1xi2> into vector<4xi2>
+// %dest_byte_2 = vector.load %dest[%c2]
----------------
banach-space wrote:
Thanks for the suggestion! I was wondering how to avoid this long comment and your suggestion is exactly what we should be doing! 🙏🏻
As this example is taken from "vector-emulate-narrow-type-unaligned-non-atomic.mlir", that's the test file that I've updated to help here. Please check the latest update.
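For readers who want to double-check the byte diagram in EXAMPLE 2, here is a minimal sketch of the index arithmetic. It is not the code from the pattern itself: `touched_bytes` is a hypothetical helper, and it assumes a row-major layout with the first element of each byte in mask slot 0, as in the masks shown in the comment.

```python
def touched_bytes(linear_index, num_elems, elem_bits=2, container_bits=8):
    """Map each container byte hit by the store to its per-element mask.

    Hypothetical helper for illustration only. mask[i] is True where
    element slot i of that byte is overwritten by the store.
    """
    per_byte = container_bits // elem_bits  # 4 x i2 elements per i8
    masks = {}
    for elem in range(linear_index, linear_index + num_elems):
        byte, slot = divmod(elem, per_byte)
        masks.setdefault(byte, [False] * per_byte)[slot] = True
    return masks


# EXAMPLE 2: vector<3xi2> stored into memref<3x3xi2> at [%c2, %c0].
linear_index = 2 * 3 + 0  # row-major linearization of [2, 0]
masks = touched_bytes(linear_index, num_elems=3)
for byte, mask in sorted(masks.items()):
    print(f"Byte {byte}: {mask}")
```

Under those assumptions the store touches Byte 1 and Byte 2 only, which is why two RMW sequences are needed and Byte 0 is left alone.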
Note, I've made quite a few changes:
* Extended comments.
* Fixed `DOWNCAST` vs `UPCAST`.
* Renamed some variables to avoid generic names (e.g. `%arg0` -> `%src`, `%0` -> `%dest`).
* Added more `CHECK` lines, e.g. `// CHECK-SAME: : vector<1xi8> to vector<4xi2>`, to make sure that the right casting is generated.
* Followed formatting style from [vectorize-convolution.mlir](https://github.com/llvm/llvm-project/blob/main/mlir/test/Dialect/Linalg/vectorize-convolution.mlir). IMHO it's a very "readable" style that's particularly handy for complex tests like these ones.
I appreciate that these are quite intrusive changes, but since it's meant as documentation, it felt like the right thing to do. But I am happy to adapt/revert if you feel that this is too much.
Thanks for reviewing!
https://github.com/llvm/llvm-project/pull/126422