[Mlir-commits] [mlir] [mlir][vector] Document `ConvertVectorStore` + unify var names (nfc) (PR #126422)
Alan Li
llvmlistbot at llvm.org
Thu Feb 13 06:55:33 PST 2025
================
@@ -432,7 +432,86 @@ namespace {
// ConvertVectorStore
//===----------------------------------------------------------------------===//
-// TODO: Document-me
+// Emulate vector.store using a multi-byte container type
+//
+// The container type is obtained through Op adaptor and would normally be
+// generated via `NarrowTypeEmulationConverter`.
+//
+// EXAMPLE 1
+// (aligned store of i4, emulated using i8)
+//
+// vector.store %src, %dest[%idx_1, %idx_2] : memref<4x8xi4>, vector<8xi4>
+//
+// is rewritten as:
+//
+// %src_bitcast = vector.bitcast %src : vector<8xi4> to vector<4xi8>
+// vector.store %src_bitcast, %dest_bitcast[%idx]
+// : memref<16xi8>, vector<4xi8>
+//
+// EXAMPLE 2
+// (unaligned store of i2, emulated using i8, non-atomic)
+//
+// vector.store %src, %dest[%c2, %c0] : memref<3x3xi2>, vector<3xi2>
+//
+// The i2 store is emulated through 2 x RMW sequences. The destination i2 memref
+// is modelled using 3 bytes:
+//
+// Byte 0 Byte 1 Byte 2
+// +----------+----------+----------+
+// | oooooooo | ooooNNNN | NNoooooo |
+// +----------+----------+----------+
+//
+// N - (N)ew entries (i.e. to be overwritten by vector.store)
+// o - (o)ld entries (to be preserved)
+//
+// The following 2 RMW sequences will be generated:
+//
+// %init = arith.constant dense<0> : vector<4xi2>
+//
+// (RMW sequence for Byte 1)
+// (Mask for 4 x i2 elements, i.e. a byte)
+// %mask_1 = arith.constant dense<[false, false, true, true]>
+// %src_slice_1 = vector.extract_strided_slice %src
+// {offsets = [0], sizes = [2], strides = [1]}
+// : vector<3xi2> to vector<2xi2>
+// %init_with_slice_1 = vector.insert_strided_slice %src_slice_1, %init
+// {offsets = [2], strides = [1]}
+// : vector<2xi2> into vector<4xi2>
+// %dest_byte_1 = vector.load %dest[%c1]
+// %dest_byte_1_as_i2 = vector.bitcast %dest_byte_1
+// : vector<1xi8> to vector<4xi2>
+// %res_byte_1 = arith.select %mask_1, %init_with_slice_1, %dest_byte_1_as_i2
+// %res_byte_1_as_i8 = vector.bitcast %res_byte_1
+// vector.store %res_byte_1_as_i8, %dest[%c1]
+
+// (RMW sequence for Byte 2)
+// (Mask for 4 x i2 elements, i.e. a byte)
+// %mask_2 = arith.constant dense<[true, false, false, false]>
+// %src_slice_2 = vector.extract_strided_slice %src
+// {offsets = [2], sizes = [1], strides = [1]}
+// : vector<3xi2> to vector<1xi2>
+// %init_with_slice_2 = vector.insert_strided_slice %src_slice_2, %init
+// {offsets = [0], strides = [1]}
+// : vector<1xi2> into vector<4xi2>
+// %dest_byte_2 = vector.load %dest[%c2]
----------------
lialan wrote:
I think we can refer the reader to the corresponding test case here, and annotate/comment that test case more precisely instead.
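
For reference, the masked read-modify-write described in Example 2 of the hunk can be modelled outside MLIR. The following is a minimal Python sketch, not the pattern's actual implementation. It assumes that element 0 of each byte occupies the lowest two bits (a lane order the hunk does not state), and the helper names `unpack_byte`, `pack_byte` and `emulated_store` are hypothetical. As a simplification, it performs an RMW on every touched byte, including bytes that are fully overwritten.

```python
BITS = 2            # width of one i2 element
ELEMS_PER_BYTE = 4  # four i2 elements fit in one i8 container


def unpack_byte(byte):
    """Split a byte into four i2 lanes (assumed: lane 0 in the lowest bits)."""
    return [(byte >> (BITS * i)) & 0b11 for i in range(ELEMS_PER_BYTE)]


def pack_byte(lanes):
    """Inverse of unpack_byte: combine four i2 lanes into one byte."""
    byte = 0
    for i, lane in enumerate(lanes):
        byte |= (lane & 0b11) << (BITS * i)
    return byte


def emulated_store(dest, linear_idx, src):
    """Store i2 values `src` at element offset `linear_idx` in byte list `dest`.

    Each touched byte goes through one read-modify-write sequence.
    """
    pos = 0
    while pos < len(src):
        byte_idx, lane = divmod(linear_idx + pos, ELEMS_PER_BYTE)
        count = min(ELEMS_PER_BYTE - lane, len(src) - pos)
        # Mask over the four lanes of this byte (cf. %mask_1 / %mask_2).
        mask = [lane <= i < lane + count for i in range(ELEMS_PER_BYTE)]
        # Load the byte and view it as four i2 lanes (load + bitcast).
        old = unpack_byte(dest[byte_idx])
        # Place the source slice into a zero vector (insert_strided_slice).
        new = [0] * ELEMS_PER_BYTE
        new[lane:lane + count] = src[pos:pos + count]
        # Keep old lanes where the mask is false (arith.select).
        merged = [n if m else o for m, n, o in zip(mask, new, old)]
        dest[byte_idx] = pack_byte(merged)
        pos += count
    return dest
```

With `memref<3x3xi2>` and indices `[2, 0]`, the linearized element offset is `2 * 3 + 0 = 6`, so `emulated_store([0xFF, 0xFF, 0xFF], 6, [0, 0, 0])` returns `[0xFF, 0x0F, 0xFC]` under the assumed lane order: Byte 0 is untouched, the last two lanes of Byte 1 and the first lane of Byte 2 are overwritten.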
https://github.com/llvm/llvm-project/pull/126422