[Mlir-commits] [mlir] [mlir][vector] Fix incorrect byte-alignment assumption in ConvertVectorStore (PR #189235)
Mehdi Amini
llvmlistbot at llvm.org
Sun Mar 29 06:01:53 PDT 2026
https://github.com/joker-eph created https://github.com/llvm/llvm-project/pull/189235
When `ConvertVectorStore` emits the narrow-type emulation for a `vector.store` into a 2-D memref, it previously assumed that if the trailing dimension of the memref exactly matches the vector size (`trailingDimsMatch`), then the last-dimension index must be zero and no sub-byte alignment adjustment is needed. This assumption is wrong: a valid store such as
vector.store %v, %src[%c0, %c1] : memref<3x4xi2>, vector<4xi2>
has a non-zero column index (%c1 == 1) even though trailingDim (4) equals the vector size (4). The incorrect shortcut caused the pattern to fall into the "aligned" path and emit a plain bitcast + store at byte offset 0, silently dropping elements [1], [2], [3] of the first byte and overwriting the wrong memory.
Fix: prefer `linearizedInfo.intraDataOffset` (which gives the exact sub-element offset for any constant-index store) and only fall back to the old `0` assumption when the indices are fully dynamic (i.e., `intraDataOffset` cannot be folded to a constant) **and** `isDivisibleInSize && trailingDimsMatch` still holds. This preserves the existing behaviour for dynamic-index stores while fixing the constant-index case.
Fixes #131528
Assisted-by: Claude Code
>From 32b2fcb2ddf6ba7fea559813f816e98854e1c970 Mon Sep 17 00:00:00 2001
From: Mehdi Amini <joker.eph at gmail.com>
Date: Sun, 29 Mar 2026 05:17:59 -0700
Subject: [PATCH] [mlir][vector] Fix incorrect byte-alignment assumption in
ConvertVectorStore
When `ConvertVectorStore` emits the narrow-type emulation for a
`vector.store` into a 2-D memref, it previously assumed that if the
trailing dimension of the memref exactly matches the vector size
(`trailingDimsMatch`), then the last-dimension index must be zero and
no sub-byte alignment adjustment is needed. This assumption is wrong:
a valid store such as
vector.store %v, %src[%c0, %c1] : memref<3x4xi2>, vector<4xi2>
has a non-zero column index (%c1 == 1) even though trailingDim (4)
equals the vector size (4). The incorrect shortcut caused the pattern
to fall into the "aligned" path and emit a plain bitcast + store at
byte offset 0, silently dropping elements [1], [2], [3] of the first
byte and overwriting the wrong memory.
Fix: prefer `linearizedInfo.intraDataOffset` (which gives the exact
sub-element offset for any constant-index store) and only fall back to
the old `0` assumption when the indices are fully dynamic (i.e.,
`intraDataOffset` cannot be folded to a constant) **and**
`isDivisibleInSize && trailingDimsMatch` still holds. This preserves
the existing behaviour for dynamic-index stores while fixing the
constant-index case.
A regression test with `memref<3x4xi2>` / `vector<4xi2>` at `[0, 1]`
is added to `vector-emulate-narrow-type-unaligned.mlir`; the fixed
output emits two `memref.generic_atomic_rmw` operations covering the
byte boundary correctly.
Fixes https://github.com/llvm/llvm-project/issues/131528
Assisted-by: Claude Code
---
.../Transforms/VectorEmulateNarrowType.cpp | 26 ++++++++++++-------
.../vector-emulate-narrow-type-unaligned.mlir | 24 +++++++++++++++++
2 files changed, 41 insertions(+), 9 deletions(-)
diff --git a/mlir/lib/Dialect/Vector/Transforms/VectorEmulateNarrowType.cpp b/mlir/lib/Dialect/Vector/Transforms/VectorEmulateNarrowType.cpp
index 583cda7ac2810..60161cdf15dfd 100644
--- a/mlir/lib/Dialect/Vector/Transforms/VectorEmulateNarrowType.cpp
+++ b/mlir/lib/Dialect/Vector/Transforms/VectorEmulateNarrowType.cpp
@@ -635,10 +635,13 @@ struct ConvertVectorStore final : OpConversionPattern<vector::StoreOp> {
return success();
}
- // Do the trailing dim for source and destination match? If yes, then the
- // corresponding index must be 0.
- // FIXME: There's no way to tell for dynamic shapes, so we should bail out.
- // However, that makes some tests fail, so we need to audit first.
+ // Do the trailing dim for source and destination match? If yes, and if the
+ // access indices are not all constant, then assume the last index is 0
+ // (byte-aligned). Note: for constant indices, the intraDataOffset computed
+ // below will give the exact value, so the trailingDimsMatch shortcut is
+ // not used in that case.
+ // FIXME: For dynamic indices where trailingDimsMatch, the assumption that
+ // the last index is 0 (byte-aligned) may be incorrect. See issue #131528.
auto trailingDim = op.getBase().getType().getShape().back();
bool trailingDimsMatch =
ShapedType::isDynamic(trailingDim) || trailingDim == origElements;
@@ -646,8 +649,6 @@ struct ConvertVectorStore final : OpConversionPattern<vector::StoreOp> {
auto stridedMetadata =
memref::ExtractStridedMetadataOp::create(rewriter, loc, op.getBase());
- // FIXME: ATM, we do not test cases where offsets, sizes, or strides are
- // non-zero. As such, this is not needed.
OpFoldResult linearizedIndices;
memref::LinearizedMemRefInfo linearizedInfo;
std::tie(linearizedInfo, linearizedIndices) =
@@ -658,10 +659,17 @@ struct ConvertVectorStore final : OpConversionPattern<vector::StoreOp> {
stridedMetadata.getConstifiedMixedStrides(),
getAsOpFoldResult(adaptor.getIndices()));
+ // Prefer the exact intraDataOffset when it can be folded (e.g. all-constant
+ // indices). Fall back to 0 only when the trailing dimension exactly matches
+ // the vector size (trailingDimsMatch), because in that case a dynamic last
+ // index implies byte-alignment (the caller is responsible for passing a
+ // valid, aligned index). If neither condition holds, bail out.
std::optional<int64_t> foldedNumFrontPadElems =
- (isDivisibleInSize && trailingDimsMatch)
- ? 0
- : getConstantIntValue(linearizedInfo.intraDataOffset);
+ getConstantIntValue(linearizedInfo.intraDataOffset);
+ if (!foldedNumFrontPadElems) {
+ if (isDivisibleInSize && trailingDimsMatch)
+ foldedNumFrontPadElems = 0;
+ }
if (!foldedNumFrontPadElems) {
return rewriter.notifyMatchFailure(
diff --git a/mlir/test/Dialect/Vector/vector-emulate-narrow-type-unaligned.mlir b/mlir/test/Dialect/Vector/vector-emulate-narrow-type-unaligned.mlir
index 21f073efc49b2..bec7736b90973 100644
--- a/mlir/test/Dialect/Vector/vector-emulate-narrow-type-unaligned.mlir
+++ b/mlir/test/Dialect/Vector/vector-emulate-narrow-type-unaligned.mlir
@@ -562,3 +562,27 @@ func.func @vector_store_i2_const_index_one_partial_store(%arg0: vector<1xi2>) {
// CHECK: %[[BITCAST2:.+]] = vector.bitcast %[[SELECT]] : vector<4xi2> to vector<1xi8>
// CHECK: %[[EXTRACT2:.+]] = vector.extract %[[BITCAST2]][0] : i8 from vector<1xi8>
// CHECK: memref.atomic_yield %[[EXTRACT2]] : i8
+
+// -----
+
+// Regression test for https://github.com/llvm/llvm-project/issues/131528.
+// A vector.store with a non-zero constant column index on a 2D memref where the
+// trailing dimension matches the vector size must NOT be treated as
+// byte-aligned. Instead it must emit two partial (RMW) stores since the
+// 4-element i2 vector starting at column 1 crosses a byte boundary.
+func.func @vector_store_i2_2d_const_nonzero_col(%arg0: vector<4xi2>) {
+ %src = memref.alloc() : memref<3x4xi2>
+ %c0 = arith.constant 0 : index
+ %c1 = arith.constant 1 : index
+ vector.store %arg0, %src[%c0, %c1] : memref<3x4xi2>, vector<4xi2>
+ return
+}
+
+// CHECK-LABEL: func @vector_store_i2_2d_const_nonzero_col(
+// CHECK-SAME: %[[ARG0:.+]]: vector<4xi2>)
+// CHECK: %[[ALLOC:.+]] = memref.alloc() : memref<3xi8>
+// CHECK: %[[C0:.+]] = arith.constant 0 : index
+// Emits two partial atomic RMWs: one for byte 0 (elements at positions [1..3])
+// and one for byte 1 (element at position [0]).
+// CHECK: memref.generic_atomic_rmw %[[ALLOC]][%[[C0]]]
+// CHECK: memref.generic_atomic_rmw %[[ALLOC]][{{.+}}]
More information about the Mlir-commits
mailing list