[Mlir-commits] [mlir] [mlir][linalg] Refactor vectorization hooks to improve code reuse (PR #141244)
Andrzej Warzyński
llvmlistbot at llvm.org
Fri May 30 03:17:54 PDT 2025
https://github.com/banach-space updated https://github.com/llvm/llvm-project/pull/141244
>From 0c34fbbd8f26bb72d18264dd9ba578157b25d3e4 Mon Sep 17 00:00:00 2001
From: Andrzej Warzynski <andrzej.warzynski at arm.com>
Date: Fri, 2 May 2025 08:42:04 +0100
Subject: [PATCH 1/3] [mlir][linalg] Refactor vectorization hooks to improve
code reuse
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This patch refactors two vectorization hooks in Vectorization.cpp:
* `createWriteOrMaskedWrite` gains a new parameter for write indices,
aligning it with its counterpart `createReadOrMaskedRead`.
* `vectorizeAsInsertSliceOp` is updated to reuse both of the above
hooks, rather than re-implementing similar logic (see the sketch below).
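Condensed from the diff below, that reuse boils down to roughly the following
(a sketch rather than the exact final code):
```cpp
// Read via the shared hook (masked when the shapes require it) ...
Value read = vector::createReadOrMaskedRead(rewriter, loc, source,
                                            vecType.getShape(), padValue);
// ... then write via the shared hook, at the offsets of the insert_slice.
auto writeIndices = getValueOrCreateConstantIndexOp(
    rewriter, loc, sliceOp.getMixedOffsets());
Operation *write = createWriteOrMaskedWrite(
    rewriter, loc, read, sliceOp.getDest(), vecType.getShape(), writeIndices);
```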
CONTEXT
-------
This is effectively a refactoring of the logic for vectorizing
`tensor.insert_slice`. Recent updates added masking support:
* https://github.com/llvm/llvm-project/pull/122927
* https://github.com/llvm/llvm-project/pull/123031
At the time, reuse of the shared `create*` hooks wasn't feasible due to
missing parameters and overly rigid assumptions. This patch resolves
that and moves us closer to a more maintainable structure.
CHANGES IN `vectorizeAsInsertSliceOp`
-------------------------------------
* Introduces a clear distinction between the destination tensor and the
vector to store, via named variables like `destType`/`vecToStoreType`,
`destShape`/`vecToStoreShape`, etc.
* Ensures the correct rank and shape are used for attributes like
in_bounds. For example, the size of the in_bounds array now matches
the source vector rank, not the tensor rank.
* Drops the assumption that `vecToStoreRank == destRank` — this doesn't
hold in many real examples.
* Deduces mask dimensions from `vecToStoreShape` (vector) instead of
`destShape` (tensor), as sketched below. (Eventually we should not require
`inputVecSizesForLeadingDims` at all — mask shape should be inferred.)
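As a sketch (condensed from the diff below), the write-mask shape is now
assembled from the leading input vector sizes plus the trailing dims of the
vector to store, rather than of the tensor:
```cpp
// Leading mask dims come from the user-provided vector sizes ...
SmallVector<int64_t> writeMaskShape;
writeMaskShape.append(inputVecSizesForLeadingDims.begin(),
                      inputVecSizesForLeadingDims.end());
// ... and any missing trailing dims are taken from the vector to store, not
// from the destination tensor (whose rank may be higher).
if (vecToStoreRank > static_cast<int64_t>(inputVecSizesForLeadingDims.size()))
  writeMaskShape.append(vecToStoreShape.begin() +
                            inputVecSizesForLeadingDims.size(),
                        vecToStoreShape.end());
auto writeMaskType = VectorType::get(writeMaskShape, builder.getI1Type());
```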
NEW HELPER: `isMaskTriviallyFoldable`
-------------------------------------
Adds a utility to detect when masking is unnecessary. This avoids
inserting redundant masks and reduces the burden on canonicalization to
clean them up later.
Example where masking is provably unnecessary:
```mlir
%2 = vector.mask %1 {
vector.transfer_write %0, %arg1[%c0, %c0, %c0, %c0, %c0, %c0]
{in_bounds = [true, true, true]}
: vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
} : vector<1x2x3xi1> -> tensor<9x8x7x1x2x3xf32>
```
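The bounds arithmetic behind this example is, in essence (a hand-worked sketch
using the values from the IR above; the actual helper additionally requires
static shapes, constant write indices and constant `vector.create_mask` sizes,
which aren't shown in the snippet):
```cpp
#include <cstdint>

// vector<1x2x3xf32> written at offset 0 into the trailing dims of
// tensor<9x8x7x1x2x3xf32>.
bool exampleMaskIsAllTrue() {
  const int64_t maskShape[3]    = {1, 2, 3}; // shape of the vector to store
  const int64_t destTrailing[3] = {1, 2, 3}; // trailing dims of the dest tensor
  const int64_t writeIdx[3]     = {0, 0, 0}; // constant write indices
  for (int d = 0; d < 3; ++d)
    if (maskShape[d] > destTrailing[d] ||             // vector spills over dest
        writeIdx[d] + maskShape[d] > destTrailing[d]) // write would go OOB
      return false;                                   // a mask is still needed
  return true; // every mask element is true, so the mask can be dropped
}
```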
Without this hook, the affected tests would also be more complicated and
require extra mask-related matching.
TEST CHANGES
------------
This patch primarily affects vectorization of:
* `tensor.insert_slice`, now refactored to use shared hooks.
`tensor.pad` vectorization patterns, which internally use
`tensor.insert_slice`, are also _effectively_ updated. Note that only
pad-with-patterns.mlir is affected.
Most test updates involve the insertion of masks that were previously
missing — this reflects a correctness fix, not a regression. In all
cases, the added masks are indeed required.
You’ll also notice more repeated constants (`arith.constant 0 : index`),
due to increased use of helper hooks. This will be cleaned up separately
via a constant cache (see #138265 for discussion).
NOTE FOR REVIEWERS
------------------
This is a fairly substantial rewrite. You may find it easier to review
`createWriteOrMaskedWrite` as a new method rather than diffing
line-by-line.
TODOs (future PRs)
------------------
Further alignment of `createWriteOrMaskedWrite` and
`createReadOrMaskedRead`:
* Move `createWriteOrMaskedWrite` next to `createReadOrMaskedRead` (in
VectorUtils.cpp)
* Make `createReadOrMaskedRead` leverage `isMaskTriviallyFoldable`.
* Extend `isMaskTriviallyFoldable` with value-bounds-analysis. See the
updated test in transform-vector.mlir for an example that would
benefit from this.
(* This method will eventually be moved out of Vectorization.cpp, which isn't the right long-term home for it.)
---
.../Linalg/Transforms/Vectorization.cpp | 274 ++++++++++++------
mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp | 4 +-
mlir/test/Dialect/LLVM/transform-e2e.mlir | 10 +-
.../insert-slice-with-patterns.mlir | 11 +-
.../Linalg/vectorization/insert-slice.mlir | 81 ++++--
.../Linalg/vectorization/linalg-ops.mlir | 1 -
.../vectorization/pad-with-patterns.mlir | 27 +-
.../test/Dialect/Vector/transform-vector.mlir | 40 ++-
8 files changed, 299 insertions(+), 149 deletions(-)
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index c5b62227777a7..71ffebc7c518c 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -1506,20 +1506,120 @@ static SmallVector<int64_t> getTiledPackShape(linalg::PackOp packOp,
return applyPermutation(destShape, linalg::getPackInverseDestPerm(packOp));
}
+/// Determines whether a mask for xfer_write is trivially "all true"
+///
+/// Given all the inputs required to generate a mask (mask sizes and shapes),
+/// and an xfer_write operation (write indices and the destination tensor
+/// shape), determines whether the corresponding mask would be trivially
+/// foldable (i.e., trivially "all true").
+///
+/// Use this method to avoid generating spurious masks and relying on
+/// vectorization post-processing to remove them.
+///
+/// Pre-conditions for a mask to be trivially foldable:
+/// * All involved shapes (mask + destination tensor) are static.
+/// * All write indices are constant.
+/// * All mask sizes are constant (including `arith.constant`).
+///
+/// If the pre-conditions are met, the method checks for each destination
+/// dimension `d`:
+///   (1) maskShape[d] <= destDimSize[rankDiff + d]
+///   (2) writeIndex[d] + maskSize[d] <= destDimSize[rankDiff + d]
+///
+/// rankDiff = rank(dest) - rank(mask).
+///
+/// This method takes a conservative view: it may return false even if the mask
+/// is technically foldable.
+///
+/// EXAMPLE 1 (trivially foldable, all shapes match, mask sizes match the shape
+/// of the dest tensor):
+/// %c0 = arith.constant 0 : index
+/// %mask = vector.create_mask 5, 1
+/// vector.mask %mask {
+///    vector.transfer_write %vecToStore_1, %dest[%c0, %c0]
+/// {in_bounds = [true, true]}
+/// : vector<5x1xi32>, tensor<5x1xi32>
+/// }
+///
+/// EXAMPLE 2 (not trivially foldable - vector shape exceeds the tensor shape,
+/// mask is required to avoid out-of-bounds write):
+/// %c0 = arith.constant 0 : index
+/// %mask = vector.create_mask 5, 1
+/// vector.mask %mask {
+/// vector.transfer_write %vecToStore_2, %dest[%c0, %c0]
+/// {in_bounds = [true, true]}
+/// : vector<8x1xi32>, tensor<5x1xi32>
+/// }
+///
+/// TODO: Re-use in createReadOrMaskedRead
+static bool isMaskTriviallyFoldable(SmallVector<OpFoldResult> &maskSizes,
+ SmallVector<Value> &writeIdxs,
+ ArrayRef<int64_t> destShape,
+ ArrayRef<int64_t> maskShape) {
+ // Masking is unavoidable in the case of dynamic tensors.
+ if (ShapedType::isDynamicShape(destShape))
+ return false;
+
+ // Collect all constant mask sizes.
+ SmallVector<int64_t, 4> cstMaskSizes;
+ for (auto [i, dimSize] : llvm::enumerate(maskSizes)) {
+ if (auto intSize = getConstantIntValue(dimSize)) {
+ cstMaskSizes.push_back(*intSize);
+ }
+ }
+
+ // If any of the mask sizes is non-constant, bail out.
+ if (cstMaskSizes.size() != maskShape.size())
+ return false;
+
+ // Collect all constant write indices.
+ SmallVector<int64_t, 4> cstWriteIdxs;
+ for (auto [i, idx] : llvm::enumerate(writeIdxs)) {
+ APSInt intVal;
+ if (matchPattern(idx, m_ConstantInt(&intVal))) {
+ cstWriteIdxs.push_back(intVal.getSExtValue());
+ }
+ }
+
+ // If any of the write indices is non-constant, bail out.
+ if (cstWriteIdxs.size() != destShape.size())
+ return false;
+
+ // Go over all destination dims and check (1) and (2). Take into account that:
+ // * The number of mask sizes will match the rank of the vector to store.
+ // This could be lower than the rank of the destination tensor.
+ // * Mask sizes could be larger than the corresponding mask shape (hence
+ // `clamp`).
+ // TODO: The 2nd item should be rejected by the verifier.
+ int64_t rankDiff = destShape.size() - cstMaskSizes.size();
+ for (auto [i, idx] : llvm::enumerate(cstMaskSizes)) {
+ if (/*(1)*/ maskShape[i] > destShape[rankDiff + i] ||
+ /*(2)*/ destShape[rankDiff + i] <
+ (std::clamp(cstMaskSizes[i], int64_t(0), maskShape[i]) +
+ cstWriteIdxs[i]))
+ return false;
+ }
+
+ return true;
+}
+
/// Creates an optionally masked TransferWriteOp
///
/// Generates the following operation:
/// %res = vector.transfer_write %vectorToStore into %dest
///
-/// If the leading N dimensions of the destination tensor do not match
+/// If the leading N dimensions of the vector to store do not match
/// `inputVecSizesForLeadingDims` (N = rank(inputVecSizesForLeadingDims)),
/// masking is applied to ensure correctness:
///
-/// %mask = vector.create_mask(%destShape)
+/// %mask = vector.create_mask(%destShape) : %vectorToStoreShape
/// %res = vector.mask %mask {
/// vector.transfer_write %vectorToStore into %dest
/// }
///
+/// The mask shape is identical to `vectorToStore` (with the element type ==
+/// i1), and the mask values are based on the shape of the `dest` tensor.
+///
/// If `useInBoundsInsteadOfMasking` is set to `true`, the `in_bounds` attribute
/// is used instead of masking:
///
@@ -1528,75 +1628,99 @@ static SmallVector<int64_t> getTiledPackShape(linalg::PackOp packOp,
/// %res = vector.transfer_write %input into %dest
/// {in_bounds = in_bounds_flags}
///
-/// NOTE: All write offsets are set to 0.
-/// TODO: Allow specyfying write offsets.
-/// NOTE: When N < rank(input), the missing vector sizes are effectively
-/// extracted from the trailing sizes of `destSizes`. This means those sizes
-/// must be static.
-/// TODO: Support cases where an arbitrary dim is dynamic - this will require
-/// specifying all the vector sizes.
+/// `writeIndices` specifies the offsets to use. If empty, all indices are set
+/// to 0.
+///
+/// NOTE: When N < rank(vectorToStore), the missing vector sizes are taken from
+/// `vectorToStore`.
+/// TODO: `inputVecSizesForLeadingDims` should not be required - these sizes are
+/// already provided in `vectorToStore`.
static Operation *
createWriteOrMaskedWrite(OpBuilder &builder, Location loc, Value vectorToStore,
Value dest,
ArrayRef<int64_t> inputVecSizesForLeadingDims,
+ SmallVector<Value> writeIndices = {},
bool useInBoundsInsteadOfMasking = false) {
ShapedType destType = cast<ShapedType>(dest.getType());
- assert(cast<VectorType>(vectorToStore.getType()).getRank() ==
- static_cast<int64_t>(destType.getRank()) &&
- "Rank mismatch!");
- (void)destType;
+ int64_t destRank = destType.getRank();
+ auto destShape = destType.getShape();
- int64_t rank = cast<ShapedType>(dest.getType()).getRank();
- auto destShape = cast<ShapedType>(dest.getType()).getShape();
+ VectorType vecToStoreType = cast<VectorType>(vectorToStore.getType());
+ int64_t vecToStoreRank = vecToStoreType.getRank();
+ auto vecToStoreShape = vecToStoreType.getShape();
// Compute the in_bounds attribute
- SmallVector<bool> inBoundsVal(rank, true);
+ SmallVector<bool> inBoundsVal(vecToStoreRank, true);
if (useInBoundsInsteadOfMasking) {
// In this case, assume that all the required vector sizes have been
// provided.
assert(inputVecSizesForLeadingDims.size() ==
- static_cast<size_t>(destType.getRank()) &&
+ static_cast<size_t>(vecToStoreType.getRank()) &&
"Insufficient number of input vector sizes!");
// Update the inBounds attribute.
- for (unsigned i = 0; i < rank; i++)
+ for (unsigned i = 0; i < destRank; i++)
inBoundsVal[i] = (destShape[i] == inputVecSizesForLeadingDims[i]) &&
!ShapedType::isDynamic(destShape[i]);
}
+ // If missing, initialize the write indices to 0.
+ assert(writeIndices.empty() ||
+ writeIndices.size() == static_cast<size_t>(destRank) &&
+ "Invalid number of write indices!");
+ if (writeIndices.empty()) {
+ auto zero = builder.create<arith::ConstantIndexOp>(loc, 0);
+ writeIndices = SmallVector<Value>(destRank, zero);
+ }
+
// Generate the xfer_write Op
- auto zero = builder.create<arith::ConstantIndexOp>(loc, 0);
- Operation *write = builder.create<vector::TransferWriteOp>(
- loc,
- /*vector=*/vectorToStore,
- /*source=*/dest,
- /*indices=*/SmallVector<Value>(rank, zero),
- /*inBounds=*/inBoundsVal);
- assert(llvm::none_of(
- destShape.drop_front(inputVecSizesForLeadingDims.size()),
- [](int64_t size) { return size == ShapedType::kDynamic; }) &&
- "Only dims aligned with inputVecSizesForLeadingDims may be dynamic");
+ Operation *write =
+ builder.create<vector::TransferWriteOp>(loc,
+ /*vector=*/vectorToStore,
+ /*source=*/dest,
+ /*indices=*/writeIndices,
+ /*inBounds=*/inBoundsVal);
// If masking is disabled, exit.
if (useInBoundsInsteadOfMasking)
return write;
+ assert(llvm::none_of(
+ destShape.drop_front(inputVecSizesForLeadingDims.size()),
+ [](int64_t size) { return size == ShapedType::kDynamic; }) &&
+ "Only dims aligned with inputVecSizesForLeadingDims may be dynamic");
+
// Check if masking is needed.
bool needMaskForWrite =
!llvm::equal(inputVecSizesForLeadingDims,
- destShape.take_front(inputVecSizesForLeadingDims.size()));
+ destShape.take_front(destRank - vecToStoreRank +
+ inputVecSizesForLeadingDims.size()));
// If masking is needed, generate the mask and mask the operation.
if (needMaskForWrite) {
+ // Get the mask shape + type. Missing mask dimensions are taken from
+ // `vectorToStore`.
SmallVector<int64_t> writeMaskShape;
writeMaskShape.append(inputVecSizesForLeadingDims.begin(),
inputVecSizesForLeadingDims.end());
- writeMaskShape.append(destShape.begin() +
- inputVecSizesForLeadingDims.size(),
- destShape.end());
+ if (vecToStoreRank >
+ static_cast<int64_t>(inputVecSizesForLeadingDims.size()))
+ writeMaskShape.append(vecToStoreShape.begin() +
+ inputVecSizesForLeadingDims.size(),
+ vecToStoreShape.end());
auto writeMaskType = VectorType::get(writeMaskShape, builder.getI1Type());
- Value maskForWrite = builder.create<vector::CreateMaskOp>(
- loc, writeMaskType, tensor::getMixedSizes(builder, loc, dest));
+
+ SmallVector<OpFoldResult> destSizes =
+ tensor::getMixedSizes(builder, loc, dest);
+ SmallVector<OpFoldResult> maskSizes(destSizes.end() - writeMaskShape.size(),
+ destSizes.end());
+
+ if (isMaskTriviallyFoldable(maskSizes, writeIndices, destShape,
+ writeMaskShape))
+ return write;
+
+ Value maskForWrite = builder.createOrFold<vector::CreateMaskOp>(
+ loc, writeMaskType, maskSizes);
write = mlir::vector::maskOperation(builder, write, maskForWrite);
}
@@ -1700,10 +1824,10 @@ vectorizeAsTensorPackOp(RewriterBase &rewriter, linalg::PackOp packOp,
Value dest = rewriter.create<tensor::EmptyOp>(
loc, reifiedReturnShapes[0],
transposeOp.getResult().getType().getElementType());
- Operation *write =
- createWriteOrMaskedWrite(rewriter, loc, transposeOp.getResult(), dest,
- /*inputVecSizesForLeadingDims=*/inputVectorSizes,
- /*useInBoundsInsteadOfMasking=*/false);
+ Operation *write = createWriteOrMaskedWrite(
+ rewriter, loc, transposeOp.getResult(), dest,
+ /*inputVecSizesForLeadingDims=*/inputVectorSizes, /*writeIndices=*/{},
+ /*useInBoundsInsteadOfMasking=*/false);
newResults.push_back(write->getResult(0));
return success();
}
@@ -1839,10 +1963,10 @@ vectorizeAsTensorUnpackOp(RewriterBase &rewriter, linalg::UnPackOp unpackOp,
Value dest = rewriter.create<tensor::EmptyOp>(
loc, reifiedRetShapes[0],
shapeCastOp.getResult().getType().getElementType());
- Operation *write =
- createWriteOrMaskedWrite(rewriter, loc, shapeCastOp.getResult(), dest,
- /*inputVecSizesForLeadingDims=*/writeVectorSizes,
- useInBoundsInsteadOfMasking);
+ Operation *write = createWriteOrMaskedWrite(
+ rewriter, loc, shapeCastOp.getResult(), dest,
+ /*inputVecSizesForLeadingDims=*/writeVectorSizes,
+ /*writeIndices=*/{}, useInBoundsInsteadOfMasking);
newResults.push_back(write->getResult(0));
return success();
}
@@ -1874,10 +1998,10 @@ vectorizeAsTensorPadOp(RewriterBase &rewriter, tensor::PadOp padOp,
// Create Xfer write Op
Value dest = rewriter.create<tensor::EmptyOp>(
loc, reifiedReturnShapes[0], padOp.getResultType().getElementType());
- Operation *write =
- createWriteOrMaskedWrite(rewriter, loc, maskedRead, dest,
- /*inputVecSizesForLeadingDims=*/inputVectorSizes,
- /*useInBoundsInsteadOfMasking=*/false);
+ Operation *write = createWriteOrMaskedWrite(
+ rewriter, loc, maskedRead, dest,
+ /*inputVecSizesForLeadingDims=*/inputVectorSizes, {},
+ /*useInBoundsInsteadOfMasking=*/false);
newResults.push_back(write->getResult(0));
return success();
}
@@ -2922,53 +3046,19 @@ vectorizeAsInsertSliceOp(RewriterBase &rewriter, tensor::InsertSliceOp sliceOp,
auto vecType = VectorType::get(vecShape, sourceType.getElementType());
// 3. Generate TransferReadOp + TransferWriteOp
- ReifiedRankedShapedTypeDims reifiedSrcSizes;
- Value maskOp;
-
- // If vector sizes are user provided, make sure to mask. First, generate the
- // mask.
- if (!inputVectorSizes.empty()) {
- auto *srcDefOp = source.getDefiningOp();
- if (!srcDefOp) {
- LDBG("Unable to get the defining Op of " << sliceOp);
- return failure();
- }
-
- LogicalResult status =
- cast<ReifyRankedShapedTypeOpInterface>(srcDefOp).reifyResultShapes(
- rewriter, reifiedSrcSizes);
- if (status.failed()) {
- LDBG("Unable to reify result shapes of " << srcDefOp);
- return failure();
- }
-
- // Create the mask
- auto readMaskType = VectorType::get(inputVectorSizes, rewriter.getI1Type());
- maskOp = rewriter.create<vector::CreateMaskOp>(
- sliceOp.getLoc(), readMaskType, reifiedSrcSizes[0]);
- }
+ auto loc = sliceOp.getLoc();
+ // Create read
SmallVector<Value> readIndices(
- vecType.getRank(),
- rewriter.create<arith::ConstantIndexOp>(sliceOp.getLoc(), 0));
- Operation *read = rewriter.create<vector::TransferReadOp>(
- sliceOp.getLoc(), vecType, source, readIndices, padValue,
- ArrayRef<bool>{readInBounds});
-
- if (maskOp) {
- read = mlir::vector::maskOperation(rewriter, read, maskOp);
- }
-
- auto writeIndices = getValueOrCreateConstantIndexOp(
- rewriter, sliceOp.getLoc(), sliceOp.getMixedOffsets());
-
- Operation *write = rewriter.create<vector::TransferWriteOp>(
- sliceOp.getLoc(), read->getResult(0), sliceOp.getDest(), writeIndices,
- ArrayRef<bool>{writeInBounds});
-
- if (maskOp) {
- write = mlir::vector::maskOperation(rewriter, write, maskOp);
- }
+ vecType.getRank(), rewriter.create<arith::ConstantIndexOp>(loc, 0));
+ Value read = mlir::vector::createReadOrMaskedRead(
+ rewriter, loc, source, vecType.getShape(), padValue);
+
+ // Create write
+ auto writeIndices =
+ getValueOrCreateConstantIndexOp(rewriter, loc, sliceOp.getMixedOffsets());
+ Operation *write = createWriteOrMaskedWrite(
+ rewriter, loc, read, sliceOp.getDest(), vecType.getShape(), writeIndices);
// 4. Finalize
newResults.push_back(write->getResult(0));
diff --git a/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp b/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
index d5dd6f2027be8..dda4856596bba 100644
--- a/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
+++ b/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
@@ -337,13 +337,13 @@ Value vector::createReadOrMaskedRead(OpBuilder &builder, Location loc,
auto sourceShape = sourceShapedType.getShape();
assert(sourceShape.size() == inputVectorSizes.size() &&
"expected same ranks.");
- auto maskType = VectorType::get(inputVectorSizes, builder.getI1Type());
auto vectorType = VectorType::get(inputVectorSizes, padValue.getType());
assert(padValue.getType() == sourceShapedType.getElementType() &&
"expected same pad element type to match source element type");
int64_t readRank = inputVectorSizes.size();
auto zero = builder.create<arith::ConstantIndexOp>(loc, 0);
SmallVector<bool> inBoundsVal(readRank, true);
+
if (useInBoundsInsteadOfMasking) {
// Update the inBounds attribute.
for (unsigned i = 0; i < readRank; i++)
@@ -362,6 +362,8 @@ Value vector::createReadOrMaskedRead(OpBuilder &builder, Location loc,
return transferReadOp;
SmallVector<OpFoldResult> mixedSourceDims =
tensor::getMixedSizes(builder, loc, source);
+
+ auto maskType = VectorType::get(inputVectorSizes, builder.getI1Type());
Value mask =
builder.create<vector::CreateMaskOp>(loc, maskType, mixedSourceDims);
return mlir::vector::maskOperation(builder, transferReadOp, mask)
diff --git a/mlir/test/Dialect/LLVM/transform-e2e.mlir b/mlir/test/Dialect/LLVM/transform-e2e.mlir
index c00b47fb936e9..98cfaf249c898 100644
--- a/mlir/test/Dialect/LLVM/transform-e2e.mlir
+++ b/mlir/test/Dialect/LLVM/transform-e2e.mlir
@@ -18,16 +18,14 @@ module attributes {transform.with_named_sequence} {
%1, %loops:3 = transform.structured.tile_using_for %0 tile_sizes [2, 2, 2] : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op, !transform.any_op)
%2 = transform.get_parent_op %1 {isolated_from_above} : (!transform.any_op) -> !transform.any_op
transform.structured.vectorize_children_and_apply_patterns %2 : (!transform.any_op) -> !transform.any_op
- %b = transform.bufferization.one_shot_bufferize layout{IdentityLayoutMap}
- %module_op {bufferize_function_boundaries = true}
- : (!transform.any_op) -> !transform.any_op
- %f = transform.structured.match ops{["func.func"]} in %b
+ %f = transform.structured.match ops{["func.func"]} in %module_op
: (!transform.any_op) -> !transform.any_op
// TODO: group these lower-level controls into various properly named vector
// lowering TD macros.
transform.apply_patterns to %f {
+ transform.apply_patterns.vector.lower_masked_transfers
transform.apply_patterns.vector.lower_contraction lowering_strategy = "outerproduct"
transform.apply_patterns.vector.transfer_permutation_patterns
transform.apply_patterns.vector.lower_multi_reduction lowering_strategy = "innerparallel"
@@ -37,6 +35,10 @@ module attributes {transform.with_named_sequence} {
transform.apply_patterns.vector.lower_shape_cast
transform.apply_patterns.vector.lower_transpose lowering_strategy = "shuffle_1d"
} : !transform.any_op
+
+ %b = transform.bufferization.one_shot_bufferize layout{IdentityLayoutMap}
+ %module_op {bufferize_function_boundaries = true}
+ : (!transform.any_op) -> !transform.any_op
transform.yield
}
}
diff --git a/mlir/test/Dialect/Linalg/vectorization/insert-slice-with-patterns.mlir b/mlir/test/Dialect/Linalg/vectorization/insert-slice-with-patterns.mlir
index f7764be9be73f..d1f2ed194f6ce 100644
--- a/mlir/test/Dialect/Linalg/vectorization/insert-slice-with-patterns.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization/insert-slice-with-patterns.mlir
@@ -67,10 +67,19 @@ module attributes {transform.with_named_sequence} {
// CHECK-SAME: %[[ARG_0:.*]]: tensor<1x?x3xf32>,
// CHECK-SAME: %[[PAD:.*]]: f32,
// CHECK-SAME: %[[SIZE:.*]]: index) -> tensor<9x8x7x1x2x3xf32> {
+// CHECK: %[[C3:.*]] = arith.constant 3 : index
+// CHECK: %[[C1:.*]] = arith.constant 1 : index
+// CHECK: %[[C0:.*]] = arith.constant 0 : index
// CHECK: %[[EMPTY:.*]] = tensor.empty() : tensor<9x8x7x1x2x3xf32>
// CHECK: %[[BC:.*]] = vector.broadcast %[[PAD]] : f32 to vector<9x8x7x1x2x3xf32>
// CHECK: %[[WRITE:.*]] = vector.transfer_write %[[BC]], %[[EMPTY]]{{.*}} {in_bounds = [true, true, true, true, true, true]} : vector<9x8x7x1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
-// CHECK: %[[READ:.*]] = vector.transfer_read %[[ARG_0]]{{.*}}, %[[PAD]] {in_bounds = [true, false, true]} : tensor<1x?x3xf32>, vector<1x2x3xf32>
+
+// CHECK: %[[D1:.*]] = tensor.dim %[[ARG_0]], %[[C1]] : tensor<1x?x3xf32>
+// CHECK: %[[MASK:.*]] = vector.create_mask %[[C1]], %[[D1]], %[[C3]] : vector<1x2x3xi1>
+// CHECK: %[[READ:.*]] = vector.mask %[[MASK]] {
+// CHECK-SAME: vector.transfer_read %[[ARG_0]][%[[C0]], %[[C0]], %[[C0]]], %[[PAD]] {in_bounds = [true, true, true]} : tensor<1x?x3xf32>, vector<1x2x3xf32>
+// CHECK-SAME: } : vector<1x2x3xi1> -> vector<1x2x3xf32>
+
// CHECK: %[[RES:.*]] = vector.transfer_write %[[READ]], %[[WRITE]]{{.*}} {in_bounds = [true, true, true]} : vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
// CHECK: return %[[RES]] : tensor<9x8x7x1x2x3xf32>
func.func @insert_dynamic_slice_non_zero_pad(%arg0: tensor<1x?x3xf32>, %pad : f32, %size: index) -> tensor<9x8x7x1x2x3xf32> {
diff --git a/mlir/test/Dialect/Linalg/vectorization/insert-slice.mlir b/mlir/test/Dialect/Linalg/vectorization/insert-slice.mlir
index ddd4f433b3657..0563c21f220eb 100644
--- a/mlir/test/Dialect/Linalg/vectorization/insert-slice.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization/insert-slice.mlir
@@ -20,22 +20,26 @@ func.func private @insert_slice_static_sizes(%source: tensor<?x3x?x1xi32>) -> te
// CHECK: %[[INIT:.*]] = tensor.empty() : tensor<5x3xi32>
// CHECK: %[[SRC_SLICE:.*]] = tensor.extract_slice %[[SEC]][0, %[[C_2]], 0, 0] [1, 1, 5, 1] [1, 1, 1, 1] : tensor<?x3x?x1xi32> to tensor<5x1xi32>
// CHECK-DAG: %[[PAD:.*]] = arith.constant 0 : i32
+// CHECK-DAG: %[[C_0_1:.*]] = arith.constant 0 : index
+// CHECK-DAG: %[[C_0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[C_5:.*]] = arith.constant 5 : index
// CHECK-DAG: %[[C_1:.*]] = arith.constant 1 : index
-// CHECK: %[[MASK:.*]] = vector.create_mask %[[C_5]], %[[C_1]] : vector<8x1xi1>
-// CHECK: %[[C0:.*]] = arith.constant 0 : index
-// CHECK: %[[READ:.*]] = vector.mask %[[MASK]] { vector.transfer_read %[[SRC_SLICE]][%[[C0]], %[[C0]]], %[[PAD]] : tensor<5x1xi32>, vector<8x1xi32> } : vector<8x1xi1> -> vector<8x1xi32>
-// CHECK: %[[C_0:.*]] = arith.constant 0 : index
-// CHECK: %[[RES:.*]] = vector.mask %[[MASK]] { vector.transfer_write %[[READ]], %[[INIT]][%[[C_0]], %[[C_2]]] : vector<8x1xi32>, tensor<5x3xi32> } : vector<8x1xi1> -> tensor<5x3xi32>
+// CHECK: %[[MASK_READ:.*]] = vector.create_mask %[[C_5]], %[[C_1]] : vector<8x1xi1>
+// CHECK: %[[READ:.*]] = vector.mask %[[MASK_READ]] { vector.transfer_read %[[SRC_SLICE]][%[[C_0]], %[[C_0]]], %[[PAD]] {{.*}} : tensor<5x1xi32>, vector<8x1xi32> } : vector<8x1xi1> -> vector<8x1xi32>
+// CHECK: %[[C_0_1:.*]] = arith.constant 0 : index
+// CHECK: %[[C_5_1:.*]] = arith.constant 5 : index
+// CHECK: %[[C_3:.*]] = arith.constant 3 : index
+// CHECK: %[[MASK_WRITE:.*]] = vector.create_mask %[[C_5_1]], %[[C_3]] : vector<8x1xi1>
+// CHECK: %[[RES:.*]] = vector.mask %[[MASK_WRITE]] { vector.transfer_write %[[READ]], %[[INIT]][%[[C_0_1]], %[[C_2]]] {in_bounds = [true, true]} : vector<8x1xi32>, tensor<5x3xi32> } : vector<8x1xi1> -> tensor<5x3xi32>
// CHECK: return %[[RES]] : tensor<5x3xi32>
- module attributes {transform.with_named_sequence} {
- transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
- %0 = transform.structured.match ops{["tensor.insert_slice"]} in %arg0 : (!transform.any_op) -> !transform.any_op
- transform.structured.vectorize %0 vector_sizes [8, 1] : !transform.any_op
- transform.yield
- }
+module attributes {transform.with_named_sequence} {
+ transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
+ %0 = transform.structured.match ops{["tensor.insert_slice"]} in %arg0 : (!transform.any_op) -> !transform.any_op
+ transform.structured.vectorize %0 vector_sizes [8, 1] : !transform.any_op
+ transform.yield
}
+}
// -----
@@ -59,11 +63,17 @@ func.func private @insert_slice_dynamic_src_dim(%source: tensor<?x3x?x1xi32>, %s
// CHECK: %[[SRC_SLICE:.*]] = tensor.extract_slice %[[SRC]][0, %[[C_2]], 0, 0] [1, 1, %[[SIZE]], 1] [1, 1, 1, 1] : tensor<?x3x?x1xi32> to tensor<?x1xi32>
// CHECK-DAG: %[[PAD:.*]] = arith.constant 0 : i32
// CHECK-DAG: %[[C_1:.*]] = arith.constant 1 : index
-// CHECK: %[[MASK:.*]] = vector.create_mask %[[SIZE]], %[[C_1]] : vector<8x1xi1>
-// CHECK: %[[C_0:.*]] = arith.constant 0 : index
-// CHECK: %[[READ:.*]] = vector.mask %[[MASK]] { vector.transfer_read %[[SRC_SLICE]][%[[C_0]], %[[C_0]]], %[[PAD]] : tensor<?x1xi32>, vector<8x1xi32> } : vector<8x1xi1> -> vector<8x1xi32>
+// CHECK-DAG: %[[C_0:.*]] = arith.constant 0 : index
+// CHECK-DAG: %[[C_0_1:.*]] = arith.constant 0 : index
+// CHECK-DAG: %[[C_0_2:.*]] = arith.constant 0 : index
+// CHECK-DAG: %[[D0:.*]] = tensor.dim %[[SRC_SLICE]], %[[C_0_2]] : tensor<?x1xi32>
+// CHECK: %[[MASK:.*]] = vector.create_mask %[[D0]], %[[C_1]] : vector<8x1xi1>
+// CHECK: %[[READ:.*]] = vector.mask %[[MASK]] { vector.transfer_read %[[SRC_SLICE]][%[[C_0_1]], %[[C_0_1]]], %[[PAD]] {{.*}} : tensor<?x1xi32>, vector<8x1xi32> } : vector<8x1xi1> -> vector<8x1xi32>
// CHECK: %[[C_0_1:.*]] = arith.constant 0 : index
-// CHECK: %[[RES:.*]] = vector.mask %[[MASK]] { vector.transfer_write %[[READ]], %[[INIT]][%[[C_0_1]], %[[C_2]]] : vector<8x1xi32>, tensor<5x3xi32> } : vector<8x1xi1> -> tensor<5x3xi32>
+// CHECK: %[[C_5_1:.*]] = arith.constant 5 : index
+// CHECK: %[[C_3:.*]] = arith.constant 3 : index
+// CHECK: %[[MASK_WRITE:.*]] = vector.create_mask %[[C_5_1]], %[[C_3]] : vector<8x1xi1>
+// CHECK: %[[RES:.*]] = vector.mask %[[MASK_WRITE]] { vector.transfer_write %[[READ]], %[[INIT]][%[[C_0_1]], %[[C_2]]] {in_bounds = [true, true]} : vector<8x1xi32>, tensor<5x3xi32> } : vector<8x1xi1> -> tensor<5x3xi32>
// CHECK: return %[[RES]] : tensor<5x3xi32>
module attributes {transform.with_named_sequence} {
@@ -94,15 +104,20 @@ func.func private @insert_slice_dynamic_dest_dim(%source: tensor<?x3x?x1xi32>, %
// CHECK: %[[C_2:.*]] = arith.constant 2 : index
// CHECK: %[[INIT:.*]] = tensor.empty(%[[SIZE]]) : tensor<?x3xi32>
// CHECK: %[[SRC_SLICE:.*]] = tensor.extract_slice %[[SRC]][0, %[[C_2]], 0, 0] [1, 1, 5, 1] [1, 1, 1, 1] : tensor<?x3x?x1xi32> to tensor<5x1xi32>
-// CHECK: %[[PAD:.*]] = arith.constant 0 : i32
-// CHECK: %[[C_5:.*]] = arith.constant 5 : index
-// CHECK: %[[C_1:.*]] = arith.constant 1 : index
-// CHECK: %[[MASK:.*]] = vector.create_mask %[[C_5]], %[[C_1]] : vector<8x1xi1>
-// CHECK: %[[C_0:.*]] = arith.constant 0 : index
-// CHECK: %[[READ:.*]] = vector.mask %[[MASK]] { vector.transfer_read %[[SRC_SLICE]][%[[C_0]], %[[C_0]]], %[[PAD]] : tensor<5x1xi32>, vector<8x1xi32> } : vector<8x1xi1> -> vector<8x1xi32>
+// CHECK-DAG: %[[PAD:.*]] = arith.constant 0 : i32
+// CHECK-DAG: %[[C_5:.*]] = arith.constant 5 : index
+// CHECK-DAG: %[[C_1:.*]] = arith.constant 1 : index
+// CHECK-DAG: %[[MASK:.*]] = vector.create_mask %[[C_5]], %[[C_1]] : vector<8x1xi1>
+// CHECK-DAG: %[[C_0:.*]] = arith.constant 0 : index
+// CHECK-DAG: %[[C_0_1:.*]] = arith.constant 0 : index
+// CHECK: %[[READ:.*]] = vector.mask %[[MASK]] { vector.transfer_read %[[SRC_SLICE]][%[[C_0_1]], %[[C_0_1]]], %[[PAD]] {{.*}} : tensor<5x1xi32>, vector<8x1xi32> } : vector<8x1xi1> -> vector<8x1xi32>
// CHECK: %[[C_0_1:.*]] = arith.constant 0 : index
-// CHECK: %[[WRITE:.*]] = vector.mask %[[MASK]] { vector.transfer_write %[[READ]], %[[INIT]][%[[C_0_1]], %[[C_2]]] : vector<8x1xi32>, tensor<?x3xi32> } : vector<8x1xi1> -> tensor<?x3xi32>
-// CHECK: return %[[WRITE]] : tensor<?x3xi32>
+// CHECK: %[[C_0_2:.*]] = arith.constant 0 : index
+// CHECK: %[[DIM:.*]] = tensor.dim %[[INIT]], %[[C_0_2]] : tensor<?x3xi32>
+// CHECK: %[[C_3:.*]] = arith.constant 3 : index
+// CHECK: %[[MASK_WRITE:.*]] = vector.create_mask %[[DIM]], %[[C_3]] : vector<8x1xi1>
+// CHECK: %[[RES:.*]] = vector.mask %[[MASK_WRITE]] { vector.transfer_write %[[READ]], %[[INIT]][%[[C_0_1]], %[[C_2]]] {in_bounds = [true, true]} : vector<8x1xi32>, tensor<?x3xi32> } : vector<8x1xi1> -> tensor<?x3xi32>
+// CHECK: return %[[RES]] : tensor<?x3xi32>
module attributes {transform.with_named_sequence} {
transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
@@ -131,15 +146,21 @@ func.func private @insert_slice_dynamic_source_and_dest_dim(%source: tensor<?x3x
// CHECK-SAME: %[[SIZE:.*]]: index) -> tensor<?x3xi32> {
// CHECK: %[[C_2:.*]] = arith.constant 2 : index
// CHECK: %[[INIT:.*]] = tensor.empty(%[[SIZE]]) : tensor<?x3xi32>
-// CHECK: %[[SRC_SIZE:.*]] = tensor.extract_slice %[[SRC]][0, %[[C_2]], 0, 0] [1, 1, %[[SIZE]], 1] [1, 1, 1, 1] : tensor<?x3x?x1xi32> to tensor<?x1xi32>
-// CHECK: %[[PAD:.*]] = arith.constant 0 : i32
+// CHECK: %[[SRC_SLICE:.*]] = tensor.extract_slice %[[SRC]][0, %[[C_2]], 0, 0] [1, 1, %[[SIZE]], 1] [1, 1, 1, 1] : tensor<?x3x?x1xi32> to tensor<?x1xi32>
+// CHECK-DAG: %[[PAD:.*]] = arith.constant 0 : i32
+// CHECK-DAG: %[[C0_0:.*]] = arith.constant 0 : index
+// CHECK-DAG: %[[C0_1:.*]] = arith.constant 0 : index
+// CHECK-DAG: %[[C0_2:.*]] = arith.constant 0 : index
+// CHECK: %[[D0:.*]] = tensor.dim %[[SRC_SLICE]], %[[C0_2]] : tensor<?x1xi32>
// CHECK: %[[C1:.*]] = arith.constant 1 : index
-// CHECK: %[[MASK:.*]] = vector.create_mask %[[SIZE]], %[[C1]] : vector<8x1xi1>
-// CHECK: %[[C0:.*]] = arith.constant 0 : index
-// CHECK: %[[READ:.*]] = vector.mask %[[MASK]] { vector.transfer_read %[[SRC_SIZE]]{{\[}}%[[C0]], %[[C0]]], %[[PAD]] : tensor<?x1xi32>, vector<8x1xi32> } : vector<8x1xi1> -> vector<8x1xi32>
+// CHECK: %[[MASK:.*]] = vector.create_mask %[[D0]], %[[C1]] : vector<8x1xi1>
+// CHECK: %[[READ:.*]] = vector.mask %[[MASK]] { vector.transfer_read %[[SRC_SLICE]][%[[C0_1]], %[[C0_1]]], %[[PAD]] {{.*}} : tensor<?x1xi32>, vector<8x1xi32> } : vector<8x1xi1> -> vector<8x1xi32>
// CHECK: %[[C_0_1:.*]] = arith.constant 0 : index
-// CHECK: %[[WRITE:.*]] = vector.mask %[[MASK]] { vector.transfer_write %[[READ]], %[[INIT]]{{\[}}%[[C_0_1]], %[[C_2]]] : vector<8x1xi32>, tensor<?x3xi32> } : vector<8x1xi1> -> tensor<?x3xi32>
-// CHECK: return %[[WRITE]] : tensor<?x3xi32>
+// CHECK: %[[C_0_2:.*]] = arith.constant 0 : index
+// CHECK: %[[DIM:.*]] = tensor.dim %[[INIT]], %[[C_0_2]] : tensor<?x3xi32>
+// CHECK: %[[C_3:.*]] = arith.constant 3 : index
+// CHECK: %[[MASK_WRITE:.*]] = vector.create_mask %[[DIM]], %[[C_3]] : vector<8x1xi1>
+// CHECK: %[[RES:.*]] = vector.mask %[[MASK_WRITE]] { vector.transfer_write %[[READ]], %[[INIT]][%[[C_0_1]], %[[C_2]]] {in_bounds = [true, true]} : vector<8x1xi32>, tensor<?x3xi32> } : vector<8x1xi1> -> tensor<?x3xi32>
module attributes {transform.with_named_sequence} {
transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
diff --git a/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir b/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
index fa9daad1dcbf8..6722de817f6bf 100644
--- a/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
@@ -1394,4 +1394,3 @@ func.func @test_vectorize_unpack_no_vector_sizes_permute(%source: tensor<4x7x4xf
transform.yield
}
}
-
diff --git a/mlir/test/Dialect/Linalg/vectorization/pad-with-patterns.mlir b/mlir/test/Dialect/Linalg/vectorization/pad-with-patterns.mlir
index 4086d5458313e..1baead0c09a52 100644
--- a/mlir/test/Dialect/Linalg/vectorization/pad-with-patterns.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization/pad-with-patterns.mlir
@@ -5,16 +5,23 @@
///----------------------------------------------------------------------------------------
// CHECK-LABEL: func @pad_static(
-// CHECK-SAME: %[[ARG0:.*]]: tensor<2x?x2xf32>, %[[PAD:.*]]: f32
-// CHECK-NOT: tensor.pad
-// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
-// CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index
-// CHECK-DAG: %[[INIT:.*]] = tensor.empty() : tensor<2x3x4xf32>
-// CHECK-DAG: %[[VEC:.*]] = vector.broadcast %[[PAD]] : f32 to vector<2x3x4xf32>
-// CHECK: %[[FILL:.*]] = vector.transfer_write %[[VEC]], %[[INIT]]{{.*}} : vector<2x3x4xf32>, tensor<2x3x4xf32>
-// CHECK: %[[READ:.*]] = vector.transfer_read %[[ARG0]][%[[C0]], %[[C0]], %[[C0]]], %[[PAD]] {in_bounds = [true, false, true]} : tensor<2x?x2xf32>, vector<2x3x2xf32>
-// CHECK: %[[RESULT:.*]] = vector.transfer_write %[[READ]], %[[FILL]][%[[C0]], %[[C0]], %[[C2]]] {in_bounds = [true, true, true]} : vector<2x3x2xf32>, tensor<2x3x4xf32>
-// CHECK: return %[[RESULT]]
+// CHECK-SAME: %[[ARG0:.*]]: tensor<2x?x2xf32>,
+// CHECK-SAME: %[[ARG1:.*]]: f32) -> tensor<2x3x4xf32> {
+// CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index
+// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
+// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
+// CHECK: %[[EMPTY:.*]] = tensor.empty() : tensor<2x3x4xf32>
+// CHECK: %[[INIT:.*]] = vector.broadcast %[[ARG1]] : f32 to vector<2x3x4xf32>
+// CHECK: %[[OUT_TENSOR:.*]] = vector.transfer_write %[[INIT]], %[[EMPTY]]{{\[}}%[[C0]], %[[C0]], %[[C0]]] {in_bounds = [true, true, true]} : vector<2x3x4xf32>, tensor<2x3x4xf32>
+// CHECK: %[[DIM_1:.*]] = tensor.dim %[[ARG0]], %[[C1]] : tensor<2x?x2xf32>
+// CHECK: %[[MASK_READ:.*]] = vector.create_mask %[[C2]], %[[DIM_1]], %[[C2]] : vector<2x3x2xi1>
+// CHECK: %[[READ:.*]] = vector.mask %[[MASK_READ]] {
+// CHECK-SAME: vector.transfer_read %[[ARG0]]{{\[}}%[[C0]], %[[C0]], %[[C0]]], %[[ARG1]]
+// CHECK-SAME: {in_bounds = [true, true, true]} : tensor<2x?x2xf32>, vector<2x3x2xf32>
+// CHECK-SAME: } : vector<2x3x2xi1> -> vector<2x3x2xf32>
+// CHECK: %[[RESULT:.*]] = vector.transfer_write %[[READ]], %[[OUT_TENSOR]]{{\[}}%[[C0]], %[[C0]], %[[C2]]]
+// CHECK-SAME: {in_bounds = [true, true, true]} : vector<2x3x2xf32>, tensor<2x3x4xf32>
+// CHECK: return %[[RESULT]] : tensor<2x3x4xf32>
func.func @pad_static(%arg0: tensor<2x?x2xf32>, %pad_value: f32) -> tensor<2x3x4xf32> {
%0 = tensor.pad %arg0 low[0, 0, 2] high[0, 1, 0] {
^bb0(%arg1: index, %arg2: index, %arg3: index):
diff --git a/mlir/test/Dialect/Vector/transform-vector.mlir b/mlir/test/Dialect/Vector/transform-vector.mlir
index 4b38db79bff3e..ddf212c5ef412 100644
--- a/mlir/test/Dialect/Vector/transform-vector.mlir
+++ b/mlir/test/Dialect/Vector/transform-vector.mlir
@@ -6,7 +6,11 @@ func.func @matmul_tensors(
-> tensor<8x32xf32> {
// CHECK-NOT: linalg
// CHECK: vector.extract {{.*}} : vector<4xf32> from vector<8x4xf32>
-// CHECK: vector.store {{.*}} : memref<8x32xf32>, vector<4xf32>
+// TODO: `vector.maskedstore` below could safely be replaced with
+// `vector.store`. It's present due to the vectorization logic for
+// `tensor.insert_slice` conservatively applying masks. However, in this case,
+// we should be able to remove it via value-bounds checks.
+// CHECK: vector.maskedstore {{.*}} : memref<8x32xf32>, vector<4xi1>, vector<4xf32>
%0 = linalg.matmul ins(%arg0, %arg1: tensor<8x16xf32>, tensor<16x32xf32>)
outs(%arg2: tensor<8x32xf32>)
-> tensor<8x32xf32>
@@ -20,16 +24,16 @@ module attributes {transform.with_named_sequence} {
: (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op, !transform.any_op)
%2 = transform.get_parent_op %1 {isolated_from_above} : (!transform.any_op) -> !transform.any_op
transform.structured.vectorize_children_and_apply_patterns %2 : (!transform.any_op) -> !transform.any_op
- %b = transform.bufferization.one_shot_bufferize
- layout{IdentityLayoutMap} %module_op
- {bufferize_function_boundaries = true, allow_return_allocs = true}
- : (!transform.any_op) -> !transform.any_op
- %f = transform.structured.match ops{["func.func"]} in %b
+ %f = transform.structured.match ops{["func.func"]} in %module_op
: (!transform.any_op) -> !transform.any_op
// TODO: group these lower-level controls into various properly named vector
// lowering TD macros.
+ transform.apply_patterns to %f {
+ transform.apply_patterns.vector.lower_masked_transfers
+ } : !transform.any_op
+
transform.apply_patterns to %f {
transform.apply_patterns.vector.lower_contraction lowering_strategy = "outerproduct"
} : !transform.any_op
@@ -46,21 +50,37 @@ module attributes {transform.with_named_sequence} {
transform.apply_patterns.vector.split_transfer_full_partial split_transfer_strategy = "linalg-copy"
} : !transform.any_op
- transform.apply_patterns to %f {
+ // By default, UnrollTransferWriteConversion (applied below via
+ // `transfer_to_scf`) will only work on MemRef(s). While there's an option
+ // to relax that, it's currently not wired-up with the TD logic. Bufferize
+ // here as otherwise unrolling will not work.
+ // TODO: Extend `transform.apply_patterns.vector.transfer_to_scf` to allow
+ // unrolling xfer Ops on tensors and move bufferization all the way down.
+ %b = transform.bufferization.one_shot_bufferize
+ layout{IdentityLayoutMap} %module_op
+ {bufferize_function_boundaries = true, allow_return_allocs = true}
+ : (!transform.any_op) -> !transform.any_op
+
+ %fb = transform.structured.match ops{["func.func"]} in %b
+ : (!transform.any_op) -> !transform.any_op
+
+ transform.apply_patterns to %fb {
transform.apply_patterns.vector.transfer_to_scf max_transfer_rank = 1 full_unroll = true
} : !transform.any_op
- transform.apply_patterns to %f {
+ transform.apply_patterns to %fb {
transform.apply_patterns.vector.lower_transfer max_transfer_rank = 1
} : !transform.any_op
- transform.apply_patterns to %f {
+ transform.apply_patterns to %fb {
transform.apply_patterns.vector.lower_shape_cast
} : !transform.any_op
- transform.apply_patterns to %f {
+ transform.apply_patterns to %fb {
transform.apply_patterns.vector.lower_transpose lowering_strategy = "shuffle_1d"
} : !transform.any_op
+
+
transform.yield
}
}
>From 175d32b45ab2c230d12c03f3ad9c6fbdf7be2555 Mon Sep 17 00:00:00 2001
From: Andrzej Warzynski <andrzej.warzynski at arm.com>
Date: Thu, 29 May 2025 19:47:22 +0100
Subject: [PATCH 2/3] fixup! [mlir][linalg] Refactor vectorization hooks to
improve code reuse
* Restore the original behaviour in `vectorizeAsInsertSliceOp`, whereby
the `in_bounds` attribute was used to identify potentially
out-of-bounds accesses. Masks are only used when input vector sizes
are specified (see the sketch after this list).
* Revert the changes in insert-slice-with-patterns.mlir and
pad-with-patterns.mlir, i.e. the tests in which we don't specify
vector sizes.
* Other minor updates.
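As a rough sketch of the restored wiring (condensed from the hunks below;
`useInBounds` is just a local name used here for readability):
```cpp
// Only mask when the user provided vector sizes; otherwise, the shapes are
// static and the in_bounds attribute is sufficient.
bool useInBounds = inputVectorSizes.empty();
Value read = vector::createReadOrMaskedRead(
    rewriter, loc, source, vecType.getShape(), padValue,
    /*useInBoundsInsteadOfMasking=*/useInBounds);
Operation *write = createWriteOrMaskedWrite(
    rewriter, loc, read, sliceOp.getDest(), vecType.getShape(), writeIndices,
    /*useInBoundsInsteadOfMasking=*/useInBounds);
```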
---
.../Linalg/Transforms/Vectorization.cpp | 38 ++++++-------------
.../insert-slice-with-patterns.mlir | 11 +-----
.../vectorization/pad-with-patterns.mlir | 27 +++++--------
3 files changed, 23 insertions(+), 53 deletions(-)
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index 71ffebc7c518c..ac3437f3da412 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -1659,9 +1659,10 @@ createWriteOrMaskedWrite(OpBuilder &builder, Location loc, Value vectorToStore,
static_cast<size_t>(vecToStoreType.getRank()) &&
"Insufficient number of input vector sizes!");
// Update the inBounds attribute.
- for (unsigned i = 0; i < destRank; i++)
- inBoundsVal[i] = (destShape[i] == inputVecSizesForLeadingDims[i]) &&
- !ShapedType::isDynamic(destShape[i]);
+ for (unsigned i = 0; i < vecToStoreRank; i++)
+ inBoundsVal[i] =
+ (destShape[i] == inputVecSizesForLeadingDims[i]) &&
+ !ShapedType::isDynamic(destShape[destRank - vecToStoreRank + i]);
}
// If missing, initialize the write indices to 0.
@@ -1670,7 +1671,7 @@ createWriteOrMaskedWrite(OpBuilder &builder, Location loc, Value vectorToStore,
"Invalid number of write indices!");
if (writeIndices.empty()) {
auto zero = builder.create<arith::ConstantIndexOp>(loc, 0);
- writeIndices = SmallVector<Value>(destRank, zero);
+ writeIndices.assign(destRank, zero);
}
// Generate the xfer_write Op
@@ -1826,8 +1827,7 @@ vectorizeAsTensorPackOp(RewriterBase &rewriter, linalg::PackOp packOp,
transposeOp.getResult().getType().getElementType());
Operation *write = createWriteOrMaskedWrite(
rewriter, loc, transposeOp.getResult(), dest,
- /*inputVecSizesForLeadingDims=*/inputVectorSizes, /*writeIndices=*/{},
- /*useInBoundsInsteadOfMasking=*/false);
+ /*inputVecSizesForLeadingDims=*/inputVectorSizes);
newResults.push_back(write->getResult(0));
return success();
}
@@ -2000,8 +2000,7 @@ vectorizeAsTensorPadOp(RewriterBase &rewriter, tensor::PadOp padOp,
loc, reifiedReturnShapes[0], padOp.getResultType().getElementType());
Operation *write = createWriteOrMaskedWrite(
rewriter, loc, maskedRead, dest,
- /*inputVecSizesForLeadingDims=*/inputVectorSizes, {},
- /*useInBoundsInsteadOfMasking=*/false);
+ /*inputVecSizesForLeadingDims=*/inputVectorSizes);
newResults.push_back(write->getResult(0));
return success();
}
@@ -3007,22 +3006,14 @@ vectorizeAsInsertSliceOp(RewriterBase &rewriter, tensor::InsertSliceOp sliceOp,
sliceOp.getLoc(), elemType, rewriter.getZeroAttr(elemType));
}
- // 2. Get the vector shape and in-bounds attributes
+ // 2. Get the vector shape
SmallVector<int64_t> vecShape;
- SmallVector<bool> readInBounds;
- SmallVector<bool> writeInBounds;
size_t rankDiff = resultType.getRank() - sourceType.getRank();
for (int64_t i = 0, end = sourceType.getRank(); i < end; ++i) {
if (!inputVectorSizes.empty()) {
vecShape.push_back(inputVectorSizes[i]);
- readInBounds.push_back(false);
- writeInBounds.push_back(false);
} else if (!sourceType.isDynamicDim(i)) {
vecShape.push_back(sourceType.getDimSize(i));
- // Source shape is statically known: Neither read nor write are
- // out-of-bounds.
- readInBounds.push_back(true);
- writeInBounds.push_back(true);
} else if (!resultType.isDynamicDim(i)) {
// Source shape is not statically known, but result shape is.
// Vectorize with size of result shape. This may be larger than the
@@ -3030,16 +3021,9 @@ vectorizeAsInsertSliceOp(RewriterBase &rewriter, tensor::InsertSliceOp sliceOp,
// FIXME: Using rankDiff implies that the source tensor is inserted at
// the end of the destination tensor. However, that's not required.
vecShape.push_back(resultType.getDimSize(rankDiff + i));
- // Read may be out-of-bounds because the result size could be larger
- // than the source size.
- readInBounds.push_back(false);
- // Write will be in-bounds provided that the corresponding write idx is 0.
- // To keep this logic simple, conservatively mark as out-of-bounds.
- writeInBounds.push_back(false);
} else {
// Neither source nor result dim of padOp is static. Cannot vectorize
// the copy.
- // TODO: Add support for masking
return failure();
}
}
@@ -3052,13 +3036,15 @@ vectorizeAsInsertSliceOp(RewriterBase &rewriter, tensor::InsertSliceOp sliceOp,
SmallVector<Value> readIndices(
vecType.getRank(), rewriter.create<arith::ConstantIndexOp>(loc, 0));
Value read = mlir::vector::createReadOrMaskedRead(
- rewriter, loc, source, vecType.getShape(), padValue);
+ rewriter, loc, source, vecType.getShape(), padValue,
+ /*useInBoundsInsteadOfMasking=*/inputVectorSizes.empty());
// Create write
auto writeIndices =
getValueOrCreateConstantIndexOp(rewriter, loc, sliceOp.getMixedOffsets());
Operation *write = createWriteOrMaskedWrite(
- rewriter, loc, read, sliceOp.getDest(), vecType.getShape(), writeIndices);
+ rewriter, loc, read, sliceOp.getDest(), vecType.getShape(), writeIndices,
+ /*useInBoundsInsteadOfMasking=*/inputVectorSizes.empty());
// 4. Finalize
newResults.push_back(write->getResult(0));
diff --git a/mlir/test/Dialect/Linalg/vectorization/insert-slice-with-patterns.mlir b/mlir/test/Dialect/Linalg/vectorization/insert-slice-with-patterns.mlir
index d1f2ed194f6ce..f7764be9be73f 100644
--- a/mlir/test/Dialect/Linalg/vectorization/insert-slice-with-patterns.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization/insert-slice-with-patterns.mlir
@@ -67,19 +67,10 @@ module attributes {transform.with_named_sequence} {
// CHECK-SAME: %[[ARG_0:.*]]: tensor<1x?x3xf32>,
// CHECK-SAME: %[[PAD:.*]]: f32,
// CHECK-SAME: %[[SIZE:.*]]: index) -> tensor<9x8x7x1x2x3xf32> {
-// CHECK: %[[C3:.*]] = arith.constant 3 : index
-// CHECK: %[[C1:.*]] = arith.constant 1 : index
-// CHECK: %[[C0:.*]] = arith.constant 0 : index
// CHECK: %[[EMPTY:.*]] = tensor.empty() : tensor<9x8x7x1x2x3xf32>
// CHECK: %[[BC:.*]] = vector.broadcast %[[PAD]] : f32 to vector<9x8x7x1x2x3xf32>
// CHECK: %[[WRITE:.*]] = vector.transfer_write %[[BC]], %[[EMPTY]]{{.*}} {in_bounds = [true, true, true, true, true, true]} : vector<9x8x7x1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
-
-// CHECK: %[[D1:.*]] = tensor.dim %[[ARG_0]], %[[C1]] : tensor<1x?x3xf32>
-// CHECK: %[[MASK:.*]] = vector.create_mask %[[C1]], %[[D1]], %[[C3]] : vector<1x2x3xi1>
-// CHECK: %[[READ:.*]] = vector.mask %[[MASK]] {
-// CHECK-SAME: vector.transfer_read %[[ARG_0]][%[[C0]], %[[C0]], %[[C0]]], %[[PAD]] {in_bounds = [true, true, true]} : tensor<1x?x3xf32>, vector<1x2x3xf32>
-// CHECK-SAME: } : vector<1x2x3xi1> -> vector<1x2x3xf32>
-
+// CHECK: %[[READ:.*]] = vector.transfer_read %[[ARG_0]]{{.*}}, %[[PAD]] {in_bounds = [true, false, true]} : tensor<1x?x3xf32>, vector<1x2x3xf32>
// CHECK: %[[RES:.*]] = vector.transfer_write %[[READ]], %[[WRITE]]{{.*}} {in_bounds = [true, true, true]} : vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
// CHECK: return %[[RES]] : tensor<9x8x7x1x2x3xf32>
func.func @insert_dynamic_slice_non_zero_pad(%arg0: tensor<1x?x3xf32>, %pad : f32, %size: index) -> tensor<9x8x7x1x2x3xf32> {
diff --git a/mlir/test/Dialect/Linalg/vectorization/pad-with-patterns.mlir b/mlir/test/Dialect/Linalg/vectorization/pad-with-patterns.mlir
index 1baead0c09a52..4086d5458313e 100644
--- a/mlir/test/Dialect/Linalg/vectorization/pad-with-patterns.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization/pad-with-patterns.mlir
@@ -5,23 +5,16 @@
///----------------------------------------------------------------------------------------
// CHECK-LABEL: func @pad_static(
-// CHECK-SAME: %[[ARG0:.*]]: tensor<2x?x2xf32>,
-// CHECK-SAME: %[[ARG1:.*]]: f32) -> tensor<2x3x4xf32> {
-// CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index
-// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
-// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
-// CHECK: %[[EMPTY:.*]] = tensor.empty() : tensor<2x3x4xf32>
-// CHECK: %[[INIT:.*]] = vector.broadcast %[[ARG1]] : f32 to vector<2x3x4xf32>
-// CHECK: %[[OUT_TENSOR:.*]] = vector.transfer_write %[[INIT]], %[[EMPTY]]{{\[}}%[[C0]], %[[C0]], %[[C0]]] {in_bounds = [true, true, true]} : vector<2x3x4xf32>, tensor<2x3x4xf32>
-// CHECK: %[[DIM_1:.*]] = tensor.dim %[[ARG0]], %[[C1]] : tensor<2x?x2xf32>
-// CHECK: %[[MASK_READ:.*]] = vector.create_mask %[[C2]], %[[DIM_1]], %[[C2]] : vector<2x3x2xi1>
-// CHECK: %[[READ:.*]] = vector.mask %[[MASK_READ]] {
-// CHECK-SAME: vector.transfer_read %[[ARG0]]{{\[}}%[[C0]], %[[C0]], %[[C0]]], %[[ARG1]]
-// CHECK-SAME: {in_bounds = [true, true, true]} : tensor<2x?x2xf32>, vector<2x3x2xf32>
-// CHECK-SAME: } : vector<2x3x2xi1> -> vector<2x3x2xf32>
-// CHECK: %[[RESULT:.*]] = vector.transfer_write %[[READ]], %[[OUT_TENSOR]]{{\[}}%[[C0]], %[[C0]], %[[C2]]]
-// CHECK-SAME: {in_bounds = [true, true, true]} : vector<2x3x2xf32>, tensor<2x3x4xf32>
-// CHECK: return %[[RESULT]] : tensor<2x3x4xf32>
+// CHECK-SAME: %[[ARG0:.*]]: tensor<2x?x2xf32>, %[[PAD:.*]]: f32
+// CHECK-NOT: tensor.pad
+// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
+// CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index
+// CHECK-DAG: %[[INIT:.*]] = tensor.empty() : tensor<2x3x4xf32>
+// CHECK-DAG: %[[VEC:.*]] = vector.broadcast %[[PAD]] : f32 to vector<2x3x4xf32>
+// CHECK: %[[FILL:.*]] = vector.transfer_write %[[VEC]], %[[INIT]]{{.*}} : vector<2x3x4xf32>, tensor<2x3x4xf32>
+// CHECK: %[[READ:.*]] = vector.transfer_read %[[ARG0]][%[[C0]], %[[C0]], %[[C0]]], %[[PAD]] {in_bounds = [true, false, true]} : tensor<2x?x2xf32>, vector<2x3x2xf32>
+// CHECK: %[[RESULT:.*]] = vector.transfer_write %[[READ]], %[[FILL]][%[[C0]], %[[C0]], %[[C2]]] {in_bounds = [true, true, true]} : vector<2x3x2xf32>, tensor<2x3x4xf32>
+// CHECK: return %[[RESULT]]
func.func @pad_static(%arg0: tensor<2x?x2xf32>, %pad_value: f32) -> tensor<2x3x4xf32> {
%0 = tensor.pad %arg0 low[0, 0, 2] high[0, 1, 0] {
^bb0(%arg1: index, %arg2: index, %arg3: index):
>From 373036ecb948cef8087f7ffae2ba2970d3f0ea70 Mon Sep 17 00:00:00 2001
From: Andrzej Warzynski <andrzej.warzynski at arm.com>
Date: Thu, 29 May 2025 21:00:13 +0100
Subject: [PATCH 3/3] fixup! fixup! [mlir][linalg] Refactor vectorization hooks
to improve code reuse
* Restore the changes in transform-e2e.mlir + transform-vector.mlir
* Update the in_bounds attribute calculation in `createWriteOrMaskedWrite`
- otherwise transform-e2e.mlir goes into an infinite loop. I will create
a repro and open a GitHub issue before landing this.
* The in_bounds attribute calculation is incorrect and I will create a
GitHub ticket to fix it before merging this. See the comments in this
patch.
---
.../Linalg/Transforms/Vectorization.cpp | 3 +-
mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp | 1 +
mlir/test/Dialect/LLVM/transform-e2e.mlir | 10 ++---
.../test/Dialect/Vector/transform-vector.mlir | 40 +++++--------------
4 files changed, 17 insertions(+), 37 deletions(-)
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index ac3437f3da412..afae84ea4045f 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -1659,9 +1659,10 @@ createWriteOrMaskedWrite(OpBuilder &builder, Location loc, Value vectorToStore,
static_cast<size_t>(vecToStoreType.getRank()) &&
"Insufficient number of input vector sizes!");
// Update the inBounds attribute.
+ // FIXME: This computation is too weak - it ignores the write indices.
for (unsigned i = 0; i < vecToStoreRank; i++)
inBoundsVal[i] =
- (destShape[i] == inputVecSizesForLeadingDims[i]) &&
+ (destShape[i] >= inputVecSizesForLeadingDims[i]) &&
!ShapedType::isDynamic(destShape[destRank - vecToStoreRank + i]);
}
diff --git a/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp b/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
index dda4856596bba..590d244daef40 100644
--- a/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
+++ b/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
@@ -346,6 +346,7 @@ Value vector::createReadOrMaskedRead(OpBuilder &builder, Location loc,
if (useInBoundsInsteadOfMasking) {
// Update the inBounds attribute.
+ // FIXME: This computation is too weak - it ignores the read indices.
for (unsigned i = 0; i < readRank; i++)
inBoundsVal[i] = (sourceShape[i] == inputVectorSizes[i]) &&
!ShapedType::isDynamic(sourceShape[i]);
diff --git a/mlir/test/Dialect/LLVM/transform-e2e.mlir b/mlir/test/Dialect/LLVM/transform-e2e.mlir
index 98cfaf249c898..c00b47fb936e9 100644
--- a/mlir/test/Dialect/LLVM/transform-e2e.mlir
+++ b/mlir/test/Dialect/LLVM/transform-e2e.mlir
@@ -18,14 +18,16 @@ module attributes {transform.with_named_sequence} {
%1, %loops:3 = transform.structured.tile_using_for %0 tile_sizes [2, 2, 2] : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op, !transform.any_op)
%2 = transform.get_parent_op %1 {isolated_from_above} : (!transform.any_op) -> !transform.any_op
transform.structured.vectorize_children_and_apply_patterns %2 : (!transform.any_op) -> !transform.any_op
+ %b = transform.bufferization.one_shot_bufferize layout{IdentityLayoutMap}
+ %module_op {bufferize_function_boundaries = true}
+ : (!transform.any_op) -> !transform.any_op
- %f = transform.structured.match ops{["func.func"]} in %module_op
+ %f = transform.structured.match ops{["func.func"]} in %b
: (!transform.any_op) -> !transform.any_op
// TODO: group these lower-level controls into various properly named vector
// lowering TD macros.
transform.apply_patterns to %f {
- transform.apply_patterns.vector.lower_masked_transfers
transform.apply_patterns.vector.lower_contraction lowering_strategy = "outerproduct"
transform.apply_patterns.vector.transfer_permutation_patterns
transform.apply_patterns.vector.lower_multi_reduction lowering_strategy = "innerparallel"
@@ -35,10 +37,6 @@ module attributes {transform.with_named_sequence} {
transform.apply_patterns.vector.lower_shape_cast
transform.apply_patterns.vector.lower_transpose lowering_strategy = "shuffle_1d"
} : !transform.any_op
-
- %b = transform.bufferization.one_shot_bufferize layout{IdentityLayoutMap}
- %module_op {bufferize_function_boundaries = true}
- : (!transform.any_op) -> !transform.any_op
transform.yield
}
}
diff --git a/mlir/test/Dialect/Vector/transform-vector.mlir b/mlir/test/Dialect/Vector/transform-vector.mlir
index ddf212c5ef412..4b38db79bff3e 100644
--- a/mlir/test/Dialect/Vector/transform-vector.mlir
+++ b/mlir/test/Dialect/Vector/transform-vector.mlir
@@ -6,11 +6,7 @@ func.func @matmul_tensors(
-> tensor<8x32xf32> {
// CHECK-NOT: linalg
// CHECK: vector.extract {{.*}} : vector<4xf32> from vector<8x4xf32>
-// TODO: `vector.maskedstore` below could safely be replaced with
-// `vector.store`. It's present due to the vectorization logic for
-// `tensor.insert_slice` conservatively applying masks. However, in this case,
-// we should be able to remove it via value-bounds checks.
-// CHECK: vector.maskedstore {{.*}} : memref<8x32xf32>, vector<4xi1>, vector<4xf32>
+// CHECK: vector.store {{.*}} : memref<8x32xf32>, vector<4xf32>
%0 = linalg.matmul ins(%arg0, %arg1: tensor<8x16xf32>, tensor<16x32xf32>)
outs(%arg2: tensor<8x32xf32>)
-> tensor<8x32xf32>
@@ -24,16 +20,16 @@ module attributes {transform.with_named_sequence} {
: (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op, !transform.any_op)
%2 = transform.get_parent_op %1 {isolated_from_above} : (!transform.any_op) -> !transform.any_op
transform.structured.vectorize_children_and_apply_patterns %2 : (!transform.any_op) -> !transform.any_op
+ %b = transform.bufferization.one_shot_bufferize
+ layout{IdentityLayoutMap} %module_op
+ {bufferize_function_boundaries = true, allow_return_allocs = true}
+ : (!transform.any_op) -> !transform.any_op
- %f = transform.structured.match ops{["func.func"]} in %module_op
+ %f = transform.structured.match ops{["func.func"]} in %b
: (!transform.any_op) -> !transform.any_op
// TODO: group these lower-level controls into various properly named vector
// lowering TD macros.
- transform.apply_patterns to %f {
- transform.apply_patterns.vector.lower_masked_transfers
- } : !transform.any_op
-
transform.apply_patterns to %f {
transform.apply_patterns.vector.lower_contraction lowering_strategy = "outerproduct"
} : !transform.any_op
@@ -50,37 +46,21 @@ module attributes {transform.with_named_sequence} {
transform.apply_patterns.vector.split_transfer_full_partial split_transfer_strategy = "linalg-copy"
} : !transform.any_op
- // By default, UnrollTransferWriteConversion (applied below via
- // `transfer_to_scf`) will only work on MemRef(s). While there's an option
- // to relax that, it's currently not wired-up with the TD logic. Bufferize
- // here as otherwise unrolling will not work.
- // TODO: Extend `transform.apply_patterns.vector.transfer_to_scf` to allow
- // unrolling xfer Ops on tensors and move bufferization all the way down.
- %b = transform.bufferization.one_shot_bufferize
- layout{IdentityLayoutMap} %module_op
- {bufferize_function_boundaries = true, allow_return_allocs = true}
- : (!transform.any_op) -> !transform.any_op
-
- %fb = transform.structured.match ops{["func.func"]} in %b
- : (!transform.any_op) -> !transform.any_op
-
- transform.apply_patterns to %fb {
+ transform.apply_patterns to %f {
transform.apply_patterns.vector.transfer_to_scf max_transfer_rank = 1 full_unroll = true
} : !transform.any_op
- transform.apply_patterns to %fb {
+ transform.apply_patterns to %f {
transform.apply_patterns.vector.lower_transfer max_transfer_rank = 1
} : !transform.any_op
- transform.apply_patterns to %fb {
+ transform.apply_patterns to %f {
transform.apply_patterns.vector.lower_shape_cast
} : !transform.any_op
- transform.apply_patterns to %fb {
+ transform.apply_patterns to %f {
transform.apply_patterns.vector.lower_transpose lowering_strategy = "shuffle_1d"
} : !transform.any_op
-
-
transform.yield
}
}