[Mlir-commits] [mlir] [mlir][linalg][vector] Refine create{Read|Write}OrMasked{Read|Write} (nfc) (PR #135350)

Andrzej Warzyński llvmlistbot at llvm.org
Fri Apr 11 08:28:12 PDT 2025


https://github.com/banach-space updated https://github.com/llvm/llvm-project/pull/135350

From 830f1d7358d07d828a4d660cd01bae8ba70503bc Mon Sep 17 00:00:00 2001
From: Andrzej Warzynski <andrzej.warzynski at arm.com>
Date: Fri, 11 Apr 2025 10:39:14 +0100
Subject: [PATCH 1/2] [mlir][linalg][vector] Refine
 create{Read|Write}OrMasked{Read|Write} (nfc)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The semantics of `createReadOrMaskedRead` and `createWriteOrMaskedWrite`
are currently a bit inconsistent and not fully documented:

* The input vector sizes are passed as `readShape` and
  `inputVectorSizes`, respectively — inconsistent naming.

* Currently, the input vector sizes in `createWriteOrMaskedWrite` are
  not required to be complete: any missing trailing sizes are inferred
  from the destination tensor. This only works when the destination
  tensor is statically shaped.

* Unlike `createReadOrMaskedRead`, the documentation for
  `createWriteOrMaskedWrite` does not specify that write offsets are
  hard-coded to 0.

This PR only updates the documentation and unifies the naming. As such,
it is NFC.

A follow-up PR will generalize and unify the implementation to support,
for example, dynamically shaped destination tensors — a requirement for
enabling scalable vectorization of `linalg.pack` and `linalg.unpack`.
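
To make the masking behaviour and the hard-coded zero write offsets from
the last point concrete, the masked write path currently produces IR along
these lines (a sketch only, with hypothetical shapes and SSA names: %vec is
the vector to write, %d0 the dynamic leading destination size, and the full
vector sizes [4, 8] are provided):

  %c0 = arith.constant 0 : index
  %c8 = arith.constant 8 : index
  %dest = tensor.empty(%d0) : tensor<?x8xf32>
  %mask = vector.create_mask %d0, %c8 : vector<4x8xi1>
  %res = vector.mask %mask {
    vector.transfer_write %vec, %dest[%c0, %c0]
      : vector<4x8xf32>, tensor<?x8xf32>
  } : vector<4x8xi1> -> tensor<?x8xf32>
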
---
 .../mlir/Dialect/Vector/Utils/VectorUtils.h   |  25 ++---
 .../Linalg/Transforms/Vectorization.cpp       | 101 +++++++++++++-----
 mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp |  20 ++--
 3 files changed, 95 insertions(+), 51 deletions(-)

diff --git a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
index 5f32aca88a273..6609b28d77b6c 100644
--- a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
+++ b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
@@ -211,21 +211,18 @@ struct MaskableOpRewritePattern : OpRewritePattern<SourceOp> {
 /// are not linearizable.
 bool isLinearizableVector(VectorType type);
 
-/// Create a TransferReadOp from `source` with static shape `readShape`. If the
-/// vector type for the read is not the same as the type of `source`, then a
-/// mask is created on the read, if use of mask is specified or the bounds on a
-/// dimension are different.
-///
-/// `useInBoundsInsteadOfMasking` if false, the inBoundsVal values are set
-/// properly, based on
-///   the rank dimensions of the source and destination tensors. And that is
-///   what determines if masking is done.
-///
-/// Note that the internal `vector::TransferReadOp` always read at indices zero
-/// for each dimension of the passed in tensor.
+/// Creates a TransferReadOp from `source`.
+///
+/// The shape of the vector to read is specified via `inputVectorSizes`. If the
+/// shape of the output vector differs from the shape of the value being read,
+/// masking is used to avoid out-of-bounds accesses. Set
+/// `useInBoundsInsteadOfMasking` to `true` to use the "in_bounds" attribute
+/// instead of explicit masks.
+///
+/// Note: all read offsets are set to 0.
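+///
+/// For example (a sketch only, with hypothetical shapes and SSA names:
+/// `%src` is a `tensor<?x8xf32>` source and `inputVectorSizes` is [4, 8]),
+/// the masked variant produces IR along the lines of:
+///
+///   %mask = vector.create_mask %dim0, %c8 : vector<4x8xi1>
+///   %read = vector.mask %mask {
+///     vector.transfer_read %src[%c0, %c0], %padValue
+///       : tensor<?x8xf32>, vector<4x8xf32>
+///   } : vector<4x8xi1> -> vector<4x8xf32>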
 Value createReadOrMaskedRead(OpBuilder &builder, Location loc, Value source,
-                             ArrayRef<int64_t> readShape, Value padValue,
-                             bool useInBoundsInsteadOfMasking);
+                             ArrayRef<int64_t> inputVectorSizes, Value padValue,
+                             bool useInBoundsInsteadOfMasking = false);
 
 /// Returns success if `inputVectorSizes` is a valid masking configuration for
 /// given `shape`, i.e., it meets:
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index 8c8b1b85ef5a3..534a53ffe7b83 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -1506,19 +1506,48 @@ static SmallVector<int64_t> getTiledPackShape(linalg::PackOp packOp,
   return applyPermutation(destShape, linalg::getPackInverseDestPerm(packOp));
 }
 
-/// Given an input, the mixed destSizes, and the vector sizes for vectorization,
-/// create an empty destination tensor and create a TransferWriteOp from the
-/// input to the empty tensor. If the destination shape is not the same as the
-/// inputVectorSizes for the first rank(inputVectorSizes) dims, then create a
-/// mask for the write. If `useInBoundsInsteadOfMasking` is set, then update the
-/// inBounds attribute of the transfer write op instead of masking.
-static Operation *createWriteOrMaskedWrite(OpBuilder &builder, Location loc,
-                                           Value input,
-                                           SmallVector<OpFoldResult> destSizes,
-                                           ArrayRef<int64_t> inputVectorSizes,
-                                           bool useInBoundsInsteadOfMasking) {
+/// Creates a TransferWriteOp to write `input` into a newly initialized
+/// output tensor.
+///
+/// Given:
+/// - an input vector to write,
+/// - the mixed destination sizes for the output tensor,
+/// - and the vector sizes used for vectorization (i.e., the leading N dims),
+///
+/// this function generates the following sequence of ops:
+///
+///   %dest = tensor.empty(%destSizes)
+///   %res = vector.transfer_write %input into %dest
+///
+/// If the leading N dimensions of the destination tensor do not match
+/// `inputVecSizesForLeadingDims` (where N =
+/// rank(`inputVecSizesForLeadingDims`)), masking is applied to ensure
+/// correctness:
+///
+///   %dest = tensor.empty(%destSizes)
+///   %write = vector.transfer_write %input into %dest
+///   %mask = vector.create_mask(%destSizes)
+///   %res = vector.mask %mask { %write }
+///
+/// If `useInBoundsInsteadOfMasking` is set to `true`, the `in_bounds` attribute
+/// is used instead of masking:
+///
+///   %dest = tensor.empty(%destSizes)
+///   in_bounds_flags = (...)
+///   %res = vector.transfer_write %input into %dest
+///       {in_bounds = in_bounds_flags}
+///
+/// Note: all write offsets are set to 0.
+static Operation *
+createWriteOrMaskedWrite(OpBuilder &builder, Location loc, Value input,
+                         SmallVector<OpFoldResult> destSizes,
+                         ArrayRef<int64_t> inputVecSizesForLeadingDims,
+                         bool useInBoundsInsteadOfMasking = false) {
 
   auto inputType = cast<VectorType>(input.getType());
+  assert(inputType.getRank() == static_cast<int64_t>(destSizes.size()) &&
+         "Rank mismatch!");
+
   Value dest = builder.create<tensor::EmptyOp>(loc, destSizes,
                                                inputType.getElementType());
   int64_t rank = cast<ShapedType>(dest.getType()).getRank();
@@ -1526,9 +1555,13 @@ static Operation *createWriteOrMaskedWrite(OpBuilder &builder, Location loc,
   auto destShape = cast<ShapedType>(dest.getType()).getShape();
   SmallVector<bool> inBoundsVal(rank, true);
   if (useInBoundsInsteadOfMasking) {
+    // In this case, assume that all the required vector sizes have been
+    // provided.
+    assert(inputVecSizesForLeadingDims.size() == destSizes.size() &&
+           "Insufficient number of input vector sizes!");
     // Update the inBounds attribute.
     for (unsigned i = 0; i < rank; i++)
-      inBoundsVal[i] = (destShape[i] == inputVectorSizes[i]) &&
+      inBoundsVal[i] = (destShape[i] == inputVecSizesForLeadingDims[i]) &&
                        !ShapedType::isDynamic(destShape[i]);
   }
   Operation *write = builder.create<vector::TransferWriteOp>(
@@ -1538,17 +1571,20 @@ static Operation *createWriteOrMaskedWrite(OpBuilder &builder, Location loc,
       /*indices=*/SmallVector<Value>(rank, zero),
       /*inBounds=*/inBoundsVal);
   assert(llvm::none_of(
-             destShape.drop_front(inputVectorSizes.size()),
+             destShape.drop_front(inputVecSizesForLeadingDims.size()),
              [](int64_t size) { return size == ShapedType::kDynamic; }) &&
-         "Only dims aligned with inputVectorSizes may be dynamic");
+         "Only dims aligned with inputVecSizesForLeadingDims may be dynamic");
   if (useInBoundsInsteadOfMasking)
     return write;
-  bool needMaskForWrite = !llvm::equal(
-      inputVectorSizes, destShape.take_front(inputVectorSizes.size()));
+  bool needMaskForWrite =
+      !llvm::equal(inputVecSizesForLeadingDims,
+                   destShape.take_front(inputVecSizesForLeadingDims.size()));
   if (needMaskForWrite) {
     SmallVector<int64_t> writeMaskShape;
-    writeMaskShape.append(inputVectorSizes.begin(), inputVectorSizes.end());
-    writeMaskShape.append(destShape.begin() + inputVectorSizes.size(),
+    writeMaskShape.append(inputVecSizesForLeadingDims.begin(),
+                          inputVecSizesForLeadingDims.end());
+    writeMaskShape.append(destShape.begin() +
+                              inputVecSizesForLeadingDims.size(),
                           destShape.end());
     auto writeMaskType = VectorType::get(writeMaskShape, builder.getI1Type());
     Value maskForWrite =
@@ -1558,9 +1594,11 @@ static Operation *createWriteOrMaskedWrite(OpBuilder &builder, Location loc,
   return write;
 }
 
-/// Vectorize linalg::PackOp with (1) static innerTiles (2) constant
+/// Vectorize linalg::PackOp with (1) static inner_tiles (2) constant
 /// padding value and (3) input vector sizes into:
-/// masked_transfer_read->shape_cast->transpose->transfer_write_in_bounds
+///
+///   masked_transfer_read->shape_cast->transpose->transfer_write_in_bounds
+///
 /// As in the following example:
 /// %pack = tensor.pack %src inner_dims_pos = [2, 1] inner_tiles = [16, 2]
 ///     into %dst : tensor<32x8x16xf32> -> tensor<32x4x1x16x2xf32>
@@ -1582,8 +1620,11 @@ static Operation *createWriteOrMaskedWrite(OpBuilder &builder, Location loc,
 ///     : vector<32x4x1x16x2xf32>, tensor<32x4x1x16x2xf32>
 ///
 /// If the (3) input vector sizes are not provided, the vector sizes are
-/// determined by the result tensor shape. Also, we update the inBounds
-/// attribute instead of masking.
+/// determined by the result tensor shape. In that case, the `in_bounds`
+/// attribute is used instead of masking.
+///
+/// NOTE: The input vector sizes specify the dimensions corresponding to the
+/// outer dimensions of the output tensor. The remaining dimensions are
 static LogicalResult
 vectorizeAsTensorPackOp(RewriterBase &rewriter, linalg::PackOp packOp,
                         ArrayRef<int64_t> inputVectorSizes,
@@ -1644,9 +1685,11 @@ vectorizeAsTensorPackOp(RewriterBase &rewriter, linalg::PackOp packOp,
       loc, shapeCastOp.getResult(), destPermutation);
 
   // Create TransferWriteOp.
-  Operation *write = createWriteOrMaskedWrite(
-      rewriter, loc, transposeOp.getResult(), reifiedReturnShapes[0],
-      inputVectorSizes, /*useInBoundsInsteadOfMasking=*/false);
+  Operation *write =
+      createWriteOrMaskedWrite(rewriter, loc, transposeOp.getResult(),
+                               /*destSizes=*/reifiedReturnShapes[0],
+                               /*inputVecSizesForLeadingDims=*/inputVectorSizes,
+                               /*useInBoundsInsteadOfMasking=*/false);
   newResults.push_back(write->getResult(0));
   return success();
 }
@@ -1780,8 +1823,9 @@ vectorizeAsTensorUnpackOp(RewriterBase &rewriter, linalg::UnPackOp unpackOp,
           ? vectorSizes
           : shapeCastOp.getResultVectorType().getShape());
   Operation *write = createWriteOrMaskedWrite(
-      rewriter, loc, shapeCastOp.getResult(), reifiedRetShapes[0],
-      writeVectorSizes, useInBoundsInsteadOfMasking);
+      rewriter, loc, shapeCastOp.getResult(), /*destSizes=*/reifiedRetShapes[0],
+      /*inputVecSizesForLeadingDims=*/writeVectorSizes,
+      useInBoundsInsteadOfMasking);
   newResults.push_back(write->getResult(0));
   return success();
 }
@@ -1810,7 +1854,8 @@ vectorizeAsTensorPadOp(RewriterBase &rewriter, tensor::PadOp padOp,
       rewriter, loc, padOp.getSource(), inputVectorSizes, padValue,
       /*useInBoundsInsteadOfMasking=*/false);
   Operation *write = createWriteOrMaskedWrite(
-      rewriter, loc, maskedRead, reifiedReturnShapes[0], inputVectorSizes,
+      rewriter, loc, maskedRead, reifiedReturnShapes[0],
+      /*inputVecSizesForLeadingDims=*/inputVectorSizes,
       /*useInBoundsInsteadOfMasking=*/false);
   newResults.push_back(write->getResult(0));
   return success();
diff --git a/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp b/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
index 7b56cd0cf0e91..024e8ccb901de 100644
--- a/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
+++ b/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
@@ -327,26 +327,28 @@ bool vector::isLinearizableVector(VectorType type) {
 }
 
 Value vector::createReadOrMaskedRead(OpBuilder &builder, Location loc,
-                                     Value source, ArrayRef<int64_t> readShape,
+                                     Value source,
+                                     ArrayRef<int64_t> inputVectorSizes,
                                      Value padValue,
                                      bool useInBoundsInsteadOfMasking) {
-  assert(llvm::none_of(readShape,
+  assert(llvm::none_of(inputVectorSizes,
                        [](int64_t s) { return s == ShapedType::kDynamic; }) &&
-         "expected static shape");
+         "invalid input vector sizes");
   auto sourceShapedType = cast<ShapedType>(source.getType());
   auto sourceShape = sourceShapedType.getShape();
-  assert(sourceShape.size() == readShape.size() && "expected same ranks.");
-  auto maskType = VectorType::get(readShape, builder.getI1Type());
-  auto vectorType = VectorType::get(readShape, padValue.getType());
+  assert(sourceShape.size() == inputVectorSizes.size() &&
+         "expected same ranks.");
+  auto maskType = VectorType::get(inputVectorSizes, builder.getI1Type());
+  auto vectorType = VectorType::get(inputVectorSizes, padValue.getType());
   assert(padValue.getType() == sourceShapedType.getElementType() &&
          "expected same pad element type to match source element type");
-  int64_t readRank = readShape.size();
+  int64_t readRank = inputVectorSizes.size();
   auto zero = builder.create<arith::ConstantIndexOp>(loc, 0);
   SmallVector<bool> inBoundsVal(readRank, true);
   if (useInBoundsInsteadOfMasking) {
     // Update the inBounds attribute.
     for (unsigned i = 0; i < readRank; i++)
-      inBoundsVal[i] = (sourceShape[i] == readShape[i]) &&
+      inBoundsVal[i] = (sourceShape[i] == inputVectorSizes[i]) &&
                        !ShapedType::isDynamic(sourceShape[i]);
   }
   auto transferReadOp = builder.create<vector::TransferReadOp>(
@@ -357,7 +359,7 @@ Value vector::createReadOrMaskedRead(OpBuilder &builder, Location loc,
       /*padding=*/padValue,
       /*inBounds=*/inBoundsVal);
 
-  if (llvm::equal(readShape, sourceShape) || useInBoundsInsteadOfMasking)
+  if (llvm::equal(inputVectorSizes, sourceShape) || useInBoundsInsteadOfMasking)
     return transferReadOp;
   SmallVector<OpFoldResult> mixedSourceDims =
       tensor::getMixedSizes(builder, loc, source);

From 5aa309c12a927da7bf8816a7b7efd2bb6b6cd123 Mon Sep 17 00:00:00 2001
From: Andrzej Warzynski <andrzej.warzynski at arm.com>
Date: Fri, 11 Apr 2025 16:26:09 +0100
Subject: [PATCH 2/2] fixup! [mlir][linalg][vector] Refine
 create{Read|Write}OrMasked{Read|Write} (nfc)

Add missing comments
---
 mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index 534a53ffe7b83..2afc99794ad0c 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -1512,7 +1512,8 @@ static SmallVector<int64_t> getTiledPackShape(linalg::PackOp packOp,
 /// Given:
 /// - an input vector to write,
 /// - the mixed destination sizes for the output tensor,
-/// - and the vector sizes used for vectorization (i.e., the leading N dims),
+/// - and the vector sizes used for vectorization (i.e., the leading N dims,
+///   for some value of N),
 ///
 /// this function generates the following sequence of ops:
 ///
@@ -1537,7 +1538,11 @@ static SmallVector<int64_t> getTiledPackShape(linalg::PackOp packOp,
 ///   %res = vector.transfer_write %input into %dest
 ///       {in_bounds = in_bounds_flags}
 ///
-/// Note: all write offsets are set to 0.
+/// NOTE: all write offsets are set to 0.
+/// NOTE: When N < rank(input), the missing vector sizes are effectively
+/// extracted from the trailing sizes of `destSizes`. This means those sizes
+/// must be static. Supporting dynamic sizes will require the user to specify
+/// the remaining vector sizes. This is left as a TODO.
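+///
+/// For example (hypothetical sizes only), with `inputVecSizesForLeadingDims`
+/// = [4] and a `vector<4x16xf32>` input, the trailing mask/vector size 16 is
+/// taken from the static trailing entry of `destSizes`, roughly:
+///
+///   %mask = vector.create_mask %d0, %c16 : vector<4x16xi1>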
 static Operation *
 createWriteOrMaskedWrite(OpBuilder &builder, Location loc, Value input,
                          SmallVector<OpFoldResult> destSizes,
@@ -1625,6 +1630,9 @@ createWriteOrMaskedWrite(OpBuilder &builder, Location loc, Value input,
 ///
 /// NOTE: The input vector sizes specify the dimensions corresponding to the
 /// outer dimensions of the output tensor. The remaining dimensions are
+/// computed based on, e.g., the static inner tiles.
+/// Supporting dynamic inner tiles will require the user to specify the
+/// missing vector sizes. This is left as a TODO.
 static LogicalResult
 vectorizeAsTensorPackOp(RewriterBase &rewriter, linalg::PackOp packOp,
                         ArrayRef<int64_t> inputVectorSizes,



More information about the Mlir-commits mailing list