[Mlir-commits] [mlir] 3f136f7 - [Tensor] Simplify tensor.pad tiling length calculations. (#119039)

Thu Dec 12 09:24:24 PST 2024

Author: Nirvedh Meshram
Date: 2024-12-12T11:24:20-06:00
New Revision: 3f136f7dfb41542c76c1b352544009bffbc399d2

URL: https://github.com/llvm/llvm-project/commit/3f136f7dfb41542c76c1b352544009bffbc399d2
DIFF: https://github.com/llvm/llvm-project/commit/3f136f7dfb41542c76c1b352544009bffbc399d2.diff

LOG: [Tensor] Simplify tensor.pad tiling length calculations. (#119039)

The current code computes the end location of the new slice and then
subtracts the new offset from that location. The new length can instead
be computed directly. Besides requiring fewer operations (which can
matter in the dynamic case), this has the advantage that the values are
upper-bounded by the length rather than the source size, which is
friendlier to range analysis. I believe the change is already covered
by
`test/Dialect/Linalg/subtensor-of-padtensor.mlir` and
`test/Dialect/Linalg/tile-and-fuse-tensors.mlir`
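
As a sanity check on the claim above, the two formulas can be compared on
plain integers. The sketch below is not the MLIR implementation: it models
the `OpFoldResult` values as `int64_t`, the function names are hypothetical,
and it assumes `newOffset = min(max(offset - low, 0), srcSize)` and
`newLow = max(low - offset, 0)`, which are defined in the surrounding function
rather than in the hunks shown in this diff. It also assumes all inputs are
non-negative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the values used in the length computation.
struct SliceBounds {
  int64_t newOffset; // Assumed: min(max(offset - low, 0), srcSize).
  int64_t newLow;    // Assumed: max(low - offset, 0).
};

SliceBounds computeBounds(int64_t offset, int64_t low, int64_t srcSize) {
  return {std::min(std::max(offset - low, int64_t{0}), srcSize),
          std::max(low - offset, int64_t{0})};
}

// Removed formula: compute the end location, then subtract the new offset.
int64_t oldNewLength(int64_t offset, int64_t low, int64_t length,
                     int64_t srcSize) {
  int64_t endLoc =
      std::min(std::max(offset - low + length, int64_t{0}), srcSize);
  return endLoc - computeBounds(offset, low, srcSize).newOffset;
}

// Added formula: take the smaller of the available length and the wanted
// length. The clamp to zero is applied unconditionally here; the commit
// only emits it when there is low padding, since it is a no-op otherwise.
int64_t simplifiedNewLength(int64_t offset, int64_t low, int64_t length,
                            int64_t srcSize) {
  SliceBounds b = computeBounds(offset, low, srcSize);
  int64_t newLength = std::min(srcSize - b.newOffset, length - b.newLow);
  return std::max(newLength, int64_t{0});
}

// Exhaustively compares both formulas for all inputs in [0, bound].
bool formulasAgree(int64_t bound) {
  for (int64_t offset = 0; offset <= bound; ++offset)
    for (int64_t low = 0; low <= bound; ++low)
      for (int64_t length = 0; length <= bound; ++length)
        for (int64_t srcSize = 0; srcSize <= bound; ++srcSize) {
          int64_t simplified =
              simplifiedNewLength(offset, low, length, srcSize);
          // The result never exceeds the requested length.
          assert(simplified <= length);
          if (simplified != oldNewLength(offset, low, length, srcSize))
            return false;
        }
  return true;
}
```

Under these assumptions, the simplified result is bounded by `length` on every
input, which is the property the message describes as helpful for range
analysis.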

---------

Signed-off-by: Nirvedh <nirvedh at gmail.com>

Added: 
    

Modified: 
    mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp

Removed: 
    


################################################################################
diff  --git a/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp b/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp
index 68c3d1cabb11cb..3caf93b1408df4 100644

--- a/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp
@@ -746,11 +746,6 @@ FailureOr<TilingResult> tensor::bubbleUpPadSlice(OpBuilder &b,
   Location loc = padOp->getLoc();
   AffineExpr dim0, dim1;
   bindDims(b.getContext(), dim0, dim1);
-  // Add two integers.
-  auto addMap = AffineMap::get(2, 0, {dim0 + dim1});
-  auto add = [&](OpFoldResult v1, OpFoldResult v2) {
-    return affine::makeComposedFoldedAffineApply(b, loc, addMap, {v1, v2});
-  };
   // Subtract two integers.
   auto subMap = AffineMap::get(2, 0, {dim0 - dim1});
   auto sub = [&](OpFoldResult v1, OpFoldResult v2) {
@@ -825,16 +820,14 @@ FailureOr<TilingResult> tensor::bubbleUpPadSlice(OpBuilder &b,
     // The original read could also have stopped in the high padding zone.
     // In that case, the end position of the read should be the end of
     // the source tensor. (Similar to newOffset.)
-    //
-    // endLoc = min(max(offset - low + length, 0), srcSize)
-    //
-    // The new ExtractSliceOp length is `endLoc - newOffset`.
-    //
-    // Optimization: If low = 0, then the formula can be simplified.
-    OpFoldResult endLoc =
-        hasLowPad ? min(max(add(sub(offset, low), length), zero), srcSize)
-                  : min(add(offset, length), srcSize);
-    OpFoldResult newLength = sub(endLoc, newOffset);
+    // srcSize - newOffset represents how much length we have available
+    // and length - newLow represents how much length we want at most.
+    // Note that there are many ways to order this indexing math to compute newLength, but we want to make sure that the final affine.min ops in the sequence are bounding the index to as small a value as possible. If ValueBoundsOpInterface is used, this calculation will get upper bounds from the affine.min ops, so we want to use the smallest known value to set the bound at the end of the computation sequence. In this case, the index will be upper bounded by length - newLow.
+    OpFoldResult newLength = min(sub(srcSize, newOffset), sub(length, newLow));
+    // Optimization: If low = 0, then newLow = 0, so newLength >= 0 assuming
+    // length >= 0.
+    if (hasLowPad)
+      newLength = max(newLength, zero);
     newLengths.push_back(newLength);
 
     // Check if newLength is zero. In that case, no SubTensorOp should be