[Mlir-commits] [mlir] [MLIR] Add shape propagation through tensor.pad (PR #136681)

Hyunsung Lee llvmlistbot at llvm.org
Mon Aug 11 04:10:30 PDT 2025


ita9naiwa wrote:

This transformation is common in ML workloads such as CNNs, where input tensors are padded (e.g., to match convolution kernel sizes) and then packed into smaller tiles for efficient Tensor Core or SIMD execution.
Layout changes such as `tensor.expand_shape` may appear between the padding and the packing, blocking folding patterns such as `linalg-fold-padding`.

Bubbling `tensor.pad` directly before `linalg.pack` restores adjacency, enabling the padding to be folded into the packing step and unlocking better tiling and vectorization.




## Example: Bubbling `tensor.pad` Before `linalg.pack`

### Before Bubbling

```mlir
func.func @fold_tensor_pad_with_expand_and_pack(%arg0: tensor<512x256x256xf32>)
    -> tensor<32x16x129x129x2x2xf32> {
  %c0 = arith.constant 0.0 : f32
  %producer = linalg.fill ins(%c0 : f32)
      outs(%arg0 : tensor<512x256x256xf32>)
      -> tensor<512x256x256xf32>

  %pad = tensor.pad %producer low[0, 1, 1] high[0, 1, 1] {
    ^bb0(%i: index, %j: index, %k: index):
      tensor.yield %c0 : f32
  } : tensor<512x256x256xf32> to tensor<512x258x258xf32>

  %reshape = tensor.expand_shape %pad [[0, 1], [2], [3]]
      output_shape [32, 16, 258, 258]
      : tensor<512x258x258xf32> into tensor<32x16x258x258xf32>

  // The pack is blocked: `pad` is not directly adjacent due to reshape
  %dest = tensor.empty() : tensor<32x16x129x129x2x2xf32>
  %packed = linalg.pack %reshape
      inner_dims_pos = [2, 3]
      inner_tiles = [2, 2]
      into %dest
      : tensor<32x16x258x258xf32> -> tensor<32x16x129x129x2x2xf32>

  return %packed : tensor<32x16x129x129x2x2xf32>
}
```
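The shape arithmetic in the pipeline above can be checked with a small Python sketch. This is plain Python mirroring the IR, not an MLIR API; the helper names (`pad_shape`, `expand_shape`, `pack_shape`) are made up for illustration:

```python
import math

def pad_shape(shape, low, high):
    # tensor.pad: each dim grows by its low and high padding amounts
    return [d + l + h for d, l, h in zip(shape, low, high)]

def expand_shape(shape, reassociation, output_shape):
    # tensor.expand_shape: each source dim splits into a group of result dims
    # whose product must equal the source dim
    for group, src in zip(reassociation, shape):
        assert math.prod(output_shape[i] for i in group) == src
    return list(output_shape)

def pack_shape(shape, inner_dims_pos, inner_tiles):
    # linalg.pack: each tiled dim shrinks to ceil(d / tile); the tile sizes
    # are appended as trailing dims
    outer = list(shape)
    for pos, tile in zip(inner_dims_pos, inner_tiles):
        outer[pos] = math.ceil(shape[pos] / tile)
    return outer + list(inner_tiles)

src = [512, 256, 256]
padded = pad_shape(src, [0, 1, 1], [0, 1, 1])                      # [512, 258, 258]
expanded = expand_shape(padded, [[0, 1], [2], [3]], [32, 16, 258, 258])
packed = pack_shape(expanded, [2, 3], [2, 2])                      # [32, 16, 129, 129, 2, 2]
print(packed)
```

Bubbling commutes the first two steps (expand, then pad with the padding amounts redistributed onto the expanded dims) while leaving the final packed shape unchanged.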

Here, `tensor.pad` is separated from `linalg.pack` by `tensor.expand_shape`,
so folding patterns such as `linalg-fold-padding` cannot match.

### After Bubbling

command: `mlir-opt input.mlir -test-linalg-elementwise-fusion-patterns=fuse-with-reshape-by-expansion`

```mlir
module {
  func.func @fold_tensor_pad_with_expand_and_pack(%arg0: tensor<512x256x256xf32>)
      -> tensor<32x16x129x129x2x2xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %expanded = tensor.expand_shape %arg0 [[0, 1], [2], [3]]
        output_shape [32, 16, 256, 256]
        : tensor<512x256x256xf32> into tensor<32x16x256x256xf32>
    %0 = linalg.fill ins(%cst : f32)
        outs(%expanded : tensor<32x16x256x256xf32>)
        -> tensor<32x16x256x256xf32>
    %padded = tensor.pad %0 low[0, 0, 1, 1] high[0, 0, 1, 1] {
      ^bb0(%arg1: index, %arg2: index, %arg3: index, %arg4: index):
        tensor.yield %cst : f32
    } : tensor<32x16x256x256xf32> to tensor<32x16x258x258xf32>
    %1 = tensor.empty() : tensor<32x16x129x129x2x2xf32>
    %pack = linalg.pack %padded
        inner_dims_pos = [2, 3]
        inner_tiles = [2, 2]
        into %1
        : tensor<32x16x258x258xf32> -> tensor<32x16x129x129x2x2xf32>
    return %pack : tensor<32x16x129x129x2x2xf32>
  }
}
```

Now `tensor.pad` is immediately before `linalg.pack`.
This enables folding patterns such as `linalg-fold-padding` as well as downstream tiling and vectorization passes.

### Benefits of Bubbling in This Case

1. Pad–consumer adjacency
   - Required for `linalg-fold-padding` and related optimizations.
   - Without bubbling, the reshape blocks the match.
2. Improved downstream codegen
   - `linalg.pack` can incorporate the padding directly into the blocked layout.
   - Avoids materializing a fully padded intermediate tensor.
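Benefit 2 can be modeled in a few lines: a folded pad+pack never builds the padded tensor, and instead reads out-of-bounds elements as the padding value. A minimal 1-D Python sketch (the function `pack_with_pad` is illustrative, not an MLIR API; it assumes the tile size divides the padded size, as in the example above):

```python
def pack_with_pad(src, low, high, tile, pad_value):
    # Folded form: out-of-range reads yield pad_value directly,
    # mirroring how linalg.pack can absorb a producing tensor.pad.
    padded_len = low + len(src) + high
    assert padded_len % tile == 0  # assume the tile evenly divides the padded size

    def at(i):
        j = i - low  # index into the unpadded source
        return src[j] if 0 <= j < len(src) else pad_value

    return [[at(t * tile + k) for k in range(tile)]
            for t in range(padded_len // tile)]

# 1-D analogue of the example (size 256 -> pad to 258 -> tiles of 2),
# shrunk here to size 6 -> pad to 8 -> 4 tiles of 2:
print(pack_with_pad([1, 2, 3, 4, 5, 6], 1, 1, 2, 0))
# -> [[0, 1], [2, 3], [4, 5], [6, 0]]
```

Only the boundary tiles ever touch the padding value, which is why no fully padded intermediate needs to exist in memory.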

Summary:

Bubbling `tensor.expand_shape` above `tensor.pad` restores pad–consumer adjacency,
enabling the pad to fold into `linalg.pack` and improving tiling and vectorization opportunities.

https://github.com/llvm/llvm-project/pull/136681