[Mlir-commits] [mlir] [mlir] Fix padding shape computation in PadTilingInterface (PR #149576)
Fabian Mora
llvmlistbot at llvm.org
Mon Jul 21 06:46:42 PDT 2025
https://github.com/fabianmcg requested changes to this pull request.
Thank you for spotting possible corner cases. We are aware of some odd consequences of padding the iteration domain rather than the ins/outs, but I'm not sure this falls into that category. Could you provide a more detailed explanation of why your fix works?
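For context, here is a minimal sketch (the padded sizes and operand names are hypothetical, assuming the OH/OW iteration-domain sizes were padded from 14 to 16) of what padding the iteration domain implies for the operand shapes: since the spatial indexing map is `d_oh + d_kh`, the required input spatial size becomes 16 + 3 - 1 = 18, not 16, so the operand padding has to be derived from the indexing maps applied to the padded iteration domain.
```mlir
// Hypothetical padded op; the shapes follow from padding OH/OW to 16.
%padded = linalg.conv_2d_nhwc_fhwc
  {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>}
  ins(%in_padded, %arg1 : tensor<1x18x18x4xf32>, tensor<16x3x3x4xf32>)
  outs(%out_padded : tensor<1x16x16x16xf32>) -> tensor<1x16x16x16xf32>
```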
Also, I think your example is not valid:
```mlir
func.func @conv_2d_nhwc_fhwc(%arg0: tensor<1x16x16x4xf32>, %arg1: tensor<16x3x3x4xf32>, %arg2: tensor<1x14x14x16xf32>) -> tensor<1x14x14x16xf32> {
  %0 = linalg.conv_2d_nhwc_fhwc
    {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64> }
    ins(%arg0, %arg1: tensor<1x16x16x4xf32>, tensor<16x3x3x4xf32>)
    outs(%arg2: tensor<1x14x14x16xf32>) -> tensor<1x14x14x16xf32>
  return %0 : tensor<1x14x14x16xf32>
}
```
Because when you lower it to loops you can see there is an out-of-bounds access (see the first memref.load), which to me is a verification error:
```mlir
#map = affine_map<(d0, d1) -> (d0 + d1)>
func.func @conv_2d_nhwc_fhwc(%arg0: tensor<1x16x16x4xf32>, %arg1: tensor<16x3x3x4xf32>, %arg2: tensor<1x14x14x16xf32>) -> tensor<1x14x14x16xf32> {
  %c4 = arith.constant 4 : index
  %c3 = arith.constant 3 : index
  %c16 = arith.constant 16 : index
  %c14 = arith.constant 14 : index
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %0 = bufferization.to_buffer %arg1 : tensor<16x3x3x4xf32> to memref<16x3x3x4xf32, strided<[?, ?, ?, ?], offset: ?>>
  %1 = bufferization.to_buffer %arg0 : tensor<1x16x16x4xf32> to memref<1x16x16x4xf32, strided<[?, ?, ?, ?], offset: ?>>
  %2 = bufferization.to_buffer %arg2 : tensor<1x14x14x16xf32> to memref<1x14x14x16xf32, strided<[?, ?, ?, ?], offset: ?>>
  %alloc = memref.alloc() {alignment = 64 : i64} : memref<1x14x14x16xf32>
  memref.copy %2, %alloc : memref<1x14x14x16xf32, strided<[?, ?, ?, ?], offset: ?>> to memref<1x14x14x16xf32>
  scf.for %arg3 = %c0 to %c1 step %c1 {
    scf.for %arg4 = %c0 to %c14 step %c1 {
      scf.for %arg5 = %c0 to %c14 step %c1 {
        scf.for %arg6 = %c0 to %c16 step %c1 {
          scf.for %arg7 = %c0 to %c3 step %c1 {
            scf.for %arg8 = %c0 to %c3 step %c1 {
              scf.for %arg9 = %c0 to %c4 step %c1 {
                %4 = affine.apply #map(%arg4, %arg7)
                %5 = affine.apply #map(%arg5, %arg8)
                %6 = memref.load %1[%arg3, %4, %5, %arg9] : memref<1x16x16x4xf32, strided<[?, ?, ?, ?], offset: ?>>
                %7 = memref.load %0[%arg6, %arg7, %arg8, %arg9] : memref<16x3x3x4xf32, strided<[?, ?, ?, ?], offset: ?>>
                %8 = memref.load %alloc[%arg3, %arg4, %arg5, %arg6] : memref<1x14x14x16xf32>
                %9 = arith.mulf %6, %7 : f32
                %10 = arith.addf %8, %9 : f32
                memref.store %10, %alloc[%arg3, %arg4, %arg5, %arg6] : memref<1x14x14x16xf32>
              }
            }
          }
        }
      }
    }
  }
  %3 = bufferization.to_tensor %alloc : memref<1x14x14x16xf32> to tensor<1x14x14x16xf32>
  return %3 : tensor<1x14x14x16xf32>
}
```
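For completeness, a hedged sketch (operand names are illustrative, not the actual transform output) of operand padding consistent with the shapes sketched above; with the output spatial dims padded 14 -> 16 and the input spatial dims padded 16 -> 16 + 3 - 1 = 18, every memref.load in the lowered loops stays in bounds:
```mlir
// Hypothetical operand padding; %in_padded and %out_padded are illustrative names.
%cst = arith.constant 0.000000e+00 : f32
%in_padded = tensor.pad %arg0 low[0, 0, 0, 0] high[0, 2, 2, 0] {
^bb0(%i: index, %j: index, %k: index, %l: index):
  tensor.yield %cst : f32
} : tensor<1x16x16x4xf32> to tensor<1x18x18x4xf32>
%out_padded = tensor.pad %arg2 low[0, 0, 0, 0] high[0, 2, 2, 0] {
^bb0(%i: index, %j: index, %k: index, %l: index):
  tensor.yield %cst : f32
} : tensor<1x14x14x16xf32> to tensor<1x16x16x16xf32>
```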
https://github.com/llvm/llvm-project/pull/149576