[Mlir-commits] [mlir] [mlir] Fix padding shape computation in PadTilingInterface (PR #149576)
Fabian Mora
llvmlistbot at llvm.org
Mon Jul 21 06:46:42 PDT 2025
https://github.com/fabianmcg requested changes to this pull request.
Thank you for spotting possible corner cases. We are aware of some odd consequences of padding the iteration domain rather than the ins/outs, but I'm not sure this falls into that category. Could you provide a more detailed explanation of why your fix works?
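For context, here is a minimal sketch (the padded sizes and operand names are hypothetical, assuming the OH/OW iteration-domain sizes were padded from 14 to 16) of what padding the iteration domain implies for the operand shapes: since the spatial indexing map is `d_oh + d_kh`, the required input spatial size becomes 16 + 3 - 1 = 18, not 16, so the operand padding has to be derived from the indexing maps applied to the padded iteration domain.
```mlir
// Hypothetical padded op; the shapes follow from padding OH/OW to 16.
%padded = linalg.conv_2d_nhwc_fhwc
  {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>}
  ins(%in_padded, %arg1 : tensor<1x18x18x4xf32>, tensor<16x3x3x4xf32>)
  outs(%out_padded : tensor<1x16x16x16xf32>) -> tensor<1x16x16x16xf32>
```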
Also, I think your example is not valid:
```mlir
func.func @conv_2d_nhwc_fhwc(%arg0: tensor<1x16x16x4xf32>, %arg1: tensor<16x3x3x4xf32>, %arg2: tensor<1x14x14x16xf32>) -> tensor<1x14x14x16xf32> {
  %0 = linalg.conv_2d_nhwc_fhwc
    {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64> }
    ins(%arg0, %arg1: tensor<1x16x16x4xf32>, tensor<16x3x3x4xf32>)
    outs(%arg2: tensor<1x14x14x16xf32>) -> tensor<1x14x14x16xf32>
  return %0 : tensor<1x14x14x16xf32>
}
```
Because when you lower it to loops you can see there is an out-of-bounds access (see the first memref.load), which to me is a verification error:
```mlir
#map = affine_map<(d0, d1) -> (d0 + d1)>
func.func @conv_2d_nhwc_fhwc(%arg0: tensor<1x16x16x4xf32>, %arg1: tensor<16x3x3x4xf32>, %arg2: tensor<1x14x14x16xf32>) -> tensor<1x14x14x16xf32> {
  %c4 = arith.constant 4 : index
  %c3 = arith.constant 3 : index
  %c16 = arith.constant 16 : index
  %c14 = arith.constant 14 : index
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %0 = bufferization.to_buffer %arg1 : tensor<16x3x3x4xf32> to memref<16x3x3x4xf32, strided<[?, ?, ?, ?], offset: ?>>
  %1 = bufferization.to_buffer %arg0 : tensor<1x16x16x4xf32> to memref<1x16x16x4xf32, strided<[?, ?, ?, ?], offset: ?>>
  %2 = bufferization.to_buffer %arg2 : tensor<1x14x14x16xf32> to memref<1x14x14x16xf32, strided<[?, ?, ?, ?], offset: ?>>
  %alloc = memref.alloc() {alignment = 64 : i64} : memref<1x14x14x16xf32>
  memref.copy %2, %alloc : memref<1x14x14x16xf32, strided<[?, ?, ?, ?], offset: ?>> to memref<1x14x14x16xf32>
  scf.for %arg3 = %c0 to %c1 step %c1 {
    scf.for %arg4 = %c0 to %c14 step %c1 {
      scf.for %arg5 = %c0 to %c14 step %c1 {
        scf.for %arg6 = %c0 to %c16 step %c1 {
          scf.for %arg7 = %c0 to %c3 step %c1 {
            scf.for %arg8 = %c0 to %c3 step %c1 {
              scf.for %arg9 = %c0 to %c4 step %c1 {
                %4 = affine.apply #map(%arg4, %arg7)
                %5 = affine.apply #map(%arg5, %arg8)
                %6 = memref.load %1[%arg3, %4, %5, %arg9] : memref<1x16x16x4xf32, strided<[?, ?, ?, ?], offset: ?>>
                %7 = memref.load %0[%arg6, %arg7, %arg8, %arg9] : memref<16x3x3x4xf32, strided<[?, ?, ?, ?], offset: ?>>
                %8 = memref.load %alloc[%arg3, %arg4, %arg5, %arg6] : memref<1x14x14x16xf32>
                %9 = arith.mulf %6, %7 : f32
                %10 = arith.addf %8, %9 : f32
                memref.store %10, %alloc[%arg3, %arg4, %arg5, %arg6] : memref<1x14x14x16xf32>
              }
            }
          }
        }
      }
    }
  }
  %3 = bufferization.to_tensor %alloc : memref<1x14x14x16xf32> to tensor<1x14x14x16xf32>
  return %3 : tensor<1x14x14x16xf32>
}
```
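For completeness, a hedged sketch (operand names are illustrative, not the actual transform output) of operand padding consistent with the shapes sketched above; with the output spatial dims padded 14 -> 16 and the input spatial dims padded 16 -> 16 + 3 - 1 = 18, every memref.load in the lowered loops stays in bounds:
```mlir
// Hypothetical operand padding; %in_padded and %out_padded are illustrative names.
%cst = arith.constant 0.000000e+00 : f32
%in_padded = tensor.pad %arg0 low[0, 0, 0, 0] high[0, 2, 2, 0] {
^bb0(%i: index, %j: index, %k: index, %l: index):
  tensor.yield %cst : f32
} : tensor<1x16x16x4xf32> to tensor<1x18x18x4xf32>
%out_padded = tensor.pad %arg2 low[0, 0, 0, 0] high[0, 2, 2, 0] {
^bb0(%i: index, %j: index, %k: index, %l: index):
  tensor.yield %cst : f32
} : tensor<1x14x14x16xf32> to tensor<1x16x16x16xf32>
```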
https://github.com/llvm/llvm-project/pull/149576