[flang-commits] [compiler-rt] [flang] [libc] [llvm] [clang-tools-extra] [libcxx] [clang] [mlir] [mlir][Linalg] Support dynamic tiles in `lower_pack` transform (PR #76003)

Fri Jan 5 08:40:13 PST 2024

srcarroll wrote:

> It would be easy enough for me to change what I have to only do expand and only match fail on completely impossible (with `expand_shape`) cases.

After thinking about it more, if I'm not mistaken, the current implementation already covers all possible cases with `expand_shape`, but with a caveat.  I'll illustrate with this example 
```
func.func @pack_dyn_tiles(%arg0: tensor<64x128xf32>, %arg1: tensor<4x8x?x?xf32>, %tile_0: index, %tile_1: index) -> tensor<4x8x?x?xf32> {
  %pack = tensor.pack %arg0 inner_dims_pos = [0, 1] inner_tiles = [%tile_0, %tile_1] into %arg1
    : tensor<64x128xf32> -> tensor<4x8x?x?xf32>
  return %pack : tensor<4x8x?x?xf32>
}
```
The current implementation does not handle this case because it unconditionally match fails when any tile sizes is dynamic.  I can make changes on the match failure condition to allow this case with `expand_shape` and would result in
```
  func.func @pack_dyn_tiles(%arg0: tensor<64x128xf32>, %arg1: tensor<4x8x?x?xf32>, %arg2: index, %arg3: index) -> tensor<4x8x?x?xf32> {
    %0 = affine.apply #map()[%arg2]
    %1 = affine.apply #map1()[%arg3]
    %cst = arith.constant 0.000000e+00 : f32
    %padded = tensor.pad %arg0 low[0, 0] high[%0, %1] {
    ^bb0(%arg4: index, %arg5: index):
      tensor.yield %cst : f32
    } : tensor<64x128xf32> to tensor<?x?xf32>
    %expanded = tensor.expand_shape %padded [[0, 1], [2, 3]] : tensor<?x?xf32> into tensor<4x?x8x?xf32>
    %transposed = linalg.transpose ins(%expanded : tensor<4x?x8x?xf32>) outs(%arg1 : tensor<4x8x?x?xf32>) permutation = [0, 2, 1, 3] 
    return %transposed : tensor<4x8x?x?xf32>
  }
  ```
However, in this example the tile sizes can be inferred by the relationship between input and output sizes, so they might as well be static (I think you eluded to this in one of your comments).

So I don't think there are any non-trivial cases left to handle dynamic tiles while keeping `expand_shape`.

Questions: 
1. Should the verifier allow such an example?  In other words, should users be required to use static tile sizes in the cases where their static values can be inferred?
2. If we allow, should there be a canonicalizer implementation to infer the static tile values?
3. Should I just go ahead and allow dynamic tiles for this case in this lowering?

https://github.com/llvm/llvm-project/pull/76003