[Mlir-commits] [mlir] [MLIR] Folding unpack and pack sequence in data layout propagation (PR #138332)
Zhuoran Yin
llvmlistbot at llvm.org
Mon May 5 07:41:35 PDT 2025
jerryyin wrote:
I probably should have included the test case as motivation for the PR. I'm adding it now as a comment to illustrate @Max191's point. Hopefully this helps clarify things.
The motivating example centers on the `PushDownUnPackOpThroughGenericOp` pattern. The incoming IR looks like:
```mlir
%unpack = linalg.unpack %19 inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %extracted_slice_5 : tensor<4x8x16x16xf32> -> tensor<?x128xf32>
%extracted_slice_6 = tensor.extract_slice %arg2[%arg0, %arg1] [%7, 128] [1, 1] : tensor<10738x896xbf16> to tensor<?x128xbf16>
%20 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%unpack : tensor<?x128xf32>) outs(%extracted_slice_6 : tensor<?x128xbf16>) {
^bb0(%in: f32, %out: bf16):
%21 = arith.truncf %in : f32 to bf16
linalg.yield %21 : bf16
} -> tensor<?x128xbf16>
```
Please note that `%19` is in a padded domain, so there is an implicit extract_slice associated with the unpack. In this example, without knowing specifically what the `linalg.generic` does, we have no choice but to re-pack `%unpack` into the padded domain when pushing down the unpack op.
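For concreteness, here is a hand-written sketch of the conservatively pushed-down IR (not the pass's verbatim output; the `%cst*`, `%empty*`, `%repacked`, and `%packed_dest` names are made up for illustration). The generic now runs in the packed domain, and `%unpack` is re-packed on the way in:
```mlir
// Re-pack the unpacked value into the padded domain. Note this is
// pack(unpack(%19)), a round-trip back to the domain %19 already lives in.
%cst = arith.constant 0.000000e+00 : f32
%empty_f32 = tensor.empty() : tensor<4x8x16x16xf32>
%repacked = linalg.pack %unpack padding_value(%cst : f32) inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %empty_f32 : tensor<?x128xf32> -> tensor<4x8x16x16xf32>
// Pack the destination the same way.
%cst_bf16 = arith.constant 0.000000e+00 : bf16
%empty_bf16 = tensor.empty() : tensor<4x8x16x16xbf16>
%packed_dest = linalg.pack %extracted_slice_6 padding_value(%cst_bf16 : bf16) inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %empty_bf16 : tensor<?x128xbf16> -> tensor<4x8x16x16xbf16>
// The generic now iterates over the packed 4-D domain.
%packed = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%repacked : tensor<4x8x16x16xf32>) outs(%packed_dest : tensor<4x8x16x16xbf16>) {
^bb0(%in: f32, %out: bf16):
  %21 = arith.truncf %in : f32 to bf16
  linalg.yield %21 : bf16
} -> tensor<4x8x16x16xbf16>
// The unpack has been pushed below the generic.
%result = linalg.unpack %packed inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %extracted_slice_6 : tensor<4x8x16x16xbf16> -> tensor<?x128xbf16>
```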
Now consider a counterexample that would make the result wrong if we prematurely carried out this optimization: the same IR, but with a generic body that reads `%out`:
```mlir
^bb0(%in: f32, %out: f32):
%21 = arith.addf %in, %out : f32
linalg.yield %21 : f32
} -> tensor<?x128xf32>
```
In this example, `%out` contributes to the `linalg.generic` compute result. If we still forcefully push down the unpack without re-packing `%unpack`, then the padded values will be used as `%out` and may alter the result, simply because the compute was carried out on padded values! Therefore, inspecting what the `linalg.generic` body does is the key to making this PR correct. And as Max pointed out, it is much cleaner to put this minimal add-on into the data layout propagation than into a canonicalization pattern. I'd like to thank @hanhanW for pointing out that canonicalization pattern, though; I didn't know it existed until reading the review comments!
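To make the payoff concrete, here is a sketch of the folded IR for the safe `truncf` case (again hand-written, reusing the hypothetical names from the sketch above). The `pack(unpack(%19))` round-trip disappears and `%19` feeds the packed generic directly:
```mlir
// Safe fold: the truncf body never reads %out, so whatever sits in the
// padded region is a don't-care and %19 can be consumed as-is.
%packed = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%19 : tensor<4x8x16x16xf32>) outs(%packed_dest : tensor<4x8x16x16xbf16>) {
^bb0(%in: f32, %out: bf16):
  %21 = arith.truncf %in : f32 to bf16
  linalg.yield %21 : bf16
} -> tensor<4x8x16x16xbf16>
%result = linalg.unpack %packed inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %extracted_slice_6 : tensor<4x8x16x16xbf16> -> tensor<?x128xbf16>
```
In the `addf` counterexample, by contrast, the body reads `%out`, so the fold must be rejected and the conservative re-pack kept.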
https://github.com/llvm/llvm-project/pull/138332