[Mlir-commits] [mlir] [MLIR] Folding unpack and pack sequence in data layout propagation (PR #138332)
Zhuoran Yin
llvmlistbot at llvm.org
Mon May 5 07:41:35 PDT 2025
jerryyin wrote:
I probably should have included the test case as motivation for the PR. I'm adding it now as a comment to illustrate @Max191's point. Hopefully this helps clarify things.
The motivating example centers on the `PushDownUnPackOpThroughGenericOp` pattern. The incoming IR looks like:
```mlir
%unpack = linalg.unpack %19 inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %extracted_slice_5 : tensor<4x8x16x16xf32> -> tensor<?x128xf32>
%extracted_slice_6 = tensor.extract_slice %arg2[%arg0, %arg1] [%7, 128] [1, 1] : tensor<10738x896xbf16> to tensor<?x128xbf16>
%20 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%unpack : tensor<?x128xf32>) outs(%extracted_slice_6 : tensor<?x128xbf16>) {
^bb0(%in: f32, %out: bf16):
%21 = arith.truncf %in : f32 to bf16
linalg.yield %21 : bf16
} -> tensor<?x128xbf16>
```
Please note that `%19` is in a padded domain, so there is an implicit extract_slice associated with the unpack. In this example, without knowing specifically what the `linalg.generic` does, we have no choice but to re-pack `%unpack` into the padded domain when pushing down the unpack op.
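For concreteness, here is a hand-written sketch of the conservatively pushed-down IR (not the pass's verbatim output; the `%cst*`, `%empty*`, `%repacked`, and `%packed_dest` names are made up for illustration). The generic now runs in the packed domain, and `%unpack` is re-packed on the way in:
```mlir
// Re-pack the unpacked value into the padded domain. Note this is
// pack(unpack(%19)), a round-trip back to the domain %19 already lives in.
%cst = arith.constant 0.000000e+00 : f32
%empty_f32 = tensor.empty() : tensor<4x8x16x16xf32>
%repacked = linalg.pack %unpack padding_value(%cst : f32) inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %empty_f32 : tensor<?x128xf32> -> tensor<4x8x16x16xf32>
// Pack the destination the same way.
%cst_bf16 = arith.constant 0.000000e+00 : bf16
%empty_bf16 = tensor.empty() : tensor<4x8x16x16xbf16>
%packed_dest = linalg.pack %extracted_slice_6 padding_value(%cst_bf16 : bf16) inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %empty_bf16 : tensor<?x128xbf16> -> tensor<4x8x16x16xbf16>
// The generic now iterates over the packed 4-D domain.
%packed = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%repacked : tensor<4x8x16x16xf32>) outs(%packed_dest : tensor<4x8x16x16xbf16>) {
^bb0(%in: f32, %out: bf16):
  %21 = arith.truncf %in : f32 to bf16
  linalg.yield %21 : bf16
} -> tensor<4x8x16x16xbf16>
// The unpack has been pushed below the generic.
%result = linalg.unpack %packed inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %extracted_slice_6 : tensor<4x8x16x16xbf16> -> tensor<?x128xbf16>
```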
Now consider a counterexample that would make the result wrong if we prematurely carried out this optimization: the same IR, but with a generic body that reads `%out`:
```mlir
^bb0(%in: f32, %out: f32):
%21 = arith.addf %in, %out : f32
linalg.yield %21 : f32
} -> tensor<?x128xf32>
```
In this example, `%out` contributes to the `linalg.generic` compute result. If we still forcefully push down the unpack without re-packing `%unpack`, then the padded values will be used as `%out` and may alter the result, simply because the compute was carried out on padded values! Therefore, inspecting what the `linalg.generic` body does is the key to making this PR correct. And as Max pointed out, it is much cleaner to put this minimal add-on into the data layout propagation than into a canonicalization pattern. I'd like to thank @hanhanW for pointing out that canonicalization pattern, though; I didn't know it existed until reading the review comments!
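To make the payoff concrete, here is a sketch of the folded IR for the safe `truncf` case (again hand-written, reusing the hypothetical names from the sketch above). The `pack(unpack(%19))` round-trip disappears and `%19` feeds the packed generic directly:
```mlir
// Safe fold: the truncf body never reads %out, so whatever sits in the
// padded region is a don't-care and %19 can be consumed as-is.
%packed = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%19 : tensor<4x8x16x16xf32>) outs(%packed_dest : tensor<4x8x16x16xbf16>) {
^bb0(%in: f32, %out: bf16):
  %21 = arith.truncf %in : f32 to bf16
  linalg.yield %21 : bf16
} -> tensor<4x8x16x16xbf16>
%result = linalg.unpack %packed inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %extracted_slice_6 : tensor<4x8x16x16xbf16> -> tensor<?x128xbf16>
```
In the `addf` counterexample, by contrast, the body reads `%out`, so the fold must be rejected and the conservative re-pack kept.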
https://github.com/llvm/llvm-project/pull/138332