[llvm-branch-commits] [mlir] [mlir][linalg] Enable scalable vectorization of linalg.unpack (PR #149293)
Andrzej Warzyński via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Wed Jul 30 07:47:29 PDT 2025
banach-space wrote:
**UPDATE: 30/7/25**
* This [commit](https://github.com/llvm/llvm-project/pull/149293/commits/56108b1df69e150c475adc58880ca7dce5355b21) addresses the remaining comments from @hanhanW .
* I have rebased this PR on top of https://github.com/llvm/llvm-project/pull/151334. This rebase addresses this [comment](https://github.com/llvm/llvm-project/pull/149293#discussion_r2237499014) from @egebeysel .
**GENERAL OBSERVATIONS + FUTURE STEPS**
Having implemented #151334, I now realise that we don't require separate vector sizes for the _write_ operation (there's a small twist though).
To illustrate, take this example:
```mlir
func.func @example(%source: tensor<8x4x16x16xf32>, %dest: tensor<64x127xf32>) -> tensor<64x127xf32> {
  %0 = linalg.unpack %source outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %dest : tensor<8x4x16x16xf32> -> tensor<64x127xf32>
  return %0 : tensor<64x127xf32>
}
```
It will be vectorized as:
```mlir
func.func @example(%arg0: tensor<8x4x16x16xf32>, %arg1: tensor<64x127xf32>) -> tensor<64x127xf32> {
  %cst = arith.constant 0.000000e+00 : f32
  %c0 = arith.constant 0 : index
  // Key vector op 1: the read - the user-provided sizes end up here.
  %0 = vector.transfer_read %arg0[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true, true]} : tensor<8x4x16x16xf32>, vector<8x4x16x16xf32>
  // Key vector op 2: the transpose.
  %1 = vector.transpose %0, [1, 2, 0, 3] : vector<8x4x16x16xf32> to vector<4x16x8x16xf32>
  // Key vector op 3: the shape cast.
  %2 = vector.shape_cast %1 : vector<4x16x8x16xf32> to vector<64x128xf32>
  %c0_0 = arith.constant 0 : index
  // Key vector op 4: the write.
  %3 = vector.transfer_write %2, %arg1[%c0_0, %c0_0] {in_bounds = [true, false]} : vector<64x128xf32>, tensor<64x127xf32>
  return %3 : tensor<64x127xf32>
}
```
Now, once we vectorize the read operation, the remaining sizes are already pre-determined (i.e. the sizes for the _write_ operation):
* For `vector.transpose`, the sizes must match those from `vector.transfer_read` (modulo the permutation).
* For `vector.shape_cast`, the input must match the output of `vector.transpose`, and the output is uniquely determined from it (in the example above, by collapsing the `(4, 16)` and `(8, 16)` dimension pairs into `64` and `128`, respectively).
* For `vector.transfer_write`, we have to use the output shape from `vector.shape_cast`.
TL;DR: we should only require vector sizes for the _read_ operation; everything on the write side can be inferred.
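Concretely, the user-facing contract could then look like this (a sketch using the transform-dialect driver; the sizes match the read side of the example above, and the nested `[16]` marks the scalable inner-tile dimension - the exact interface is still to be settled):
```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["linalg.unpack"]} in %arg0
      : (!transform.any_op) -> !transform.any_op
    // Only the read-side sizes (one per dim of the 4-D source tensor);
    // the transpose, shape_cast and write sizes would be inferred from these.
    transform.structured.vectorize %0 vector_sizes [8, 4, [16], 16] : !transform.any_op
    transform.yield
  }
}
```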
**TWIST**
While we should be able to infer the scalable flags too, some of that logic is still missing; see the sketch below. This should not be a problem, though.
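To make the twist concrete, here is roughly how a scalable flag from a read size like `[16]` would have to travel through the same chain (a hand-written sketch, not compiler output; `@example_scalable` is made up, and the masks/in-bounds attributes that a real lowering would need are elided):
```mlir
func.func @example_scalable(%arg0: tensor<8x4x16x16xf32>, %arg1: tensor<64x127xf32>) -> tensor<64x127xf32> {
  %cst = arith.constant 0.000000e+00 : f32
  %c0 = arith.constant 0 : index
  // The flag starts on dim 2 of the read, straight from the user-provided sizes.
  %0 = vector.transfer_read %arg0[%c0, %c0, %c0, %c0], %cst : tensor<8x4x16x16xf32>, vector<8x4x[16]x16xf32>
  // The transpose carries the flag along with its dimension (position 2 -> 1).
  %1 = vector.transpose %0, [1, 2, 0, 3] : vector<8x4x[16]x16xf32> to vector<4x[16]x8x16xf32>
  // The shape cast collapses (4, [16]) into [64]; re-attaching the flag to the
  // collapsed dimension is the part where logic is still missing.
  %2 = vector.shape_cast %1 : vector<4x[16]x8x16xf32> to vector<[64]x128xf32>
  // The write then needs the inferred flag (and a mask) on its first dimension.
  %3 = vector.transfer_write %2, %arg1[%c0, %c0] : vector<[64]x128xf32>, tensor<64x127xf32>
  return %3 : tensor<64x127xf32>
}
```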
**NEXT STEPS**
While we could land this as is (the IREE integration looks fine: https://github.com/iree-org/iree/pull/21514, thanks @hanhanW ) and then iterate in-tree, it might be healthier to land one self-contained change.
Let me refine this and then integrate it into IREE (to make sure that the integration still works). Also, @hanhanW , let's sync offline to make sure that switching to "only vector sizes for the read Op" is going to work for IREE.
WDYT?
https://github.com/llvm/llvm-project/pull/149293