[llvm-branch-commits] [mlir] [mlir][linalg] Enable scalable vectorization of linalg.unpack (PR #149293)

Andrzej Warzyński via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Wed Jul 30 07:47:29 PDT 2025


banach-space wrote:

**UPDATE: 30/7/25**

* This [commit](https://github.com/llvm/llvm-project/pull/149293/commits/56108b1df69e150c475adc58880ca7dce5355b21) addresses the remaining comments from @hanhanW.
* I have rebased this PR on top of https://github.com/llvm/llvm-project/pull/151334. The rebase addresses this [comment](https://github.com/llvm/llvm-project/pull/149293#discussion_r2237499014) from @egebeysel.

**GENERAL OBSERVATIONS + FUTURE STEPS**

Having implemented #151334, I now realise that we don't require separate vector sizes for the _write_ operation (there's a small twist though).

To illustrate, take this example:
```mlir
func.func @example(%source: tensor<8x4x16x16xf32>, %dest: tensor<64x127xf32>) -> tensor<64x127xf32> {
  %0 = linalg.unpack %source outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %dest : tensor<8x4x16x16xf32> -> tensor<64x127xf32>
  return %0 : tensor<64x127xf32>
}
```

It will be vectorized as:
```mlir
  func.func @example(%arg0: tensor<8x4x16x16xf32>, %arg1: tensor<64x127xf32>) -> tensor<64x127xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %c0 = arith.constant 0 : index
    // Key vector op 1: the read - the vector sizes chosen here drive everything below.
    %0 = vector.transfer_read %arg0[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true, true]} : tensor<8x4x16x16xf32>, vector<8x4x16x16xf32>
    // Key vector op 2: the transpose - the read shape, permuted by [1, 2, 0, 3].
    %1 = vector.transpose %0, [1, 2, 0, 3] : vector<8x4x16x16xf32> to vector<4x16x8x16xf32>
    // Key vector op 3: the shape cast - collapses the transposed shape to the result rank.
    %2 = vector.shape_cast %1 : vector<4x16x8x16xf32> to vector<64x128xf32>
    %c0_0 = arith.constant 0 : index
    // Key vector op 4: the write - uses the shape_cast output shape; 127 pads to 128, hence in_bounds = [true, false].
    %3 = vector.transfer_write %2, %arg1[%c0_0, %c0_0] {in_bounds = [true, false]} : vector<64x128xf32>, tensor<64x127xf32>
    return %3 : tensor<64x127xf32>
  }
```

Now, once the vector sizes for the read operation are fixed, the remaining sizes (i.e. the sizes for the _write_ operation) are already pre-determined (a concrete sketch follows the list):
* For `vector.transpose`, the sizes must match the sizes from `vector.transfer_read` (modulo the permutation).
* For `vector.shape_cast`, the input must match the output of `vector.transpose`; the output is then uniquely determined, e.g. by applying `outer_dims_perm` from `linalg.unpack` to the output of `vector.transpose`.
* For `vector.transfer_write`, we have to use the output shape from `vector.shape_cast`.
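
To make the inference concrete, here is a minimal C++ sketch. It is hypothetical, not the actual vectorizer code: `inferWriteShape` is a made-up helper, it assumes the transpose permutation has already been computed for the unpack, and it assumes every result dim is tiled, so that each (outer tile, inner tile) pair collapses into one result dim, as in the example above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Given the vector sizes chosen for the transfer_read and the transpose
// permutation computed for the unpack (e.g. [1, 2, 0, 3] above), derive the
// write-side vector shape. Pairwise collapsing matches the example, where
// every result dim is tiled; a general implementation would group dims
// according to inner_dims_pos.
std::vector<int64_t> inferWriteShape(const std::vector<int64_t> &readShape,
                                     const std::vector<int64_t> &perm) {
  assert(readShape.size() == perm.size() && "rank mismatch");
  // vector.transpose: permute the read shape.
  std::vector<int64_t> transposed(readShape.size());
  for (size_t i = 0; i < perm.size(); ++i)
    transposed[i] = readShape[perm[i]];
  // vector.shape_cast: collapse each (outer tile, inner tile) pair.
  std::vector<int64_t> writeShape;
  for (size_t i = 0; i + 1 < transposed.size(); i += 2)
    writeShape.push_back(transposed[i] * transposed[i + 1]);
  return writeShape;
}

int main() {
  // Shapes from the example: read vector<8x4x16x16>, perm = [1, 2, 0, 3].
  auto shape = inferWriteShape({8, 4, 16, 16}, {1, 2, 0, 3});
  assert(shape == (std::vector<int64_t>{64, 128})); // -> vector<64x128>
  return 0;
}
```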

TL;DR: we should only require vector sizes for the _read_ operation; everything else can be inferred from them.

**TWIST**

While we should also be able to infer the scalable flags, some of that logic is still missing: the flags have to be propagated through the same transpose and collapse steps as the sizes (see the sketch below). This should not be a hard problem though.
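
As a rough illustration of the missing piece (again hypothetical, with a made-up helper name): the flags can follow the same permute-then-collapse path as the sizes, with the caveat that a collapsed dim can absorb at most one scalable constituent.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Propagate the read-side scalable flags through the same permute + collapse
// as the sizes. A collapsed dim is scalable iff exactly one constituent is;
// two scalable constituents in one collapsed dim would not be representable
// as a vector type and must be rejected earlier.
std::vector<bool> inferWriteScalableFlags(const std::vector<bool> &readFlags,
                                          const std::vector<int64_t> &perm) {
  std::vector<bool> transposed(readFlags.size());
  for (size_t i = 0; i < perm.size(); ++i)
    transposed[i] = readFlags[perm[i]];
  std::vector<bool> writeFlags;
  for (size_t i = 0; i + 1 < transposed.size(); i += 2) {
    assert(!(transposed[i] && transposed[i + 1]) &&
           "two scalable dims cannot collapse into one");
    writeFlags.push_back(transposed[i] || transposed[i + 1]);
  }
  return writeFlags;
}

int main() {
  // E.g. read vector<8x[4]x16x16>: after the [1, 2, 0, 3] transpose and the
  // pairwise collapse, the scalable flag lands on the first result dim,
  // i.e. the write would use vector<[64]x128>.
  auto flags = inferWriteScalableFlags({false, true, false, false},
                                       {1, 2, 0, 3});
  assert(flags == (std::vector<bool>{true, false}));
  return 0;
}
```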

**NEXT STEPS**

While we could land this as-is (the IREE integration looks fine: https://github.com/iree-org/iree/pull/21514, thanks @hanhanW) and then iterate in-tree, it might be "healthier" to land one self-contained change.

Let me refine this and then integrate it into IREE (to make sure the integration works). Also, @hanhanW, let's sync offline to make sure that switching to "only vector sizes for the read op" is going to work for IREE.

WDYT?

https://github.com/llvm/llvm-project/pull/149293

