[Mlir-commits] [mlir] [mlir][vector] Add vector.transpose with unit-dim to vector.shape_cast pattern (PR #72105)

Quinn Dawkins llvmlistbot at llvm.org
Wed Nov 22 06:22:49 PST 2023


qedawkins wrote:

> I also look at it the other way, "what is the more canonical form that performs the exact same operation?":
> 
>     1. `%1 = vector.transpose %0, [1, 0] : vector<1x4xf32> to vector<4x1xf32>`
> 
>     2. `%1 = vector.shape_cast %0 : vector<1x4xf32> to vector<4x1xf32>`
> 
> 
> Do you believe that 1 is more canonical than 2? Considering that `vector.shape_cast` is an operation that is more restrictive than `vector.transpose` and provides better guarantees (no data movement), I would wager that 2) is a winner here.

Just to play the other side of the coin here, I see the information in `vector.transpose` as still potentially useful, even when just unit dimensions are being transposed. Recent discussions about layout analysis (and subsequent distribution) of vectors benefit from the extra permutation information. For example:

```mlir
%1 = vector.transfer_read %0[%c0, %c0], %cst0 : memref<32x1xf32>, vector<32x1xf32>
%2 = vector.transpose %1, [1, 0] : vector<32x1xf32> to vector<1x32xf32>
```

If we represent the layout with an affine map, we might start with a map of `(d0, d1) -> (d1, d0)` (`d0` and `d1` representing virtual thread dimensions of `1` and `32` threads respectively) on the transfer_read. Then, when trying to propagate that through the transpose, we simply use the permutation on the transpose.

```mlir
%1 = vector.transfer_read %0[%c0, %c0], %cst0 : memref<32x1xf32>, vector<32x1xf32> // (d0, d1) -> (d1, d0)
%2 = vector.transpose %1, [1, 0] : vector<32x1xf32> to vector<1x32xf32>            // (d0, d1) -> (d0, d1)
```
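To make the propagation step concrete, here is a minimal Python sketch. It assumes one particular reading of the layout map: entry `i` of the layout names the virtual thread dimension that vector dimension `i` is distributed over. The helper name `propagate_through_transpose` is hypothetical and not part of any MLIR API.

```python
def propagate_through_transpose(layout, perm):
    """Permute a layout the same way vector.transpose permutes dimensions.

    layout[i] names the thread dim that source vector dim i maps to.
    Result dim i of the transpose is source dim perm[i], so it inherits
    the layout entry of that source dim.
    """
    return tuple(layout[p] for p in perm)


# Layout on the transfer_read result: (d0, d1) -> (d1, d0)
read_layout = ("d1", "d0")

# Layout after vector.transpose with permutation [1, 0]
transposed_layout = propagate_through_transpose(read_layout, [1, 0])
print(transposed_layout)  # ('d0', 'd1'), i.e. (d0, d1) -> (d0, d1)
```

The point is that the permutation attribute alone is enough here: no dimension sizes are needed.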

With shape_cast, without a better way to directly relate the dimensions of the new and old vector to one another, we end up linearizing + delinearizing to the new shape.

```mlir
%1 = vector.transfer_read %0[%c0, %c0], %cst0 : memref<32x1xf32>, vector<32x1xf32> // (d0, d1) -> (d1, d0)
%2 = vector.shape_cast %1 : vector<32x1xf32> to vector<1x32xf32>                   // (d0, d1) -> ((1 * d1 + d0) floordiv 32, (1 * d1 + d0) mod 32)
```
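As a rough illustration of the linearize + delinearize step, the following Python sketch models a layout as a function from thread ids `(d0, d1)` to a vector index and pushes it through a shape change using row-major offsets. The function names are hypothetical, and treating the layout as a thread-id-to-index function is an assumption about how the map is read.

```python
from math import prod


def linearize(index, shape):
    """Row-major flat offset of a multi-dimensional index."""
    flat = 0
    for i, size in zip(index, shape):
        flat = flat * size + i
    return flat


def delinearize(flat, shape):
    """Inverse of linearize: row-major multi-dimensional index."""
    index = []
    for size in reversed(shape):
        index.append(flat % size)
        flat //= size
    return tuple(reversed(index))


def propagate_through_shape_cast(layout, src_shape, dst_shape):
    """Compose a layout with linearize (old shape) + delinearize (new shape)."""
    assert prod(src_shape) == prod(dst_shape)

    def new_layout(*thread_ids):
        flat = linearize(layout(*thread_ids), src_shape)
        return delinearize(flat, dst_shape)

    return new_layout


# Layout on the transfer_read result: (d0, d1) -> (d1, d0)
def read_layout(d0, d1):
    return (d1, d0)


cast_layout = propagate_through_shape_cast(read_layout, (32, 1), (1, 32))
print(cast_layout(0, 5))  # (0, 5)
```

Unlike the transpose case, this needs both shapes, and the resulting map mixes `d0` and `d1` in every result.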

The map could be simplified here based on the known dim sizes, but it's not clear to me that it will always be easy to simplify to the same map that transpose gives in the context of larger IR + more complex layouts. Even here, simplification of this map might have trouble differentiating between an "important" and "unimportant" use of `d0` (e.g. it might simplify to `(d0, d1) -> (0, d1)`; maybe this is a more canonical map anyway, but it forces that decision).
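A brute-force check over the thread domain shows what "simplified based on the known dim sizes" means for this example. This is only a sketch: the three maps are transcribed by hand from the discussion, and `agree_on_domain` is a hypothetical helper, not how an affine simplifier works.

```python
from itertools import product

THREADS = (1, 32)  # sizes of the virtual thread dims d0 and d1


def shape_cast_map(d0, d1):
    lin = 1 * d1 + d0
    return (lin // 32, lin % 32)


def transpose_map(d0, d1):
    return (d0, d1)


def simplified_map(d0, d1):
    return (0, d1)


def agree_on_domain(f, g, sizes=THREADS):
    """True if f and g give the same result for every thread id in the domain."""
    domain = product(*(range(s) for s in sizes))
    return all(f(*ids) == g(*ids) for ids in domain)


print(agree_on_domain(shape_cast_map, transpose_map))   # True
print(agree_on_domain(shape_cast_map, simplified_map))  # True
# With a hypothetical non-unit d0, dropping d0 is no longer harmless:
print(agree_on_domain(transpose_map, simplified_map, sizes=(2, 32)))  # False
```

With `d0` restricted to a single thread, all three maps coincide, so the choice between `(d0, d1)` and `(0, d1)` is exactly the decision the simplifier is forced to make.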

That said, this is just one _downstream_ interpretation + analysis of vectors and maybe shouldn't be used as a case _upstream_ has to consider, but I figured it's relevant given the recent discussion around this kind of layout analysis.

https://github.com/llvm/llvm-project/pull/72105

