[Mlir-commits] [mlir] [mlir][vector] Make `TransposeOpLowering` configurable (PR #73915)

Thu Nov 30 02:17:12 PST 2023

MacDue wrote:

> We could also implement this lowering using a `vector.extract` + `vector.insert` for the time being. I would rather do that than opening the door to diverging lowerings or excluding ops for specific backends.

No, there's no way to represent this with `vector.extract`/`vector.insert`: because the leading dimension is scalable, it would need a loop. Something like:
```
# Pseudocode: the trip count depends on vscale, which is only known at runtime.
for i in range(0, 4 x vscale):
  %el = vector.extract %src[i, 0] : f32 from vector<[4]x1xf32>
  %result = vector.insert %el, %result[0, i] : f32 into vector<1x[4]xf32>
```

For the issue we're trying to solve, though, it looks like we could add a fold from `transpose(shape_cast)` to `shape_cast`, since what we're seeing is:
```
%11 = vector.transfer_read %subview_5[%c0_6], %cst, %10 {in_bounds = [true]} : memref<?xf32, affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>>, vector<[4]xf32>
%12 = vector.shape_cast %11 : vector<[4]xf32> to vector<[4]x1xf32>
%19 = vector.transpose %12, [1, 0] : vector<[4]x1xf32> to vector<1x[4]xf32>
```
Which could fold to:
```
%11 = vector.transfer_read %subview_5[%c0_6], %cst, %10 {in_bounds = [true]} : memref<?xf32, affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>>, vector<[4]xf32>
%12 = vector.shape_cast %11 : vector<[4]xf32> to vector<1x[4]xf32>
```
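
As a rough illustration of when such a fold is safe, here is a minimal Python sketch (not MLIR code, and not the implementation proposed in the PR). It assumes a transpose can be treated as a `shape_cast` when it only moves unit dimensions, so the row-major element order is unchanged. The helper names `transpose_is_shape_cast` and `transposed_linear_order` are hypothetical, and a scalable dimension such as `[4]` is modelled with a concrete size greater than 1.

```python
from itertools import product


def transpose_is_shape_cast(shape, perm):
    """Return True if transposing `shape` by `perm` only moves unit dims.

    Hypothetical helper, not an MLIR API. Follows the vector.transpose
    convention that result dim k takes its size from source dim perm[k].
    """
    # Source dims that are not of size 1, listed in result order.
    non_unit = [d for d in perm if shape[d] != 1]
    # If these keep their relative order, the element order is unchanged.
    return non_unit == sorted(non_unit)


def transposed_linear_order(shape, perm):
    """Row-major source offsets visited when walking the transposed result.

    Used only to cross-check the claim above: an identity sequence means
    the transpose is a pure reshape of the same data.
    """
    # Row-major strides of the source shape.
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    result_shape = [shape[d] for d in perm]
    order = []
    for idx in product(*(range(s) for s in result_shape)):
        # Result index idx[k] addresses source dim perm[k].
        order.append(sum(idx[k] * strides[perm[k]] for k in range(len(perm))))
    return order


# The case from this thread: 4x1 -> 1x4 keeps the element order.
print(transpose_is_shape_cast([4, 1], [1, 0]))
print(transposed_linear_order([4, 1], [1, 0]))
# A genuine 2-D transpose reorders elements, so it must not be folded.
print(transpose_is_shape_cast([2, 3], [1, 0]))
```

A real fold would also need to check that scalable dimensions are handled correctly by the resulting `shape_cast`; this sketch does not model that.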

https://github.com/llvm/llvm-project/pull/73915