[Mlir-commits] [mlir] [mlir][Vector] add vector.insert canonicalization pattern to convert a chain of insertions to vector.from_elements (PR #142944)
Adrian Kuegel
llvmlistbot at llvm.org
Thu Aug 21 05:50:44 PDT 2025
akuegel wrote:
It seems this PR is triggering another canonicalization pattern that was added in https://github.com/llvm/llvm-project/commit/f3cc8543647cfcfd3ea383e6738bc58a258d6f74
Let's take this IR:
```
func.func @wrapped_bitcast_convert(%arg0: tensor<2xi4>, %arg1: tensor<2xi4>) -> tensor<2xi4> {
%c0 = arith.constant 0 : index
%cst = arith.constant dense<0> : vector<2xi4>
%0 = ub.poison : i4
%1 = vector.transfer_read %arg0[%c0], %0 {in_bounds = [true]} : tensor<2xi4>, vector<2xi4>
%2 = vector.extract %1[0] : i4 from vector<2xi4>
%3 = vector.insert %2, %cst [0] : i4 into vector<2xi4>
%4 = vector.extract %1[1] : i4 from vector<2xi4>
%5 = vector.insert %4, %3 [1] : i4 into vector<2xi4>
%6 = vector.transfer_write %5, %arg1[%c0] {in_bounds = [true]} : vector<2xi4>, tensor<2xi4>
return %6 : tensor<2xi4>
}
```
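Presumably the new pattern first rewrites the insert chain into a vector.from_elements, and the extracts of all elements of %1 then fold back into %1 itself. A rough sketch of that intermediate step (my reconstruction, reusing the SSA names from the input IR; the exact intermediate may differ):
```
%2 = vector.extract %1[0] : i4 from vector<2xi4>
%4 = vector.extract %1[1] : i4 from vector<2xi4>
// Extracting every element of %1 and rebuilding it means this folds back to %1,
// so the transfer_write below ends up consuming %1 directly.
%5 = vector.from_elements %2, %4 : vector<2xi4>
```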
This now gets canonicalized to:
```
func.func @wrapped_bitcast_convert(%arg0: tensor<2xi4>, %arg1: tensor<2xi4>) -> tensor<2xi4> {
%c0 = arith.constant 0 : index
%0 = ub.poison : i4
%1 = vector.transfer_read %arg0[%c0], %0 {in_bounds = [true]} : tensor<2xi4>, vector<2xi4>
%2 = vector.transfer_write %1, %arg1[%c0] {in_bounds = [true]} : vector<2xi4>, tensor<2xi4>
return %2 : tensor<2xi4>
}
```
And then the other pattern kicks in and folds this into simply returning %arg0. I believe that pattern is wrong: the fold should only apply when the read and the write use the same base operand (so that we read from and write to the same memory). Since this PR only exposes that pre-existing bug, I guess I will have to look into fixing it myself.
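For comparison, here is a minimal sketch of the case where I believe the fold would be valid, i.e. where the read and the write share the same base tensor (hypothetical function, not taken from the repro above):
```
func.func @same_base(%arg0: tensor<2xi4>) -> tensor<2xi4> {
  %c0 = arith.constant 0 : index
  %p = ub.poison : i4
  %r = vector.transfer_read %arg0[%c0], %p {in_bounds = [true]} : tensor<2xi4>, vector<2xi4>
  // Writing the just-read values back to the same base tensor leaves it unchanged,
  // so folding the result to %arg0 is safe here.
  %w = vector.transfer_write %r, %arg0[%c0] {in_bounds = [true]} : vector<2xi4>, tensor<2xi4>
  return %w : tensor<2xi4>
}
```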
Another issue I noticed: there are apparently some GPU-related tests that are not run by default and are still broken. These are the tests:
mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir
mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir
Most likely the GPU-related pipeline they use needs some adjustment to handle vector::FromElementsOp with rank > 1.
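For reference, a rank > 1 result of the kind those pipelines would now have to lower looks roughly like this (hypothetical operands and shape, not taken from the failing tests):
```
%m = vector.from_elements %a, %b, %c, %d : vector<2x2xf16>
```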
https://github.com/llvm/llvm-project/pull/142944