[Mlir-commits] [mlir] [mlir][Vector] add vector.insert canonicalization pattern to convert a chain of insertions to vector.from_elements (PR #142944)

Adrian Kuegel llvmlistbot at llvm.org
Thu Aug 21 05:50:44 PDT 2025


akuegel wrote:

It seems this PR is triggering another canonicalization pattern that was added in https://github.com/llvm/llvm-project/commit/f3cc8543647cfcfd3ea383e6738bc58a258d6f74

Let's take this IR:

```
func.func @wrapped_bitcast_convert(%arg0: tensor<2xi4>, %arg1: tensor<2xi4>) -> tensor<2xi4> {
  %c0 = arith.constant 0 : index
  %cst = arith.constant dense<0> : vector<2xi4>
  %0 = ub.poison : i4
  %1 = vector.transfer_read %arg0[%c0], %0 {in_bounds = [true]} : tensor<2xi4>, vector<2xi4>
  %2 = vector.extract %1[0] : i4 from vector<2xi4>
  %3 = vector.insert %2, %cst [0] : i4 into vector<2xi4>
  %4 = vector.extract %1[1] : i4 from vector<2xi4>
  %5 = vector.insert %4, %3 [1] : i4 into vector<2xi4>
  %6 = vector.transfer_write %5, %arg1[%c0] {in_bounds = [true]} : vector<2xi4>, tensor<2xi4>
  return %6 : tensor<2xi4>
}
```

This now gets canonicalized to:

```
func.func @wrapped_bitcast_convert(%arg0: tensor<2xi4>, %arg1: tensor<2xi4>) -> tensor<2xi4> {
  %c0 = arith.constant 0 : index
  %0 = ub.poison : i4
  %1 = vector.transfer_read %arg0[%c0], %0 {in_bounds = [true]} : tensor<2xi4>, vector<2xi4>
  %2 = vector.transfer_write %1, %arg1[%c0] {in_bounds = [true]} : vector<2xi4>, tensor<2xi4>
  return %2 : tensor<2xi4>
}
```

And then the other pattern applies and folds this into returning %arg0. I believe that pattern is wrong: it should only apply when the transfer_read and the transfer_write use the same base tensor, so that we read from and write to the same memory. Since this PR merely exposes that latent bug rather than introducing it, I guess I will have to look into fixing it myself.
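For illustration, here is a hypothetical case (function name and shapes are mine, not from the PR) where folding away the read/write pair would be sound, because the vector is written back into the very tensor it was read from, making the transfer_write a no-op:

```
// Hypothetical sound case: %1 is read from %arg0 and written back to %arg0,
// so replacing %2 with %arg0 preserves semantics. In the IR above, the write
// targets a *different* tensor (%arg1), so the same fold is incorrect there.
func.func @read_write_roundtrip(%arg0: tensor<2xi4>) -> tensor<2xi4> {
  %c0 = arith.constant 0 : index
  %0 = ub.poison : i4
  %1 = vector.transfer_read %arg0[%c0], %0 {in_bounds = [true]} : tensor<2xi4>, vector<2xi4>
  %2 = vector.transfer_write %1, %arg0[%c0] {in_bounds = [true]} : vector<2xi4>, tensor<2xi4>
  return %2 : tensor<2xi4>
}
```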

Another issue I noticed is that some GPU-related integration tests, which do not run by default, are still broken. These are the tests:

mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir
mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir

Most likely the GPU-related pipeline that they use needs some adjustment to handle vector::FromElementsOp with rank > 1.
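For reference, the kind of op such a pipeline would now have to lower is a rank-2 `vector.from_elements` (illustrative snippet, operand names are mine):

```
// With this PR, a chain of vector.insert ops into a 2-D vector can be
// canonicalized into a single rank-2 vector.from_elements, which the GPU
// lowering pipeline must then be able to handle.
%v = vector.from_elements %a, %b, %c, %d : vector<2x2xf16>
```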

https://github.com/llvm/llvm-project/pull/142944
