[all-commits] [llvm/llvm-project] 6b2194: [mlir][vector] Fix invalid `LoadOp` indices being ...

Wed Jan 3 04:47:07 PST 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 6b21948f26d69bad8c282db375906a8e0712d5f8
      https://github.com/llvm/llvm-project/commit/6b21948f26d69bad8c282db375906a8e0712d5f8
  Author: Rik Huijzer <github at huijzer.xyz>
  Date:   2024-01-03 (Wed, 03 Jan 2024)

  Changed paths:
    M mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp
    M mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir

  Log Message:
  -----------
  [mlir][vector] Fix invalid `LoadOp` indices being created (#76292)

Fixes https://github.com/llvm/llvm-project/issues/71326.

This is the second PR. The first PR at
https://github.com/llvm/llvm-project/pull/75519 was reverted because an
integration test failed. The failed integration test was simplified and
added to the core MLIR tests. Compared to the first PR, the current PR
uses a more reliable approach. In summary, the current PR determines the
mask indices by looking up the _mask_ buffer load indices from the
previous iteration, whereas `main` looks up the indices for the _data_
buffer. The mask and data indices can differ when using a
`permutation_map`.

The cause of the issue was that a new `LoadOp` was created which looked
something like:
```mlir
func.func main(%arg1 : index, %arg2 : index) {
  %alloca_0 = memref.alloca() : memref<vector<1x32xi1>>
  %1 = vector.type_cast %alloca_0 : memref<vector<1x32xi1>> to memref<1xvector<32xi1>>
  %2 = memref.load %1[%arg1, %arg2] : memref<1xvector<32xi1>>
  return
}
```
which crashed inside the `LoadOp::verify`. Note here that `%alloca_0` is
the mask as can be seen from the `i1` element type and note it is 0
dimensional. Next, `%1` has one dimension, but `memref.load` tries to
index it with two indices.

This issue occured in the following code (a simplified version of the
bug report):
```mlir
#map1 = affine_map<(d0, d1, d2, d3) -> (d0, 0, 0, d3)>
func.func @main(%subview:  memref<1x1x1x1xi32>, %mask: vector<1x1xi1>) -> vector<1x1x1x1xi32> {
  %c0 = arith.constant 0 : index
  %c0_i32 = arith.constant 0 : i32
  %3 = vector.transfer_read %subview[%c0, %c0, %c0, %c0], %c0_i32, %mask {permutation_map = #map1}
          : memref<1x1x1x1xi32>, vector<1x1x1x1xi32>
  return %3 : vector<1x1x1x1xi32>
}
```
After this patch, it is lowered to the following by
`-convert-vector-to-scf`:
```mlir
func.func @main(%arg0: memref<1x1x1x1xi32>, %arg1: vector<1x1xi1>) -> vector<1x1x1x1xi32> {
  %c0_i32 = arith.constant 0 : i32
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %alloca = memref.alloca() : memref<vector<1x1x1x1xi32>>
  %alloca_0 = memref.alloca() : memref<vector<1x1xi1>>
  memref.store %arg1, %alloca_0[] : memref<vector<1x1xi1>>
  %0 = vector.type_cast %alloca : memref<vector<1x1x1x1xi32>> to memref<1xvector<1x1x1xi32>>
  %1 = vector.type_cast %alloca_0 : memref<vector<1x1xi1>> to memref<1xvector<1xi1>>
  scf.for %arg2 = %c0 to %c1 step %c1 {
    %3 = vector.type_cast %0 : memref<1xvector<1x1x1xi32>> to memref<1x1xvector<1x1xi32>>
    scf.for %arg3 = %c0 to %c1 step %c1 {
      %4 = vector.type_cast %3 : memref<1x1xvector<1x1xi32>> to memref<1x1x1xvector<1xi32>>
      scf.for %arg4 = %c0 to %c1 step %c1 {
        %5 = memref.load %1[%arg2] : memref<1xvector<1xi1>>
        %6 = vector.transfer_read %arg0[%arg2, %c0, %c0, %c0], %c0_i32, %5 {in_bounds = [true]} : memref<1x1x1x1xi32>, vector<1xi32>
        memref.store %6, %4[%arg2, %arg3, %arg4] : memref<1x1x1xvector<1xi32>>
      }
    }
  }
  %2 = memref.load %alloca[] : memref<vector<1x1x1x1xi32>>
  return %2 : vector<1x1x1x1xi32>
}
```
What was causing the problems is that one dimension of the data buffer
`%alloca` (eltype `i32`) is unpacked (`vector.type_cast`) inside the
outmost loop (loop with index variable `%arg2`) and the nested loop
(loop with index variable `%arg3`), whereas the mask buffer `%alloca_0`
(eltype `i1`) is not unpacked in these loops.

Before this patch, the load indices would be determined by looking up
the load indices for the *data* buffer load op. However, as shown in the
specific example, when a permutation map is specified then the load
indices from the data buffer load op start to differ from the indices
for the mask op. To fix this, this patch ensures that the load indices
for the *mask* buffer are used instead.

---------

Co-authored-by: Mehdi Amini <joker.eph at gmail.com>