[Mlir-commits] [mlir] 4e9eaa2 - [mlir][vector] Allow out-of-bounds starting position for vector transfer ops

Wed Aug 2 06:37:13 PDT 2023

Author: Matthias Springer
Date: 2023-08-02T15:31:09+02:00
New Revision: 4e9eaa2e521dc4e0e5f01df9a9ea56204271519e

URL: https://github.com/llvm/llvm-project/commit/4e9eaa2e521dc4e0e5f01df9a9ea56204271519e
DIFF: https://github.com/llvm/llvm-project/commit/4e9eaa2e521dc4e0e5f01df9a9ea56204271519e.diff

LOG: [mlir][vector] Allow out-of-bounds starting position for vector transfer ops

The starting indices of all vector dimensions are allowed to be out-of-bounds.

E.g.:
```
// %j is allowed to be out-of-bounds (but not %i).
%0 = vector.transfer_read %m[%i, %j] ... {in_bounds = [false]} : memref<?x?xf32>, vector<5xf32>
```

This revision just updates the op documentation and adds extra test cases. Out-of-bounds starting points are already supported by the respective lowerings:
* 2D and higher-dimensional transfers are lowered to 1D transfers by `VectorToScf`. These patterns generate an `scf.if` check for every (potentially unrolled) loop iteration if the dimension is `in_bounds = false`, including the first loop iteration.
* 1D out-of-bounds transfers are lowered to in-bounds transfers by `MaterializeTransferMask`, which adds a mask to the op. The mask is created by `vector.create_mask` with the operand `(dim-size) - (index)`. If the starting point is out-of-bounds, that operand is 0 or negative. Negative operands are treated like 0 according to the documentation of `vector.create_mask`, so every lane is masked out and replaced with the padding value.
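
The two lowerings described above can be modelled in a few lines of Python. This is only a sketch of the semantics, not the actual `VectorToScf` or `MaterializeTransferMask` code: the function names are hypothetical, indices are assumed to be non-negative, and the memref contents (`10 * i + j`) are an assumption inferred from the CHECK lines of the 2D test.

```python
def create_mask(size, length):
    """Model of a 1D vector.create_mask: the first `size` lanes are set.

    Negative sizes behave like 0; sizes above `length` set every lane.
    """
    n = max(0, min(size, length))
    return [lane < n for lane in range(length)]


def transfer_read_1d(row, start, length, padding):
    """Model of a masked 1D read: mask operand is (dim-size) - (index)."""
    mask = create_mask(len(row) - start, length)
    # Masked-out lanes never touch `row`; they take the padding value.
    return [row[start + lane] if on else padding for lane, on in enumerate(mask)]


def transfer_read_2d(mem, i, j, shape, padding):
    """Model of a 2D read unrolled into 1D reads with a per-row bounds check."""
    rows, cols = shape
    result = []
    for r in range(rows):
        # Stands in for the scf.if check, which also guards the first row.
        if i + r < len(mem):
            result.append(transfer_read_1d(mem[i + r], j, cols, padding))
        else:
            result.append([padding] * cols)
    return result


if __name__ == "__main__":
    # Assumed contents of the 3x4 test memref: element (i, j) holds 10*i + j.
    mem = [[10 * i + j for j in range(4)] for i in range(3)]
    # Start row 3 is out-of-bounds, so the whole result is padding.
    print(transfer_read_2d(mem, 3, 2, (2, 3), -42))
    # prints [[-42, -42, -42], [-42, -42, -42]]
```

In this model an out-of-bounds start needs no special case: the row check fails on the first iteration, or the mask operand is non-positive, and in both cases only padding is produced.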

Differential Revision: https://reviews.llvm.org/D155719

Added: 
    

Modified: 
    mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
    mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-1d.mlir
    mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-2d.mlir

Removed: 
    


################################################################################
diff  --git a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
index 63d96721bfd400..357795cb262d4b 100644
--- a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
+++ b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
@@ -1201,14 +1201,15 @@ def Vector_TransferReadOp :
     `0` are masked out and replaced with `padding`.
 
     An optional boolean array attribute `in_bounds` specifies for every vector
-    dimension if the transfer is guaranteed to be within the source bounds.
-    While the starting point of the transfer has to be in-bounds, accesses may
-    run out-of-bounds as indices increase. Broadcast dimensions must always be
-    in-bounds. If specified, the `in_bounds` array length has to be equal to the
-    vector rank. In absence of the attribute, accesses along all dimensions
-    (except for broadcasts) may run out-of-bounds. A `vector.transfer_read` can
-    be lowered to a simple load if all dimensions are specified to be within
-    bounds and no `mask` was specified.
+    dimension if the transfer is guaranteed to be within the source bounds. If
+    specified, the `in_bounds` array length has to be equal to the vector rank.
+    If set to "false", accesses (including the starting point) may run
+    out-of-bounds along the respective vector dimension as the index increases.
+    Broadcast dimensions must always be in-bounds. In absence of the attribute,
+    accesses along all vector dimensions (except for broadcasts) may run
+    out-of-bounds. A `vector.transfer_read` can be lowered to a simple load if
+    all dimensions are specified to be within bounds and no `mask` was
+    specified. Note that non-vector dimensions *must* always be in-bounds.
 
     This operation is called 'read' by opposition to 'load' because the
     super-vector granularity is generally not representable with a single
@@ -1446,13 +1447,14 @@ def Vector_TransferWriteOp :
     is `0` are masked out.
 
     An optional boolean array attribute `in_bounds` specifies for every vector
-    dimension if the transfer is guaranteed to be within the source bounds.
-    While the starting point of the transfer has to be in-bounds, accesses may
-    run out-of-bounds as indices increase. If specified, the `in_bounds` array
-    length has to be equal to the vector rank. In absence of the attribute,
-    accesses along all dimensions may run out-of-bounds. A
-    `vector.transfer_write` can be lowered to a simple store if all dimensions
-    are specified to be within bounds and no `mask` was specified.
+    dimension if the transfer is guaranteed to be within the source bounds. If
+    specified, the `in_bounds` array length has to be equal to the vector rank.
+    If set to "false", accesses (including the starting point) may run
+    out-of-bounds along the respective vector dimension as the index increases.
+    In absence of the attribute, accesses along all vector dimensions may run
+    out-of-bounds. A `vector.transfer_write` can be lowered to a simple store if
+    all dimensions are specified to be within bounds and no `mask` was
+    specified. Note that non-vector dimensions *must* always be in-bounds.
 
     This operation is called 'write' by opposition to 'store' because the
     super-vector granularity is generally not representable with a single

diff  --git a/mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-1d.mlir b/mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-1d.mlir
index 5ff849b22069a6..8a98d39e657f2c 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-1d.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-1d.mlir
@@ -111,6 +111,17 @@ func.func @transfer_read_1d_mask(
   return
 }
 
+// Non-contiguous, out-of-bounds, strided load.
+func.func @transfer_read_1d_out_of_bounds(
+    %A : memref<?x?xf32>, %base1 : index, %base2 : index) {
+  %fm42 = arith.constant -42.0: f32
+  %f = vector.transfer_read %A[%base1, %base2], %fm42
+      {permutation_map = affine_map<(d0, d1) -> (d0)>, in_bounds = [false]}
+      : memref<?x?xf32>, vector<3xf32>
+  vector.print %f: vector<3xf32>
+  return
+}
+
 // Non-contiguous, strided load.
 func.func @transfer_read_1d_mask_in_bounds(
     %A : memref<?x?xf32>, %base1 : index, %base2 : index) {
@@ -149,6 +160,7 @@ func.func @entry() {
   %c1 = arith.constant 1: index
   %c2 = arith.constant 2: index
   %c3 = arith.constant 3: index
+  %c10 = arith.constant 10 : index
   %0 = memref.get_global @gv : memref<5x6xf32>
   %A = memref.cast %0 : memref<5x6xf32> to memref<?x?xf32>
 
@@ -169,6 +181,12 @@ func.func @entry() {
   call @transfer_read_1d_non_static_unit_stride(%A) : (memref<?x?xf32>) -> ()
   // CHECK: ( 31, 32, 33, 34 )
 
+  // 2.c. Read 1D vector from 2D memref with out-of-bounds transfer dim starting
+  //      point.
+  call @transfer_read_1d_out_of_bounds(%A, %c10, %c1)
+      : (memref<?x?xf32>, index, index) -> ()
+  // CHECK: ( -42, -42, -42 )
+
   // 3. Read 1D vector from 2D memref with non-unit stride on second dim.
   call @transfer_read_1d_non_unit_stride(%A) : (memref<?x?xf32>) -> ()
   // CHECK: ( 22, 24, -42 )

diff  --git a/mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-2d.mlir b/mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-2d.mlir
index e0b3f97583f445..cb8a8ce8ab0b0e 100644
--- a/mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-2d.mlir
+++ b/mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-2d.mlir
@@ -123,13 +123,24 @@ func.func @entry() {
   %c1 = arith.constant 1: index
   %c2 = arith.constant 2: index
   %c3 = arith.constant 3: index
+  %c10 = arith.constant 10 : index
   %0 = memref.get_global @gv : memref<3x4xf32>
   %A = memref.cast %0 : memref<3x4xf32> to memref<?x?xf32>
 
-  // 1. Read 2D vector from 2D memref.
+  // 1.a. Read 2D vector from 2D memref.
   call @transfer_read_2d(%A, %c1, %c2) : (memref<?x?xf32>, index, index) -> ()
   // CHECK: ( ( 12, 13, -42, -42, -42, -42, -42, -42, -42 ), ( 22, 23, -42, -42, -42, -42, -42, -42, -42 ), ( -42, -42, -42, -42, -42, -42, -42, -42, -42 ), ( -42, -42, -42, -42, -42, -42, -42, -42, -42 ) )
 
+  // 1.b. Read 2D vector from 2D memref. Starting position of first dim is
+  //      out-of-bounds.
+  call @transfer_read_2d(%A, %c3, %c2) : (memref<?x?xf32>, index, index) -> ()
+  // CHECK: ( ( -42, -42, -42, -42, -42, -42, -42, -42, -42 ), ( -42, -42, -42, -42, -42, -42, -42, -42, -42 ), ( -42, -42, -42, -42, -42, -42, -42, -42, -42 ), ( -42, -42, -42, -42, -42, -42, -42, -42, -42 ) )
+
+  // 1.c. Read 2D vector from 2D memref. Starting position of second dim is
+  //      out-of-bounds.
+  call @transfer_read_2d(%A, %c1, %c10) : (memref<?x?xf32>, index, index) -> ()
+  // CHECK: ( ( -42, -42, -42, -42, -42, -42, -42, -42, -42 ), ( -42, -42, -42, -42, -42, -42, -42, -42, -42 ), ( -42, -42, -42, -42, -42, -42, -42, -42, -42 ), ( -42, -42, -42, -42, -42, -42, -42, -42, -42 ) )
+
   // 2. Read 2D vector from 2D memref at specified location and transpose the
   //    result.
   call @transfer_read_2d_transposed(%A, %c1, %c2)