[Mlir-commits] [mlir] [MLIR][Vector] Add Lowering for vector.step (PR #113655)
Andrzej Warzyński
llvmlistbot at llvm.org
Thu Oct 31 04:05:56 PDT 2024
================
@@ -167,10 +167,23 @@ func.func @vectorize_linalg_index(%arg0: tensor<3x3x?xf32>, %arg1: tensor<1x1x?x
// CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index
// CHECK: %[[DST_DIM2:.*]] = tensor.dim %[[DST]], %[[C2]] : tensor<1x1x?xf32>
// CHECK: %[[MASK:.*]] = vector.create_mask %[[C1]], %[[C1]], %[[DST_DIM2]] : vector<1x1x[4]xi1>
-// CHECK: %[[INDEX_VEC:.*]] = vector.step : vector<[4]xindex>
-// CHECK: %[[READ:.*]] = vector.mask %[[MASK]] { vector.transfer_read %[[SRC]][%c0, %c0, %2], %cst {in_bounds = [true, true, true]} : tensor<3x3x?xf32>, vector<1x1x[4]xf32> } : vector<1x1x[4]xi1> -> vector<1x1x[4]xf32>
-// CHECK: %[[OUT:.*]] = vector.mask %[[MASK]] { vector.transfer_write %[[READ]], %[[DST]]{{\[}}%[[C0]], %[[C0]], %[[C0]]] {in_bounds = [true, true, true]} : vector<1x1x[4]xf32>, tensor<1x1x?xf32> } : vector<1x1x[4]xi1> -> tensor<1x1x?xf32>
-// CHECK: return %[[OUT]] : tensor<1x1x?xf32>
+
+// CHECK-DAG: %[[STEP1:.+]] = vector.step : vector<1xindex>
+// CHECK-DAG: %[[STEP1B:.+]] = vector.broadcast %[[STEP1]] : vector<1xindex> to vector<1x1x[4]xindex>
+// CHECK-DAG: %[[STEP1B_CAST:.+]] = vector.shape_cast %[[STEP1B]] : vector<1x1x[4]xindex> to vector<[4]xindex>
+// CHECK-DAG: %[[STEP1B_ELEMENT:.+]] = vector.extractelement %[[STEP1B_CAST]][%c0_i32 : i32] : vector<[4]xindex>
+
+// CHECK-DAG: %[[STEP2:.+]] = vector.step : vector<1xindex>
+// CHECK-DAG: %[[STEP2B:.+]] = vector.broadcast %[[STEP2]] : vector<1xindex> to vector<1x1x[4]xindex>
+// CHECK-DAG: %[[STEP2B_CAST:.+]] = vector.shape_cast %[[STEP2B]] : vector<1x1x[4]xindex> to vector<[4]xindex>
+// CHECK-DAG: %[[STEP2B_ELEMENT:.+]] = vector.extractelement %[[STEP2B_CAST]][%c0_i32 : i32] : vector<[4]xindex>
+
+// CHECK-DAG: %[[STEP_SCALABLE:.+]] = vector.step : vector<[4]xindex>
+// CHECK-DAG: %[[STEP_SCALABLE_ELEMENT:.+]] = vector.extractelement %[[STEP_SCALABLE]][%c0_i32 : i32] : vector<[4]xindex>
+
+// CHECK: %[[READ:.*]] = vector.mask %[[MASK]] { vector.transfer_read %[[SRC]][%[[STEP1B_ELEMENT]], %[[STEP2B_ELEMENT]], %[[STEP_SCALABLE_ELEMENT]]], %cst {in_bounds = [true, true, true]} : tensor<3x3x?xf32>, vector<1x1x[4]xf32> } : vector<1x1x[4]xi1> -> vector<1x1x[4]xf32>
+// CHECK: %[[OUT:.*]] = vector.mask %[[MASK]] { vector.transfer_write %[[READ]], %[[DST]]{{\[}}%[[C0]], %[[C0]], %[[C0]]] {in_bounds = [true, true, true]} : vector<1x1x[4]xf32>, tensor<1x1x?xf32> } : vector<1x1x[4]xi1> -> tensor<1x1x?xf32>
+// CHECK: return %[[OUT]] : tensor<1x1x?xf32>
----------------
banach-space wrote:
This is a fairly substantial change in the expected output and the reason behind that should be documented.
From what I can tell, this is due to the lowering of `vector.step` being removed from the canonicalizer. So even though canonicalization is still being run for this test, it no longer matters that much.
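For reference, this is roughly the rewrite that used to fire during canonicalization for fixed-width vectors (a sketch for illustration, not the exact pattern from the PR):
```mlir
// Illustrative only: a fixed-width vector.step used to fold away ...
%step = vector.step : vector<4xindex>
// ... into the equivalent constant index sequence:
%cst = arith.constant dense<[0, 1, 2, 3]> : vector<4xindex>
```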
**Ask 1:** Please add a note in the summary/commit msg and remove the TD logic that applies the canonicalizer (I expect that the output won't change).
In fact, we should remove canonicalization from this test to make this more consistent with other tests in this file (that rely on `transform.structured.vectorize` alone).
**Ask 2:** Remove canonicalization from this test.
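To be clear, by "the TD logic that applies canonicalizer" I mean the usual `transform.apply_patterns` block; it typically looks roughly like this (a sketch, the handle names and the match op in the actual test may differ):
```mlir
// Typical canonicalization step in a TD sequence (illustrative handle names):
%func = transform.structured.match ops{["func.func"]} in %arg1
  : (!transform.any_op) -> !transform.any_op
transform.apply_patterns to %func {
  transform.apply_patterns.canonicalization
} : !transform.any_op
```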
Once you remove canonicalization, the output will become a bit uglier. My suggestion is to simplify this test. It's meant to exercise scalable vectorization, so it's fine to skip the fixed-width parts. Here's a suggested simplified version:
```mlir
#map = affine_map<(d0) -> (d0)>
func.func @vectorize_linalg_index(%src: tensor<?xf32>, %dest: tensor<?xf32>) -> tensor<?xf32> {
  %res = linalg.generic {
    indexing_maps = [#map],
    iterator_types = ["parallel"]
  } outs(%dest : tensor<?xf32>) {
  ^bb0(%in: f32):
    %idx = linalg.index 0 : index
    %out = tensor.extract %src[%idx] : tensor<?xf32>
    linalg.yield %out : f32
  } -> tensor<?xf32>
  return %res : tensor<?xf32>
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!transform.any_op) -> !transform.any_op
    transform.structured.vectorize %0 vector_sizes [[4]] {vectorize_nd_extract} : !transform.any_op
    transform.yield
  }
}
```
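This should be runnable with the usual transform-interpreter invocation; the RUN line below is only an example, the exact flags used in this file may differ:
```mlir
// RUN: mlir-opt %s -transform-interpreter -split-input-file | FileCheck %s
```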
**Ask 3:** Update the test.
https://github.com/llvm/llvm-project/pull/113655