[Mlir-commits] [mlir] [mlir][vector] Add support for multi-dim reduction vector distribution (PR #71193)

Sat Nov 4 11:11:44 PDT 2023

================
@@ -494,6 +494,62 @@ func.func @warp_scf_for_multiple_yield(%arg0: index, %arg1: memref<?xf32>, %arg2
 
 // -----
 
+// CHECK-PROP-LABEL:   func @warp_scf_for_multi_reduce(
+//   CHECK-PROP-NOT:   vector.warp_execute_on_lane_0
+//       CHECK-PROP:   scf.for {{.*}} -> (vector<1x4xf32>) {        
+//       CHECK-PROP:     scf.for {{.*}} -> (vector<1x4xf32>) {
+//       CHECK-PROP:       vector.transfer_read {{.*}} : memref<2x32x40x384xf32>, vector<1x4xf32> 
+//       CHECK-PROP:     }
+//       CHECK-PROP:   }
+//       CHECK-PROP:   vector.reduction <add>
+//       CHECK-PROP:   gpu.shuffle
+#map = affine_map<(d0, d1) -> (0, 0)>
+func.func @warp_scf_for_multi_reduce(%arg0: memref<2x32x40x384xf32>, %arg1: memref<2x32x40x384xf16>, %arg2: memref<2x32xf32>, %arg3: memref<2x32x40x384xf16>) {
+  %cst = arith.constant dense<1.536000e+04> : vector<8x128xf32>
+  %cst_0 = arith.constant dense<0.000000e+00> : vector<8x128xf32>
+  %cst_1 = arith.constant 9.99999997E-7 : f32
+  %c128 = arith.constant 128 : index
+  %c8 = arith.constant 8 : index
+  %c0 = arith.constant 0 : index
+  %c40 = arith.constant 40 : index
+  %c384 = arith.constant 384 : index
+  %cst_2 = arith.constant 0.000000e+00 : f16
+  %cst_3 = arith.constant 0.000000e+00 : f32
+  %0 = gpu.thread_id  x
+  %1 = arith.truncf %cst_1 : f32 to f16
+  vector.warp_execute_on_lane_0(%0)[256] {
----------------
antiagainst wrote:

This example serves as an integrated test (in the sense that it tests multiple reductions plus scf.for moving out of the warp op). The shape here is a nice match. Can we also add another small test that checks only vector reduction, but with a more complicated shape? Maybe something like warp size = 256 and vector<128x4x64> -> vector<32x1x16>. Basically, to better check the affine map that controls distribution order.
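
To make the shape arithmetic concrete, here is a rough Python sketch (not MLIR code) that computes, per dimension, how many lanes a sequential vector shape is split across for a given per-lane shape. The helper name `distribution_ratios` is hypothetical. Pairing the 8x128 constants in the test with the 1x4 type in the CHECK lines is an assumption made for illustration, and so is the idea that the ratios should multiply to the warp size.

```python
from math import prod


def distribution_ratios(sequential_shape, distributed_shape):
    """Return the per-dimension split factor between two vector shapes.

    Hypothetical helper for illustration only; it is not part of MLIR.
    """
    if len(sequential_shape) != len(distributed_shape):
        raise ValueError("shapes must have the same rank")
    ratios = []
    for full, per_lane in zip(sequential_shape, distributed_shape):
        # Each dimension must divide evenly to be split across lanes.
        if per_lane <= 0 or full % per_lane != 0:
            raise ValueError(f"{full} is not a multiple of {per_lane}")
        ratios.append(full // per_lane)
    return ratios


# Assumed pairing from the integrated test: vector<8x128> -> vector<1x4>.
integrated = distribution_ratios((8, 128), (1, 4))
integrated_lanes = prod(integrated)

# Shape floated in the review: vector<128x4x64> -> vector<32x1x16>.
suggested = distribution_ratios((128, 4, 64), (32, 1, 16))
suggested_lanes = prod(suggested)
```

Under these assumptions the integrated test splits as 8 x 32 = 256 lanes, which matches the `[256]` warp size, while the suggested shape splits as 4 x 4 x 4 = 64. Since the shape was offered as "or something", a new test would likely pick dimensions whose ratios fit the chosen warp size.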

https://github.com/llvm/llvm-project/pull/71193