[Mlir-commits] [mlir] Full slices when tiling full loop trip count (PR #127197)

Mon Feb 17 05:35:56 PST 2025

================
@@ -214,3 +214,35 @@ module attributes {transform.with_named_sequence} {
     transform.yield
   }
 }
+
+// -----
+
+// CHECK-LABEL: func @non_monotonic_affine_expr
+//  CHECK-SAME:   %[[ARG0:[a-zA-Z0-9_]+]]: tensor<7xf32>
+func.func @non_monotonic_affine_expr(%arg0 : tensor<7xf32>) -> tensor<7xf32> {
+  %c0 = arith.constant 0 : index
+  %0 = tensor.dim %arg0, %c0 : tensor<7xf32>
+  %empty = tensor.empty() : tensor<7xf32>
+
+  // CHECK: %[[OUT:.*]] = tensor.empty() : tensor<7xf32>
+  // CHECK: scf.for {{.*}} to {{.*}} step {{.*}} iter_args(%[[TC0:.*]] = %[[OUT]]) -> (tensor<7xf32>) {
+  // CHECK: tensor.extract_slice %[[TC0]][0] [7] [1] : tensor<7xf32> to tensor<7xf32>
+  %generic = linalg.generic
+    {indexing_maps = [affine_map<(d0) -> (d0 mod 4)>,
+                      affine_map<(d0) -> (d0)>],
+     iterator_types = ["parallel"]}
+    ins(%arg0: tensor<7xf32>)
+    outs(%empty : tensor<7xf32>) {
+    ^bb0(%in : f32, %out: f32):
+      linalg.yield %in : f32
+    } -> tensor<7xf32>
+  return %generic : tensor<7xf32>
+}
----------------
josel-amd wrote:

The dynamic case would be similar to the static one written here, but with a different `tile_size`. I mean that in terms of correctness: both would generate invalid slices.

Let me make a quick detour before returning to the dynamic case...
 
If I understand this correctly (and I'm not sure I do), the goal of `computeSliceParameters` is just to do what the user asks, and it is up to that user to obey its rules and provide only valid tile sizes that won't result in invalid slices.

Here, we're using the tile size to disable slicing if a loop is not tiled. This is because when we tile & fuse through a whole chain of ops, the tiling algorithm uses the offset/sizes/strides of the consumer `extract_slice` to know how to tile the producer. So, in those cases there is no way to disable tiling (i.e. we cannot use `tile_sizes [0]`) while the algorithm does its thing.

Furthermore, in the presence of non-monotonic expressions, just trying to calculate the slice from a static tensor dim size results in invalid slices. For the example in this test case, `7 mod 4` would result in a slice of size 3 instead of 4. That is why we stop slicing in those cases.
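
To make the arithmetic concrete, here is a small Python sketch (not MLIR code). The helper names are hypothetical, and it assumes the slice size comes from a single evaluation of the indexing expression at the dimension size, as in the `7 mod 4` example above. It compares that value with the extent the expression really touches over the whole iteration space.

```python
def naive_slice_size(expr, dim_size):
    # Assumption for illustration: the size is taken from one evaluation
    # of the indexing expression at the dimension size.
    return expr(dim_size)


def required_slice_size(expr, dim_size):
    # Largest index actually accessed over d0 in [0, dim_size), plus one.
    return max(expr(d0) for d0 in range(dim_size)) + 1


def mod4(d0):
    # Mirrors affine_map<(d0) -> (d0 mod 4)> from the test case.
    return d0 % 4


def identity(d0):
    # Mirrors affine_map<(d0) -> (d0)>, a monotonic expression.
    return d0


if __name__ == "__main__":
    print("mod 4   :", naive_slice_size(mod4, 7), "vs", required_slice_size(mod4, 7))
    print("identity:", naive_slice_size(identity, 7), "vs", required_slice_size(identity, 7))
```

For the identity map the two values agree, so a single evaluation is safe. For `d0 mod 4` they differ, because the maximum of the expression is not reached at the end of the range.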

Now for the dynamic case, we would need to do all of this at runtime. By all, I mean check if the loop trip count matches the slice size and have some kind of specialization (?) or runtime tiling which is not implemented. This means that one should avoid tiling in the presence of dynamic shapes **and** non-monotonic expressions as there is no runtime mechanism to ensure the slices are generated correctly.
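
As a rough illustration of what such a runtime check might look like, here is a Python sketch. It is not an existing MLIR mechanism: `pick_slice` is a hypothetical helper, and reading "the loop trip count matches the slice size" as "the tile covers the whole dimension" is my assumption. With dynamic shapes, `dim_size` is only known at run time, so the branch would have to live in the generated code.

```python
def pick_slice(dim_size, tile_size, offset):
    """Return (offset, size) for one tile of a 1-D operand.

    Hypothetical helper: takes the full operand when the tile spans the
    whole dimension, otherwise a clamped tile-sized slice.
    """
    if tile_size >= dim_size:
        # The tile covers every iteration, so no slicing is needed.
        return (0, dim_size)
    # Partial tile: clamp the size to what remains of the dimension.
    return (offset, min(tile_size, dim_size - offset))


if __name__ == "__main__":
    print(pick_slice(7, 7, 0))  # tile spans the whole dimension
    print(pick_slice(7, 4, 4))  # last partial tile
```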

An alternative could be to disable slicing in the presence of non-monotonic expressions and dynamic shapes, just to avoid the wrong codegen. I'm open to better alternatives.

https://github.com/llvm/llvm-project/pull/127197