[Mlir-commits] [mlir] [mlir][tensor] Add e2e test for tensor.unpack with dynamic tile sizes (PR #121557)
Andrzej Warzyński
llvmlistbot at llvm.org
Wed Jan 8 09:24:18 PST 2025
================
@@ -0,0 +1,110 @@
+// DEFINE: %{compile} = mlir-opt %s \
+// DEFINE: -transform-interpreter -test-transform-dialect-erase-schedule |\
+// DEFINE: mlir-opt \
+// DEFINE: -test-lower-to-llvm -o %t
+// DEFINE: %{entry_point} = main
+// DEFINE: %{run} = mlir-cpu-runner %t -e %{entry_point} -entry-point-result=void \
+// DEFINE: -shared-libs=%mlir_runner_utils,%mlir_c_runner_utils
+
+// RUN: rm -f %t && %{compile} && %{run} | FileCheck %s
+
+/// End-to-end test for tensor.unpack where one of the inner tile sizes is
+/// dynamic. See pack-dynamic-inner-tile.mlir for a similar test for tensor.pack.
+
+func.func @main() {
+ // Allocate and initialise the inputs
+ %A_alloc = tensor.empty() : tensor<7x3xi32>
+
+ %A = arith.constant dense<[
+ [[[1],
+ [2],
+ [3],
+ [4],
+ [5],
+ [6],
+ [7],
+ [123]],
+ [[8],
+ [9],
+ [10],
+ [11],
+ [12],
+ [13],
+ [14],
+ [123]],
+ [[15],
+ [16],
+ [17],
+ [18],
+ [19],
+ [20],
+ [21],
+ [123]]]
+ ]> : tensor<1x3x8x1xi32>
+
+ %A_cast = tensor.cast %A : tensor<1x3x8x1xi32> to tensor<?x3x?x1xi32>
+ func.call @unpack(%A_cast) : (tensor<?x3x?x1xi32>) -> ()
+
+ return
+}
+
+func.func private @unpack(%A: tensor<?x3x?x1xi32>) {
+ %c1 = arith.constant 1 : index
+ %pad_val = arith.constant 123 : i32
+
+ // Dynamic tile size
+ %tile_size = arith.constant 8 : index
----------------
banach-space wrote:
Yes, I am a bit concerned about that - thanks for flagging it up!
Now, do we need to worry about this, though? The test specifies its own lowering pipeline (through TD), and canonicalization is used fairly late, so perhaps it will be fine?
Ultimately, my goal is to provide an e2e test that leverages vectorization. This discussion makes me think that only the "scalable vectorization" variants are truly future-proof:
* https://github.com/llvm/llvm-project/blob/main/mlir/test/Integration/Dialect/Linalg/CPU/ArmSVE/pack-scalable-inner-tile.mlir
As in, thanks to "scalability", those tests will simply fail if "vectorization" is not used (e.g. because some other patterns fold things away). The scalability is leveraged here:
https://github.com/llvm/llvm-project/blob/main/mlir/test/Integration/Dialect/Linalg/CPU/ArmSVE/pack-scalable-inner-tile.mlir#L54
Note that I am "forcing" the vector length to be 256 bits, which "auto-magically" makes the tile size "grow" from 8 to 16 (i.e. from default 128 bits to 256 bits). This is only possible when vectorization _is used_.
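For reference, the scalable-tile trick described above looks roughly like this (a sketch in the spirit of the linked test, not a verbatim copy; the exact constant may differ there):

```mlir
// Tile size = 8 * vscale. When the runtime vector length is forced to
// 256 bits, vscale is 2 and the tile size becomes 16; with the default
// 128 bits (vscale = 1) it stays at 8. This value is only meaningful
// when the code is actually vectorized for a scalable target (e.g. SVE).
%c8 = arith.constant 8 : index
%vscale = vector.vscale
%tile_size = arith.muli %c8, %vscale : index
```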
TL;DR: Even if this particular test becomes obsolete, the "scalable" variant (which I am working towards) should remain relevant.
(*) Unwanted from the point of view of this test. Folding constants away is obviously a good thing :)
(**) First, I need to be able to target `tensor.insert_slice` directly. That's not possible ATM, see this logic in [mlir::linalg::vectorize](https://github.com/llvm/llvm-project/blob/f99b1907570aa1ac3c8c0ff886563766bbdbc1c8/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp#L2209-L2251).
https://github.com/llvm/llvm-project/pull/121557