[Mlir-commits] [mlir] [MLIR] Legalize certain `vector.transfer_read` ops of scalable vectors (PR #143146)
Andrzej Warzyński
llvmlistbot at llvm.org
Wed Jun 25 03:37:02 PDT 2025
================
@@ -0,0 +1,257 @@
+// RUN: mlir-opt --arm-sve-legalize-vector-storage --split-input-file %s | FileCheck %s
+
+
+// Test the `LegalizeTransferRead` pattern
+// (mlir/lib/Dialect/ArmSVE/Transforms/LegalizeVectorStorage.cpp)
+
+// -----
+
+// This is the base case, unremarkable in any way, except that it's our main
+// motivating example and use case.
+
+// CHECK-LABEL: @base_case
+// CHECK-SAME: %[[I:.+]]: index, %[[J:.+]]: index, %[[M:.+]]:
+// CHECK: %[[PAD:.+]] = arith.constant 0 : i8
+// CHECK: %[[C0:.+]] = arith.constant 0 : index
+// CHECK: %[[COLLAPSE:.+]] = memref.collapse_shape %[[M]]
+// CHECK-SAME{LITERAL}: [[0], [1], [2, 3]]
+// CHECK-SAME: : memref<?x?x?x8xi8> into memref<?x?x?xi8>
+// CHECK-NEXT: %[[T0:.+]] = vector.transfer_read %[[COLLAPSE]][%[[I]], %[[J]], %[[C0]]], %[[PAD]] {in_bounds = [true]}
+// CHECK-SAME: : memref<?x?x?xi8>, vector<[32]xi8>
+// CHECK-NEXT: %[[T1:.+]] = vector.shape_cast %[[T0]] : vector<[32]xi8> to vector<[4]x8xi8>
----------------
banach-space wrote:
OK, so this is specifically to target Arm's i8mm (i.e. dot product instructions) for SVE. Here are two links:
* [SMMLA for NEON](https://developer.arm.com/documentation/ddi0602/2025-03/SIMD-FP-Instructions/SMMLA--vector---Signed-8-bit-integer-matrix-multiply-accumulate--vector--)
* [SMMLA for SVE](https://developer.arm.com/documentation/ddi0602/2025-03/SVE-Instructions/SMMLA--Signed-8-bit-integer-matrix-multiply-accumulate-to-32-bit-integer-).
You may recall that this extension requires the input data to be in a specific format. Nothing particularly fancy; e.g. packing into tiles of `vector<2x8xi8>` for NEON would do (`vector<4x8xi8>` would be unrolled to `vector<2x8xi8>`). At the hardware level, though, we would load `vector<16xi8>` rather than a 2D vector (there are no 2D load instructions).
For SVE, we simply make the `N` dimension "scalable", which gives us `vector<[2]x8xi8>`. Again, since we cannot load 2D vectors, we "interpret" that as `vector<[16]xi8>`.
In these cases, `vector.shape_cast` is a no-op and helps us get from the higher-level abstraction to the hardware-level representation.
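To make that concrete, here is a hypothetical sketch (the memref shape and value names are made up for illustration) of the pattern described above: a 1D scalable load followed by a shape cast into the 2D tile layout, where the cast only reinterprets the data and emits no instructions:

```mlir
// Load a flat scalable vector; hardware only supports 1D loads.
%vec = vector.transfer_read %mem[%i], %pad {in_bounds = [true]}
    : memref<?xi8>, vector<[16]xi8>
// No-op reinterpretation into the [2]x8 tile shape expected by SMMLA.
%tile = vector.shape_cast %vec : vector<[16]xi8> to vector<[2]x8xi8>
```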
> Is there something I can read to refresh my mental model here?
There's quite a lot of fine details here and I want to avoid going on too much of a tangent, so just ask for more clarification if this is still unclear. Personally, I like this white paper a lot:
* [Arm Scalable Vector Extension and application to Machine Learning](https://developer.arm.com/-/media/developer/products/software-tools/hpc/White%20papers/arm-scalable-vector-extensions-and-application-to-machine-learning.pdf?revision=510ee340-fce1-4fd8-bad6-bade674620a5)
Thanks!
https://github.com/llvm/llvm-project/pull/143146