[PATCH] D155306: [mlir][ArmSME] Add tile load op and extend tile store tile size support

Thu Jul 20 11:02:23 PDT 2023

dcaballe accepted this revision.
dcaballe added a comment.
This revision is now accepted and ready to land.

LGTM % minor comments, thanks!

================
Comment at: mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td:230
+  let description = [{
+    Load a 2D SME "virtual tile" from memory.
+
----------------
I would add some minor comment about memory constraints (the slice of memory read should be contiguous, etc.). You can get some inspiration from the `vector.transfer_read` op.

================
Comment at: mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td:340
       Arguments<(ins Arg<LDSTPredicate, "Vector predicate">,
-                 Arg<LLVM_AnyPointer, "Load address", [MemRead]>,
+                 Arg<LLVM_AnyPointer, "Load address", [MemRead, MemWrite]>,
                  Arg<I32, "Virtual tile ID">,
----------------
It may be worth asking in Discourse about this. There might be a better side effect or workaround for this.

================
Comment at: mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp:82-92
+static void replaceLoadOrStoreOp(vector::LoadOp load,
+                                 PatternRewriter &rewriter) {
+  rewriter.replaceOpWithNewOp<arm_sme::TileLoadOp>(
+      load, load.getVectorType(), load.getBase(), load.getIndices());
+}
+
+static void replaceLoadOrStoreOp(vector::StoreOp store,
----------------
We usually do this in place. No strong opinion but the number of these utilities may explode and it also create some kind of specific convention within this file.

================
Comment at: mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp:116
+      vector::LoadOp>::VectorLoadStoreToArmSMELowering;
+};
+
----------------
why do we need these specializations?

================
Comment at: mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp:165
+///  }
+///  %tile = arm_sme.cast_tile_to_vector %tile_id : i32 to vector<[4]x[4]xi32>
+///  ```
----------------
we may want to materialize the loops in the SME dialect before the number of cases grow more

================
Comment at: mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp:333
+
+    auto tileI32 = castTileIDToI32(tile, loc, rewriter);
+    switch (tileElementWidth) {
----------------
spell out auto?

================
Comment at: mlir/lib/Dialect/ArmSME/Utils/CMakeLists.txt:1
+add_mlir_dialect_library(MLIRArmSMEUtils
+  Utils.cpp
----------------
c-rhodes wrote:
> awarzynski wrote:
> > Do we need a dedicated library for one CPP file? Perhaps it's sufficient to add this to `MLIRArmSMETransforms`?
> > Do we need a dedicated library for one CPP file? Perhaps it's sufficient to add this to `MLIRArmSMETransforms`?
> 
> I copied this from another dialect and it seems all dialects with utils do this.
Yeah, I've seen some complaints about the utils library in the past. I was even recommended to remove the utils files altogether. I'm ok with this, though, if other dialects are doing the same...

================
Comment at: mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir:3

-// CHECK-LABEL: @transfer_write_2d_zero_i8
-// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi8>)
+// -----
+
----------------
remove init separator. Otherwise, this will compile an empty input for the upper part.

================
Comment at: mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir:356
+// CHECK: %[[SVL_D:.*]] = arith.muli %[[MIN_SVL_D]], %{{.*}} : index
+// CHECK:   %[[TILE_ID_I32:.*]] = arith.trunci %[[CAST_VECTOR_TO_TILE]] : i64 to i32
+// CHECK:   arm_sme.intr.st1d.horiz
----------------
Nice testing!

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155306/new/

https://reviews.llvm.org/D155306