[PATCH] D155306: [mlir][ArmSME] Add tile load op and extend tile store tile size support

Cullen Rhodes via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Jul 23 04:19:39 PDT 2023


c-rhodes marked 4 inline comments as done.
c-rhodes added a comment.

@awarzynski @dcaballe thanks for reviewing! I believe I've addressed all comments now and will probably land this tomorrow unless there are further comments by then, cheers!



================
Comment at: mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td:230
+  let description = [{
+    Load a 2D SME "virtual tile" from memory.
+
----------------
dcaballe wrote:
> I would add some minor comment about memory constraints (the slice of memory read should be contiguous, etc.). You can get some inspiration from the `vector.transfer_read` op.

Done, cheers. The indices aren't used in the lowering at the moment; it just assumes 0 to begin with and then adjusts the offset for each tile slice. I've added a TODO to fix that.
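For context, a rough sketch of the rewrite being discussed (the IR shapes are illustrative; the `%c0` constants stand in for the currently unused indices):

```mlir
// Hypothetical input: a 2-D load of an SME-tile-sized scalable vector.
%c0 = arith.constant 0 : index
%tile_in = vector.load %mem[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>

// After the VectorToArmSME pattern it becomes a tile load. As noted above,
// the lowering currently assumes zero indices and adjusts the address per
// tile slice; a TODO tracks honoring non-zero indices.
%tile = arm_sme.tile_load %mem[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
```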


================
Comment at: mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td:340
       Arguments<(ins Arg<LDSTPredicate, "Vector predicate">,
-                 Arg<LLVM_AnyPointer, "Load address", [MemRead]>,
+                 Arg<LLVM_AnyPointer, "Load address", [MemRead, MemWrite]>,
                  Arg<I32, "Virtual tile ID">,
----------------
dcaballe wrote:
> It may be worth asking in Discourse about this. There might be a better side effect or workaround for this.

Good suggestion, will do.


================
Comment at: mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp:82-92
+static void replaceLoadOrStoreOp(vector::LoadOp load,
+                                 PatternRewriter &rewriter) {
+  rewriter.replaceOpWithNewOp<arm_sme::TileLoadOp>(
+      load, load.getVectorType(), load.getBase(), load.getIndices());
+}
+
+static void replaceLoadOrStoreOp(vector::StoreOp store,
----------------
dcaballe wrote:
> We usually do this in place. No strong opinion, but the number of these utilities may explode and it also creates some kind of specific convention within this file.

I copied this from `mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp`, but it's actually fewer lines and simpler in place, thanks for the suggestion!


================
Comment at: mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp:116
+      vector::LoadOp>::VectorLoadStoreToArmSMELowering;
+};
+
----------------
awarzynski wrote:
> dcaballe wrote:
> > why do we need these specializations?
> I do see _why_, but I agree with Diego that it's not immediately clear. 




================
Comment at: mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp:165
+///  }
+///  %tile = arm_sme.cast_tile_to_vector %tile_id : i32 to vector<[4]x[4]xi32>
+///  ```
----------------
dcaballe wrote:
> we may want to materialize the loops in the SME dialect before the number of cases grow more

That's the plan. Initially I was going to focus on vector.broadcast -> ArmSME after this to enable linalg.fill, but I've been looking at loop materialization first so the number of cases doesn't grow, as you point out.
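For reference, a hedged sketch of what materializing the loop in the SME dialect (rather than during LLVM legalization, as in the snippet above) might look like; the op and bound names here are illustrative, not final:

```mlir
%tile_id = arm_sme.get_tile_id : i32
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
// %num_slices would be derived from the runtime streaming vector length.
scf.for %slice = %c0 to %num_slices step %c1 {
  // Load one horizontal tile slice per iteration, e.g. via an intrinsic
  // or a dedicated tile-slice load op.
}
%tile = arm_sme.cast_tile_to_vector %tile_id : i32 to vector<[4]x[4]xi32>
```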


================
Comment at: mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir:3
 
-// CHECK-LABEL: @transfer_write_2d_zero_i8
-// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi8>)
+// -----
+
----------------
dcaballe wrote:
> remove init separator. Otherwise, this will compile an empty input for the upper part.

Did not know that! Fixed, thanks.
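For anyone else unfamiliar with this: under `-split-input-file`, `// -----` delimits independent inputs, so a separator before any code makes the first input an empty module (illustrative example):

```mlir
// -----
// With the leading separator above, mlir-opt -split-input-file first
// processes an empty input; any CHECK lines placed before the separator
// would apply to that empty module.
func.func @example() {
  return
}
```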


================
Comment at: mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir:77
+
+  // Verify "mem1" == "mem2"
+  %init_1 = arith.constant 1 : i64
----------------
awarzynski wrote:
> c-rhodes wrote:
> > awarzynski wrote:
> > > Having a "negative" would be good too (i.e. verify that `mem1 != mem2`).
> > 
> > After zeroing mem2?
> That comment was for:
> 
> >   // Verify "mem1" == "mem2"
> 
> I am suggesting a negative test, because the current testing won't capture scenarios where something goes badly wrong. For example, when `mem1` and `mem2` end up pointing to the same memory location. Or, put differently, we can make sure that two mem buffers are identical. Can we make sure that they are different?
> 
> This is something to consider adding, not a blocker. The testing in this patch is already pretty thorough.

Done


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155306/new/

https://reviews.llvm.org/D155306


