[PATCH] D155306: [mlir][ArmSME] Add tile load op and extend tile store tile size support
Cullen Rhodes via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Jul 23 04:19:39 PDT 2023
c-rhodes marked 4 inline comments as done.
c-rhodes added a comment.
@awarzynski @dcaballe thanks for reviewing! I believe I've addressed all the comments now. I'll probably land this tomorrow unless there are further comments by then, cheers!
================
Comment at: mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td:230
+ let description = [{
+ Load a 2D SME "virtual tile" from memory.
+
----------------
dcaballe wrote:
> I would add some minor comment about memory constraints (the slice of memory read should be contiguous, etc.). You can get some inspiration from the `vector.transfer_read` op.
Done, cheers. The indices aren't used in the lowering at the moment; it just assumes 0 to begin with and then adjusts for each tile slice. I've added a TODO to fix that.
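For context, a minimal use of the op this patch introduces might look as follows (the memref and element type here are illustrative; note the lowering currently assumes zero indices, as mentioned above):

```mlir
// Load a 32-bit element SME tile from "%mem" at indices (%c0, %c0).
// The slice of memory read is expected to be contiguous.
%c0 = arith.constant 0 : index
%tile = arm_sme.tile_load %mem[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
```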
================
Comment at: mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td:340
Arguments<(ins Arg<LDSTPredicate, "Vector predicate">,
- Arg<LLVM_AnyPointer, "Load address", [MemRead]>,
+ Arg<LLVM_AnyPointer, "Load address", [MemRead, MemWrite]>,
Arg<I32, "Virtual tile ID">,
----------------
dcaballe wrote:
> It may be worth asking in Discourse about this. There might be a better side effect or workaround for this.
Good suggestion, will do.
================
Comment at: mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp:82-92
+static void replaceLoadOrStoreOp(vector::LoadOp load,
+ PatternRewriter &rewriter) {
+ rewriter.replaceOpWithNewOp<arm_sme::TileLoadOp>(
+ load, load.getVectorType(), load.getBase(), load.getIndices());
+}
+
+static void replaceLoadOrStoreOp(vector::StoreOp store,
----------------
dcaballe wrote:
> We usually do this in place. No strong opinion but the number of these utilities may explode and it also create some kind of specific convention within this file.
I copied this from `mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp`, but it's actually fewer lines and simpler in place, thanks for the suggestion!
================
Comment at: mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp:116
+ vector::LoadOp>::VectorLoadStoreToArmSMELowering;
+};
+
----------------
awarzynski wrote:
> dcaballe wrote:
> > why do we need these specializations?
> I do see _why_, but I agree with Diego that it's not immediately clear.
================
Comment at: mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp:165
+/// }
+/// %tile = arm_sme.cast_tile_to_vector %tile_id : i32 to vector<[4]x[4]xi32>
+/// ```
----------------
dcaballe wrote:
> we may want to materialize the loops in the SME dialect before the number of cases grow more
That's the plan. Initially I was going to focus on vector.broadcast -> ArmSME after this to enable linalg.fill, but I've been looking at loop materialization first so the number of cases doesn't grow, as you point out.
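As a rough sketch of the direction discussed here (the loop structure is hypothetical, not the final design), materializing the per-tile-slice loop at the ArmSME dialect level rather than during LLVM legalization could look something like:

```mlir
// Hypothetical: a single scf.for over tile slices, emitted once at the
// dialect level instead of per-op during legalization. Names are
// illustrative only.
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%min_rows = arith.constant 4 : index
%vscale = vector.vscale
%num_rows = arith.muli %min_rows, %vscale : index
scf.for %row = %c0 to %num_rows step %c1 {
  // ... load/store one horizontal tile slice at row %row ...
}
```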
================
Comment at: mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir:3
-// CHECK-LABEL: @transfer_write_2d_zero_i8
-// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi8>)
+// -----
+
----------------
dcaballe wrote:
> remove init separator. Otherwise, this will compile an empty input for the upper part.
Did not know that! Fixed, thanks.
================
Comment at: mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir:77
+
+ // Verify "mem1" == "mem2"
+ %init_1 = arith.constant 1 : i64
----------------
awarzynski wrote:
> c-rhodes wrote:
> > awarzynski wrote:
> > > Having a "negative" would be good too (i.e. verify that `mem1 != mem2`).
> > > Having a "negative" would be good too (i.e. verify that `mem1 != mem2`).
> >
> > After zeroing mem2?
> That comment was for:
>
> > // Verify "mem1" == "mem2"
>
> I am suggesting a negative test, because current testing won't capture scenarios where something goes badly wrong. For example, when `mem1` and `mem2` end up pointing to the same memory location. Or, put differently, we can make sure that two mem buffers are identical. Can we make sure that they are different?
>
> This is something to consider adding, not a blocker. The testing in this patch is already pretty thorough.
Done
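A sketch of the suggested negative check (the zeroing step and buffer names here are hypothetical, following the test's existing style): after verifying the two buffers are equal, clobber one and verify they now differ.

```mlir
// Hypothetical: zero "%mem2" after the equality check, then recompare
// against "%mem1" expecting a mismatch, so aliasing buffers would be caught.
%c0_i8 = arith.constant 0 : i8
linalg.fill ins(%c0_i8 : i8) outs(%mem2 : memref<?x?xi8>)
// ... recompare %mem1 and %mem2, expecting "mem1 != mem2" ...
```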
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D155306/new/
https://reviews.llvm.org/D155306
More information about the llvm-commits mailing list