[PATCH] D155306: [mlir][ArmSME] Add tile load op and extend tile store tile size support

Thu Jul 20 10:15:09 PDT 2023

c-rhodes marked 8 inline comments as done.
c-rhodes added a comment.

In D155306#4517783 <https://reviews.llvm.org/D155306#4517783>, @awarzynski wrote:

> Thanks for the updates, I've left a few more nits/suggestions, but nothing major.

Thanks again, addressed almost all of them (just got a question on comment you left in integration test), also renamed a few variables in the rewrites to make things clearer. Cheers

================
Comment at: mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp:114-118
+/// Returns `offset` if memref is rank 2, otherwise adjusts `offset` by the
+/// number of elements in a vector of SVL bits.
+Value getOffset(MemRefType memRefType, Value offset, Value vscale,
+                Value minElems, Location loc,
+                ConversionPatternRewriter &rewriter) {
----------------
awarzynski wrote:
> Right, IIUC, this is something like:
> 
> ``` Scale the memory offset, i.e. `vnum`, if needed:
>  * for rank 2 memrefs, `getStridedElementPtr`does the calculation for us, so just return `vnum`. 
>  * for rank 1 memrefs, assume row-major storage and scale by the effective vector length.
> ```
> 
> Btw, this makes lots of sense, I just would like for us to be very clear about the meaning of `offset` and `vnum` in this context. The latter name. imho, includes a bit of helpful context, hence suggestion to rename. In the context `getStridedElementPtr`, `offset` probably makes more sense. `getOffset` also feels a bit too generic 🤔 .
> Right, IIUC, this is something like:
> 
> ``` Scale the memory offset, i.e. `vnum`, if needed:
>  * for rank 2 memrefs, `getStridedElementPtr`does the calculation for us, so just return `vnum`. 
>  * for rank 1 memrefs, assume row-major storage and scale by the effective vector length.
> ```
> 
> Btw, this makes lots of sense, I just would like for us to be very clear about the meaning of `offset` and `vnum` in this context. The latter name. imho, includes a bit of helpful context, hence suggestion to rename. In the context `getStridedElementPtr`, `offset` probably makes more sense. `getOffset` also feels a bit too generic 🤔 .

I've renamed it to `getTileSlicePtrIndex`, hopefully this clarifies things, also improved comment at top based on your suggestion.

================
Comment at: mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp:108-109
+
+/// Returns `offset` if memref is rank 2, otherwise adjusts `offset` by SVL<t>
+/// bytes.
+Value getOffset(MemRefType memRefType, Value offset, Value vscale,
----------------
awarzynski wrote:
> c-rhodes wrote:
> > awarzynski wrote:
> > > What "offset" is it? Why do we need to adjust it in the 1-D case and just return as is in 2-D?
> > > 
> > > In the 1-D case, is it:
> > > ```
> > > offet * vscale * minElems
> > > ```
> > > ? And is `minElems` the minimum number of elements in a scalable vector? So basically the "base size of an SVE vector"?
> > > What "offset" is it? Why do we need to adjust it in the 1-D case and just return as is in 2-D?
> > 
> > The offset to the load or store pointer. In the 2D case `getStridedElementPtr` does the arithmetic for us, but in the 1D case we have to do it ourselves.
> > 
> > > In the 1-D case, is it:
> > > ```
> > > offet * vscale * minElems
> > > ```
> > 
> > Yeah, and `offset` is `vnum` so it's `vnum * vscale * minElems`. So the offset is the number of elements for a given type in a vector of SVL bits (SVLt), and this is scaled by `vnum`. So the base would get incremented a tile vector at a time.
> > 
> > > ? And is `minElems` the minimum number of elements in a scalable vector? So basically the "base size of an SVE vector"?
> > 
> > Yeah, so `128 / esize`. Perhaps `getMinNumElts` would be better implemented like that rather than with a switch actually.
> > 
> > As for where supporting for both 1D and 2D memrefs came from, I initially started with these integration tests that dump ZA but could get it to work with 2D memrefs: https://gist.github.com/c-rhodes/1e9f2d8fd0ca3c6539f167e08079f6ab
> > 
> > I found those tests useful for verification but since the output varies depending on the runtime VL we cant add these tests.
> > 
> > I found those tests useful for verification but since the output varies depending on the runtime VL we cant add these tests.
> 
> Yeah, it would be nice to include them. Wouldn't it be possible to add `CHECK` lines that would assume minimum possible VL for each type? Not in this patch though - it's quite large as is.
> > I found those tests useful for verification but since the output varies depending on the runtime VL we cant add these tests.
> 
> Yeah, it would be nice to include them. Wouldn't it be possible to add `CHECK` lines that would assume minimum possible VL for each type? Not in this patch though - it's quite large as is.

Yeah it would actually, I've done that in the integration test already part of this based on your suggestion, but would could add the one I linked in future as well.

================
Comment at: mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir:110-134
+// CHECK-LABEL: @vector_load_i16(
+// CHECK-SAME:                   %[[ARG0:.*]]: memref<?x?xi16>)
+// CHECK-NEXT: %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?x?xi16> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
+// CHECK-NEXT: %[[C0_0:.*]] = arith.constant 0 : index
+// CHECK-NEXT: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i16
+// CHECK-NEXT: %[[C1:.*]] = arith.constant 1 : index
+// CHECK-NEXT: %[[MIN_ZA_VECTORS:.*]] = arith.constant 8 : index
----------------
awarzynski wrote:
> This and the following tests differ only in a few details that are tricky to spot. I am thinking that perhaps we should trim these to highlight the differences? That would be more in line with https://mlir.llvm.org/getting_started/TestingGuide/:
> >     Tests should be minimal, and only check what is absolutely necessary.
> >
> > This means that anything in the output that is not core to the functionality that you are testing should not be present in a CHECK line. 
> 
> The 2 tests above are sufficient to test the other nuances.
> This and the following tests differ only in a few details that are tricky to spot. I am thinking that perhaps we should trim these to highlight the differences? That would be more in line with https://mlir.llvm.org/getting_started/TestingGuide/:
> >     Tests should be minimal, and only check what is absolutely necessary.
> >
> > This means that anything in the output that is not core to the functionality that you are testing should not be present in a CHECK line. 
> 
> The 2 tests above are sufficient to test the other nuances.

Good suggestion they were a bit verbose, I've simplified them to highlight the important bits.

================
Comment at: mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir:77
+
+  // Verify "mem1" == "mem2"
+  %init_1 = arith.constant 1 : i64
----------------
awarzynski wrote:
> Having a "negative" would be good too (i.e. verify that `mem1 != mem2`).
> Having a "negative" would be good too (i.e. verify that `mem1 != mem2`).

After zeroing mem2?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155306/new/

https://reviews.llvm.org/D155306