[PATCH] D156689: [mlir][ArmSME] Use memref indices for load and store

Wed Aug 2 04:58:21 PDT 2023

c-rhodes marked 4 inline comments as done.
c-rhodes added a comment.

In D156689#4550748 <https://reviews.llvm.org/D156689#4550748>, @awarzynski wrote:

> Thanks Cullen! A few minor comments/suggestions.

Thanks for comments

> I think that it would be nice to update one of the tests (or more) as follows:
>
>   // BEFORE
>   func.func @arm_sme_tile_load(%src : memref<?x?xi32>) {
>     %c0 = arith.constant 0 : index
>     %tile = arm_sme.tile_load %src[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
>     return
>   }
>   
>   // AFTER
>   func.func @arm_sme_tile_load(%src : memref<?x?xi32>) {
>     %c0 = arith.constant 0 : index
>     %c123 = arith.constant 123 : index
>     %tile = arm_sme.tile_load %src[%c123, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
>     return
>   }
>
> Otherwise it's a bit tricky to see what has changed - with `c0` in all tests the effective address does not change, does it?

this isn't changing anything in the example you provided, the offset is adjusted when materializing the tile slice loop, and that is tested in the -convert-arm-sme-to-scf tests.

================
Comment at: mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp:135

-    auto minTileSlices = rewriter.create<arith::ConstantIndexOp>(
-        loc, arm_sme::getSMETileSliceMinNumElts(tileElementType));
-    auto vscale =
-        rewriter.create<vector::VectorScaleOp>(loc, rewriter.getIndexType());
-    // This describes both the number of ZA tile slices and the number of
-    // elements in a vector of SVL bits for a given element type (SVL_B, SVL_H,
-    // ..., SVL_Q).
-    auto numTileSlices =
-        rewriter.create<arith::MulIOp>(loc, minTileSlices, vscale);
-
     // Create 'arm_sme.intr.ld1*.horiz' intrinsic to load ZA tile slice.
+    Value ptr = this->getStridedElementPtr(
----------------
awarzynski wrote:
> I think that this comment should be moved further down.
> I think that this comment should be moved further down.

that's not part of this patch but done anyway

================
Comment at: mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir:12-21
+// RUN: mlir-opt %s -enable-arm-streaming="mode=locally enable-za" \
+// RUN:   -convert-vector-to-arm-sme -convert-arm-sme-to-scf \
+// RUN:   -convert-vector-to-llvm="enable-arm-sme" -cse -canonicalize \
+// RUN:   -allocate-arm-sme-tiles -test-lower-to-llvm | \
+// RUN: mlir-translate -mlir-to-llvmir | \
+// RUN: %lli_aarch64_cmd --march=aarch64 --mattr="+sve,+sme" \
+// RUN:   --entry-function=load_store_two_za_s_tiles \
----------------
awarzynski wrote:
> Rather than duplicating this, could you use `DEFINE` and `REDEFINE`? Alternatively, you could move this to a separate file. There's quite a lot going on here.
> Rather than duplicating this, could you use `DEFINE` and `REDEFINE`? Alternatively, you could move this to a separate file. There's quite a lot going on here.

I was thinking this should have a single entry that calls both tests but ZA isn't currently re-enabled after calls so that's not possible until that's resolved but should be once it is.

================
Comment at: mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir:246-247
+  // Allocate memory for two 32-bit element tiles.
+  %tilesize = arith.muli %svl_s, %svl_s : index
+  %size = arith.muli %tilesize, %c2_index : index
+  %mem1 = memref.alloca(%size) : memref<?xi32>
----------------
awarzynski wrote:
> [nit] Small suggestion for a more descriptive name.
> [nit] Small suggestion for a more descriptive name.

I understanding you intend that to mean to element type of the tile but appending type usually indicates the value is of that type which this isnt, it's an index, I've kept it as is.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156689/new/

https://reviews.llvm.org/D156689