[PATCH] D156689: [mlir][ArmSME] Use memref indices for load and store

Tue Aug 1 09:05:06 PDT 2023

awarzynski added a comment.

Thanks Cullen! A few minor comments/suggestions.

I think that it would be nice to update one of the tests (or more) as follows:

  // BEFORE
  func.func @arm_sme_tile_load(%src : memref<?x?xi32>) {
    %c0 = arith.constant 0 : index
    %tile = arm_sme.tile_load %src[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
    return
  }

  // AFTER
  func.func @arm_sme_tile_load(%src : memref<?x?xi32>) {
    %c0 = arith.constant 0 : index
    %c123 = arith.constant 123 : index
    %tile = arm_sme.tile_load %src[%c123, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
    return
  }

Otherwise it's a bit tricky to see what has changed - with `c0` in all tests the effective address does not change, does it?

================
Comment at: mlir/lib/Conversion/ArmSMEToSCF/ArmSMEToSCF.cpp:112
+    // tile.
+    SmallVector<Value> indices;
     auto tileSliceIndex = forOp.getInductionVar();
----------------
[nit] Suggesting a more descriptive name.

================
Comment at: mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp:135

-    auto minTileSlices = rewriter.create<arith::ConstantIndexOp>(
-        loc, arm_sme::getSMETileSliceMinNumElts(tileElementType));
-    auto vscale =
-        rewriter.create<vector::VectorScaleOp>(loc, rewriter.getIndexType());
-    // This describes both the number of ZA tile slices and the number of
-    // elements in a vector of SVL bits for a given element type (SVL_B, SVL_H,
-    // ..., SVL_Q).
-    auto numTileSlices =
-        rewriter.create<arith::MulIOp>(loc, minTileSlices, vscale);
-
     // Create 'arm_sme.intr.ld1*.horiz' intrinsic to load ZA tile slice.
+    Value ptr = this->getStridedElementPtr(
----------------
I think that this comment should be moved further down.

================
Comment at: mlir/test/Conversion/ArmSMEToSCF/arm-sme-to-scf.mlir:13
 // CHECK-NEXT:    scf.for %[[TILE_SLICE_INDEX:.*]] = %[[C0]] to %[[NUM_TILE_SLICES]] step %[[C1]] {
-// CHECK-NEXT:      arm_sme.load_tile_slice %[[SRC]]{{\[}}%[[TILE_SLICE_INDEX]]], %[[CAST_TILE_TO_VECTOR]], %[[TILE_SLICE_INDEX]] : memref<?x?xi32>, vector<[4]x[4]xi32>
+// CHECK-NEXT:      %[[TILE_SLICE_OFFSET:.*]] = arith.addi %[[C0]], %[[TILE_SLICE_INDEX]] : index
+// CHECK-NEXT:      arm_sme.load_tile_slice %[[SRC]]{{\[}}%[[TILE_SLICE_OFFSET]], %[[C0]]], %[[CAST_TILE_TO_VECTOR]], %[[TILE_SLICE_INDEX]] : memref<?x?xi32>, vector<[4]x[4]xi32>
----------------
[nit] "TILE_SLICE_OFFSET" suggests offset into `ZA` (that's where tile slices are defined). But this is for a plain memref. Perhaps `OFFSET`? I am also thinking that the key point of this patch is to clearly differentiate between memory offsets (low level LLVM concept) and "tile slice idx" (SME concept).

================
Comment at: mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir:12-21
+// RUN: mlir-opt %s -enable-arm-streaming="mode=locally enable-za" \
+// RUN:   -convert-vector-to-arm-sme -convert-arm-sme-to-scf \
+// RUN:   -convert-vector-to-llvm="enable-arm-sme" -cse -canonicalize \
+// RUN:   -allocate-arm-sme-tiles -test-lower-to-llvm | \
+// RUN: mlir-translate -mlir-to-llvmir | \
+// RUN: %lli_aarch64_cmd --march=aarch64 --mattr="+sve,+sme" \
+// RUN:   --entry-function=load_store_two_za_s_tiles \
----------------
Rather than duplicating this, could you use `DEFINE` and `REDEFINE`? Alternatively, you could move this to a separate file. There's quite a lot going on here.

================
Comment at: mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir:33

 func.func @za0_d_f64() -> i32 {
   %c0 = arith.constant 0 : index
----------------
Could you add a comment that would highlight the difference between `z0_d_f64` and `load_store_two_za_s_tiles`?

================
Comment at: mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir:246-247
+  // Allocate memory for two 32-bit element tiles.
+  %tilesize = arith.muli %svl_s, %svl_s : index
+  %size = arith.muli %tilesize, %c2_index : index
+  %mem1 = memref.alloca(%size) : memref<?xi32>
----------------
[nit] Small suggestion for a more descriptive name.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156689/new/

https://reviews.llvm.org/D156689