[PATCH] D156689: [mlir][ArmSME] Use memref indices for load and store

Andrzej Warzynski via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 3 00:36:12 PDT 2023


awarzynski accepted this revision.
awarzynski added a comment.

LGTM, thanks! I've left a few more comments, but feel free to ignore.

> this isn't changing anything in the example you provided, the offset is adjusted when materializing the tile slice loop, and that is tested in the -convert-arm-sme-to-scf tests.

Clarified what I had in mind inline in a test. Not a blocker.



================
Comment at: mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir:62
 func.func @vector_load_i8(%arg0 : memref<?x?xi8>) -> vector<[16]x[16]xi8> {
   %c0 = arith.constant 0 : index
   %tile = vector.load %arg0[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>
----------------
I was suggesting to change `%c0 = 0` to e.g. `%c123 = 123` in places like this one.  Right now (with `%c0`) it looks like this
```
// BEFORE
// CHECK-NEXT:   %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]]  : i64
// CHECK-NEXT:   %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF0]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
```
becomes:
```
// AFTER
// CHECK-NEXT:   %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]]  : i64
// CHECK-NEXT:   %[[OFF1:.*]] = llvm.add %[[OFF0]], %[[C0_I64]]  : i64
// CHECK-NEXT:   %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF1]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
```

Note that `OFF0` and `OFF1` are identical so the benefits of this patch are obfuscated. The resulting code is correct _with_ and _without_ your change (because the offset is `0`).

Replacing `%c0` with `%c123` makes things more interesting: 
```
// BEFORE
// CHECK-NEXT:   %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]]  : i64
// CHECK-NEXT:   %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF0]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
```
becomes:
```
// AFTER
// CHECK-NEXT:   %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]]  : i64
// CHECK-NEXT:   %[[OFF1:.*]] = llvm.add %[[OFF0]], %[[C123_I64]]  : i64
// CHECK-NEXT:   %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF1]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
```
In this case the "BEFORE" is obviously broken.


================
Comment at: mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir:12-21
+// RUN: mlir-opt %s -enable-arm-streaming="mode=locally enable-za" \
+// RUN:   -convert-vector-to-arm-sme -convert-arm-sme-to-scf \
+// RUN:   -convert-vector-to-llvm="enable-arm-sme" -cse -canonicalize \
+// RUN:   -allocate-arm-sme-tiles -test-lower-to-llvm | \
+// RUN: mlir-translate -mlir-to-llvmir | \
+// RUN: %lli_aarch64_cmd --march=aarch64 --mattr="+sve,+sme" \
+// RUN:   --entry-function=load_store_two_za_s_tiles \
----------------
c-rhodes wrote:
> awarzynski wrote:
> > Rather than duplicating this, could you use `DEFINE` and `REDEFINE`? Alternatively, you could move this to a separate file. There's quite a lot going on here.
> > Rather than duplicating this, could you use `DEFINE` and `REDEFINE`? Alternatively, you could move this to a separate file. There's quite a lot going on here.
> 
> I was thinking this should have a single entry that calls both tests but ZA isn't currently re-enabled after calls so that's not possible until that's resolved but should be once it is.
> I was thinking this should have a single entry that calls both tests but ZA isn't currently re-enabled after calls so that's not possible until that's resolved but should be once it is.

I don't follow - with `DEFINE`/`REDEFINE`  you can re-use everything while keeping the entry point custom:

```
// DEFINE: %{entry_point} = za0_d_f64

// DEFINE: %{compile} = mlir-opt %s -enable-arm-streaming="mode=locally enable-za" \
// DEFINE:   -convert-vector-to-arm-sme -convert-arm-sme-to-scf \
// DEFINE:   -convert-vector-to-llvm="enable-arm-sme" -cse -canonicalize \
// DEFINE:   -allocate-arm-sme-tiles -test-lower-to-llvm
// DEFINE: %{translate} = mlir-translate -mlir-to-llvmir | \
// DEFINE: %{run} = %lli_aarch64_cmd --march=aarch64 --mattr="+sve,+sme" \
// DEFINE:   --entry-function=%{entry_point} \
// DEFINE:   --dlopen=%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext \
// DEFINE:   --dlopen=%mlir_native_utils_lib_dir/libmlir_runner_utils%shlibext | \
// RUN: %{compile} | %{translate} | %{run} | FileCheck %s --check-prefix=CHECK-ZA0_D

// REDEFINE: %{entry_point} = load_store_two_za_s_tiles
// RUN: %{compile} | %{translate} | %{run} | FileCheck %s
```


================
Comment at: mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir:246-247
+  // Allocate memory for two 32-bit element tiles.
+  %tilesize = arith.muli %svl_s, %svl_s : index
+  %size = arith.muli %tilesize, %c2_index : index
+  %mem1 = memref.alloca(%size) : memref<?xi32>
----------------
c-rhodes wrote:
> awarzynski wrote:
> > [nit] Small suggestion for a more descriptive name.
> > [nit] Small suggestion for a more descriptive name.
> 
> I understanding you intend that to mean to element type of the tile but appending type usually indicates the value is of that type which this isnt, it's an index, I've kept it as is.
> I understanding you intend that to mean to element type 

That's not the main thing that I had in mind :) Sorry should've been clearer.

My main suggestion here is to avoid variables names like e.g. `size` (or `name` etc). With names like this my first question would be "_size_ of what?" (or "_name_ of what?"). I just wanted to clarify, this is just a nit.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156689/new/

https://reviews.llvm.org/D156689



More information about the llvm-commits mailing list