[PATCH] D142031: [AArch64][SME2] Add intrinsics to move multi-vectors to/from ZA.

Thu Jan 19 05:46:31 PST 2023

david-arm added inline comments.

================
Comment at: llvm/include/llvm/IR/IntrinsicsAArch64.td:2836
+
+  def int_aarch64_sme_read_hor_vg2   : SME2_Matrix_ArrayVector_Read_VG2_Intrinsic;
+  def int_aarch64_sme_read_hor_vg4   : SME2_Matrix_ArrayVector_Read_VG4_Intrinsic;
----------------
I wonder - given these are moving from a tile to a vector is it perhaps better named as something like

SME2_Matrix_TileVector_Read_VG2_Intrinsic
SME2_Matrix_TileVector_Read_VG4_Intrinsic
SME2_Matrix_TileVector_Write_VG2_Intrinsic
SME2_Matrix_TileVector_Write_VG4_Intrinsic

and the others are actually reading from the array so perhaps these can be

SME2_ZA_ArrayVector_Read_VG2_Intrinsic
SME2_ZA_ArrayVector_Read_VG4_Intrinsic
SME2_ZA_ArrayVector_Write_VG2_Intrinsic
SME2_ZA_ArrayVector_Write_VG4_Intrinsic

what do you think?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1734
+  case AArch64::ZAB0:
+    if (TileNum == 0) {
+      break;
----------------
I think this should probably be

  case AArch64::ZAB0:
    if (TileNum == 0)
      break;
    return false;
  case ...

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1737
+      return false;
+    case AArch64::ZAH0:
+      if (TileNum <= 1)
----------------
This indenting here doesn't look right I think?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1760
+                                                unsigned TileNum, unsigned Op) {
+  SDValue SliceBase;
+  if (BaseReg == AArch64::ZA)
----------------
Maybe it's worth moving this code below the call to `SelectSMETile`, so it's close to where it's used?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1771
+  if (!SelectSMETileSlice(SliceBase, MaxIdx, Base, Offset, Scale))
+    llvm_unreachable("Invalid offset value");
+
----------------
I wonder if it's better to simply return here and let it crash with a selection error? The problem with `llvm_unreachable` I think is that for a release build it will be a nop and will silently do the wrong thing.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:4629
+    case Intrinsic::aarch64_sme_read_hor_vg2: {
+      unsigned TileNum =
+          cast<ConstantSDNode>(Node->getOperand(2))->getZExtValue();
----------------
This is just a thought so feel free to ignore it if you think it makes things worse! But I wondered if you could avoiding passing the `TileNum` here, since you're already passing in the Node anyway. That way in `SelectMultiVectorMove` you could do:

  unsigned TileNum = 0;
  if (BaseReg != AArch64::ZA)
    TileNum = cast<ConstantSDNode>(Node->getOperand(2))->getZExtValue();

================
Comment at: llvm/lib/Target/AArch64/SMEInstrFormats.td:795
                                        ComplexPattern tileslice> {
-  def : Pat<(op imm_ty:$tile, MatrixIndexGPR32Op12_15:$idx,
                 (ppr_vt PPR3bAny:$pg), (zpr_vt ZPRAny:$zn)),
----------------
This wasn't mentioned in the commit message, but it looks like you're simplifying the patterns here because you can always use tileslice to get you base + offset, even if offset = 0? It's a nice clean-up!

================
Comment at: llvm/test/CodeGen/AArch64/sme2-intrinsics-extract-mova.ll:96
+  %res = call { <vscale x 2 x i64>, <vscale x 2 x i64> } @llvm.aarch64.sme.read.hor.vg2.nxv2i64(i32 0, i32 %slice)
+  %slice.0 = add i32 %slice, 0
+  %res2 = call { <vscale x 2 x i64>, <vscale x 2 x i64> } @llvm.aarch64.sme.read.hor.vg2.nxv2i64(i32 7, i32 %slice.0)
----------------
This add doesn't seem to 'add' any value, if you excuse the pun. :) And the same thing for the other 64-bit variants.

================
Comment at: llvm/test/CodeGen/AArch64/sme2-intrinsics-extract-mova.ll:286
+  %res = call { <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.aarch64.sme.read.hor.vg4.nxv4i32(i32 0, i32 %slice)
+  %slice.0 = add i32 %slice, 0
+  %res2 = call { <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.aarch64.sme.read.hor.vg4.nxv4i32(i32 3, i32 %slice.0)
----------------
Again, more adds of 0 here.

================
Comment at: llvm/test/CodeGen/AArch64/sme2-intrinsics-insert-mova.ll:110
+  call void @llvm.aarch64.sme.write.hor.vg2.nxv2i64(i32 0, i32 %slice, <vscale x 2 x i64> %zn1, <vscale x 2 x i64> %zn2)
+  %slice.0 = add i32 %slice, 0
+  call void @llvm.aarch64.sme.write.hor.vg2.nxv2i64(i32 7, i32 %slice.0, <vscale x 2 x i64> %zn1, <vscale x 2 x i64> %zn2)
----------------
Again, could you remove the adds of 0 from this test file too?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142031/new/

https://reviews.llvm.org/D142031