[PATCH] D117900: [AArch64][SVE] Fold gather/scatter with 32bits when possible

Mon Jan 31 06:57:32 PST 2022

sdesmalen added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16155
+// their recommended values.
+static bool FindMoreOptimalIndexType(const MaskedGatherScatterSDNode *N,
+                                     SDValue &BasePtr, SDValue &Index,
----------------
nit: functions should start with lower case, i.e. `findMoreOptimalIndexType`.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16159
+                                     SelectionDAG &DAG) {
+
+  // Only consider element types that are pointer sized as smaller types can
----------------
nit: redundant newline.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16174
+      Stride =
+          dyn_cast<ConstantSDNode>(StepVector.getOperand(0))->getSExtValue();
+      Offset = DAG.getNode(ISD::MUL, DL, MVT::i64, Offset, N->getScale());
----------------
s/dyn_cast/cast/

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16180-16187
+  // Earlier returns because there is no fold change.
+  if (Stride == 0)
+    return false;
+
+  // Ensure all element can be represented as a signed i32 value.
+  if (Stride < std::numeric_limits<int32_t>::min() ||
+      Stride > std::numeric_limits<int32_t>::max())
----------------
nit: maybe just merge these conditions together:

  if (!Stride || Stride < std::numeric_limits<int32_t>::min() ||
      Stride > std::numeric_limits<int32_t>::max())
    return false?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16225
+  // the use of an Index that's more legalisation friendly.
+  if (FindMoreOptimalIndexType(MGS, BasePtr, Index, IndexType, DAG)) {
+    if (auto *MGT = dyn_cast<MaskedGatherSDNode>(MGS)) {
----------------
nit: you can remove one more level of indentation, by writing it like this:

  if (!FindMoreOptimalIndexType(MGS, BasePtr, Index, IndexType, DAG))
    return SDValue();

  if (auto *MGT = ...) {
    ...
    return ...
  } else {
    ...
    return ...
  }

  llvm_unreachable("Unhandled case");

================
Comment at: llvm/test/CodeGen/AArch64/sve-gather-scatter-addr-opts.ll:3
+; RUN: llc < %s -mtriple=aarch64-linux-unknown | FileCheck %s
+
+; As scatter_f16_index_offset_8 but with a variable stride that we cannot prove
----------------
nit: can you group the positive test cases together (`scatter_i8_index_offset_minimum` and `scatter_i8_index_offset_maximum`), followed by the negative test cases?

================
Comment at: llvm/test/CodeGen/AArch64/sve-gather-scatter-addr-opts.ll:4
+
+; As scatter_f16_index_offset_8 but with a variable stride that we cannot prove
+; will not wrap when shrunk to be i32 based.
----------------
This test does not exist in this patch?

================
Comment at: llvm/test/CodeGen/AArch64/sve-gather-scatter-addr-opts.ll:37
+; Ensure we use a "vscale x 4" wide scatter for the maximum supported offset.
+define void @scatter_i8_index_offset_maximum(i8* %base, i64 %offset, <vscale x 4 x i1> %pg, <vscale x 4 x i8> %data) #0 {
+; CHECK-LABEL: scatter_i8_index_offset_maximum:
----------------
both positive tests use `i8` as types, can you also use a different type (e.g. i16/i32), so that we can check the scaling happens correctly? (`sxtw #1` or `sxtw #2`)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117900/new/

https://reviews.llvm.org/D117900