[PATCH] D120912: [AArch64][SVE] Convert gather/scatter with a stride of 2 to contiguous loads/stores

David Sherwood via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 7 08:34:23 PST 2022


david-arm added a comment.

Hi @kmclaughlin, this patch looks a lot better now, thanks! Most of my remaining comments are minor, but I do think we should add some explicit checks for truncating scatters and extending gathers.



================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16603
+
+  auto Step = cast<ConstantSDNode>(Index->getOperand(0))->getSExtValue();
+
----------------
nit: Is it worth adding a comment here that `Index` should always be a STEP_VECTOR, which is the reason you are assuming the first operand is a ConstantSDNode?
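
Something along these lines maybe (just a sketch, and the assert is optional):

  // `Index` is always a STEP_VECTOR here, which is why it is safe to assume
  // the first operand is a ConstantSDNode holding the step value.
  assert(Index.getOpcode() == ISD::STEP_VECTOR && "Expected a STEP_VECTOR index");
  auto Step = cast<ConstantSDNode>(Index->getOperand(0))->getSExtValue();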


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16614
+  else
+    return SDValue();
+
----------------
I think you may need some extra checks here for extending gather loads and truncating scatter stores, bailing out in both cases, otherwise we may attempt to do the DAG combine for extloads of nxv4i32 -> nxv4i64.
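
Something like this should cover both, I think (untested, using the `MGS` name from the sketch further down):

  // Bail out for extending gathers and truncating scatters; the contiguous
  // loads/stores created below assume the memory and register element types
  // are the same.
  if (auto *MGT = dyn_cast<MaskedGatherSDNode>(MGS))
    if (MGT->getExtensionType() != ISD::NON_EXTLOAD)
      return SDValue();
  if (auto *MSC = dyn_cast<MaskedScatterSDNode>(MGS))
    if (MSC->isTruncatingStore())
      return SDValue();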


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16623
+    SDValue PassThrough = MGT->getPassThru();
+    if (!PassThrough.isUndef())
+      return SDValue();
----------------
Given that you're creating new nodes above and then potentially dropping them, it might be better to only create nodes once you're guaranteed to do the transformation, i.e. something like

  auto *MGT = dyn_cast<MaskedGatherSDNode>(MGS);
  if (MGT && !MGT->getPassThru().isUndef())
    return SDValue();

  // Using contiguous stores is not beneficial over 64 bit scatters
  auto *MSC = dyn_cast<MaskedScatterSDNode>(MGS);
  if (MSC && MSC->getValue().getValueType().getScalarSizeInBits() == 64)
    return SDValue();

  // Split the mask into two parts and interleave with false
  ...

  if (MGT) {
    ... gather case ...
  }



================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16627
+    EVT VT = MGT->getValueType(0);
+    auto EltCount =
+         DAG.getVScale(DL, MVT::i64,
----------------
nit: I think you also calculate this for the scatter case. Maybe it's worth moving it above the if statement and reusing the value for both the gather and scatter cases?
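
i.e. something roughly like this (untested, and I'm guessing at the exact element count calculation):

  // Elements per contiguous access, needed by both the gather and the
  // scatter cases to compute the base of the second access.
  unsigned MinElts = MGS->getMask().getValueType().getVectorMinNumElements();
  SDValue EltCount = DAG.getVScale(DL, MVT::i64, APInt(64, MinElts));

  if (MGT) {
    ... gather case, reusing EltCount ...
  } else {
    ... scatter case, reusing EltCount ...
  }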


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16642
+                                   DAG.getUNDEF(BaseVT), PredH, PassThrough,
+                                   MemVT, MGT->getMemOperand(),
+                                   ISD::UNINDEXED, MGT->getExtensionType());
----------------
I'm not sure you can simply reuse the same memory operand here, because the second load now reads from a different offset. If you look at `DAGTypeLegalizer::SplitVecRes_MLOAD` you'll see an example of what we do for scalable vectors.
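
For the second load, something along the lines of what `SplitVecRes_MLOAD` does should work, e.g. (untested sketch):

  // The second load reads from a vscale-dependent offset off the original
  // pointer, so create a fresh MMO with an unknown offset/size instead of
  // reusing MGT->getMemOperand().
  MachinePointerInfo MPI(MGT->getPointerInfo().getAddrSpace());
  MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
      MPI, MachineMemOperand::MOLoad, MemoryLocation::UnknownSize,
      MGT->getOriginalAlign(), MGT->getAAInfo(), MGT->getRanges());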


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16684
+                                   DAG.getUNDEF(BaseVT), PredH, MemVT,
+                                   MSC->getMemOperand(), ISD::UNINDEXED,
+                                   MSC->isTruncatingStore());
----------------
Again, I think you need to create a new mem operand here.


================
Comment at: llvm/test/CodeGen/AArch64/sve-gather-scatter-to-contiguous.ll:6
+
+define <vscale x 4 x i32> @gather_stride2_4i32(i32* %addr, <vscale x 4 x i1> %mask) {
+; CHECK-LABEL: gather_stride2_4i32:
----------------
I think it's worth having an FP gather test and an FP scatter test too, since we support them.
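
For example, something like this for the FP gather (just a sketch; I'm guessing the stride-2 pointers are built the same way as in the integer tests, so adjust to match):

  define <vscale x 4 x float> @gather_stride2_4f32(float* %addr, <vscale x 4 x i1> %mask) {
  ; CHECK-LABEL: gather_stride2_4f32:
    %step = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
    %ins = insertelement <vscale x 4 x i64> poison, i64 2, i32 0
    %two = shufflevector <vscale x 4 x i64> %ins, <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
    %offsets = mul <vscale x 4 x i64> %step, %two
    %ptrs = getelementptr float, float* %addr, <vscale x 4 x i64> %offsets
    %vals = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0f32(<vscale x 4 x float*> %ptrs, i32 4, <vscale x 4 x i1> %mask, <vscale x 4 x float> undef)
    ret <vscale x 4 x float> %vals
  }

  declare <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0f32(<vscale x 4 x float*>, i32, <vscale x 4 x i1>, <vscale x 4 x float>)
  declare <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()

The scatter test would be the mirror image of this using `llvm.masked.scatter`.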


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120912/new/

https://reviews.llvm.org/D120912


