[PATCH] D155299: [AArch64][SVE2] Combine add+lsr to rshrnb for stores

Thu Aug 3 14:12:29 PDT 2023

dmgreen added a comment.

Thanks. This looks good, but it might need to be using 1ULL for the shift value and there are few extra suggestions below. Otherwise LGTM

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:20085
+                                         const AArch64Subtarget *Subtarget) {
+  EVT VT = Srl->getValueType(0);
+  if (!Subtarget->hasSVE2() || Srl->getOpcode() != ISD::SRL ||
----------------
Can you either add a check that the VT is one that is valid for RSHRNB, or an assert that we only handle those types?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:20105
+
+  if (AddValue != 1 << (ShiftValue - 1))
+    return SDValue();
----------------
This may need to be `1ULL << (ShiftValue - 1)`, as the number could be a 64bit value.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:20109-20111
+  auto Rshrnb = DAG.getNode(
+      AArch64ISD::RSHRNB_I, DL, VT,
+      {Add->getOperand(0), DAG.getTargetConstant(ShiftValue, DL, MVT::i32)});
----------------
```
SDValue Rshrnb = DAG.getNode(
      AArch64ISD::RSHRNB_I, DL, VT,
      Add->getOperand(0), DAG.getTargetConstant(ShiftValue, DL, MVT::i32));
```
(Or just return it directly)

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:20150
+  if (SDValue Rshrnb =
+          trySimplifySrlAddToRshrnb(cast<SDNode>(Op0), DAG, Subtarget))
+    return DAG.getNode(AArch64ISD::UZP1, DL, ResVT, Rshrnb, Op1);
----------------
trySimplifySrlAddToRshrnb could take a SDValue, instead of needing the cast.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:20776
+            cast<SDNode>(ST->getOperand(1)), DAG, Subtarget)) {
+      EVT RshrnbVT = Rshrnb.getValueType();
+      EVT StoreVT = ST->getMemoryVT();
----------------
I think RshrnbVT is just ValueVT?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.h:217
   URSHR_I,
+  RSHRNB_I,

----------------
This might deserve to be in it's own little section. There may be a number of T/B nodes we end up adding, if we treat them in the same way.

================
Comment at: llvm/lib/Target/AArch64/SVEInstrFormats.td:4309
+
+  def : SVE_2_Op_Imm_Pat<nxv8i16, op, nxv8i16, i32, tvecshiftR8,  !cast<Instruction>(NAME # _B)>;
+  def : SVE_2_Op_Imm_Pat<nxv4i32, op, nxv4i32, i32, tvecshiftR16, !cast<Instruction>(NAME # _H)>;
----------------
It might be better to generate a nxv16i8->nxv8i16 RSHRN with a bitcast/nvcast back to nxv8i16.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155299/new/

https://reviews.llvm.org/D155299