[PATCH] D128503: [AArch64][SVE] Lower aarch64_sve_dupq_lane to ld1rq

Tue Jun 28 07:36:57 PDT 2022

paulwalker-arm added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10525-10526
+  if (LD1RQResult) {
+    if (LD1RQResult.getOpcode() == ISD::BUILD_VECTOR)
+      return SDValue();
+    return LD1RQResult;
----------------
peterwaller-arm wrote:
> As it stands, `tryLowerLD1RQ` is returning the `BUILD_VECTOR` as an in-band signalling mechanism, reusing the 'slot' that would ordinarily be used to communicate the result of the lowering to instead communicate that further lowering should stop for now. This seems hazardous to me.
> 
> I'd be interested if other reviewers have an opinion on what is a better way to achieve this. `tryLowerLD1RQ` wants its hands on the address of the load, which begins life as an `ISD::BUILD_VECTOR` and the address is not available until after `BUILD_VECTOR` has been lowered. However, `LowerDUPQLane` after this point can match `Op` and replace it with something else. The logic for converting the `BUILD_VECTOR` to a load cannot be directly called out to since it is within the lowering of `BUILD_VECTOR`, and reimplementing or refactoring that doesn't seem the right way either.
> 
> On balance I think the current approach is OK, although I'd prefer to see the signalling be out-of-band, perhaps via a pass-by-pointer bool.
I believe the problem here is that the current implementation of LowerDUPQLane is kicking out a MachineNode (i.e. `AArch64::DUP_ZZI_Q`) which means you cannot implement the fix by looking for the expected `splatq(load(q))` pattern.

I recommend first creating an `AArch64ISD` node for `DUP_ZZI_Q` or something akin to it if you think of something more generic.  Then `LowerDUPQLane` will only emit normal DAG nodes and thus you can implement a simpler DAG combine.  With that said, I believe once the AArch64ISD node exists you should be able to implement the optimisation within `AArch64SVEInstrInfo.td` much like how we target the other variants of `LD1R` albeit with a different pattern to match against.

It's probably worth breaking this into two patches.  The first to create and use the new ISD node and the second to add the patterns to target `LD1RQ`.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10535
   // The DUPQ operation is indepedent of element type so normalise to i64s.
   SDValue V = DAG.getNode(ISD::BITCAST, DL, MVT::nxv2i64, Op.getOperand(1));
   SDValue Idx128 = Op.getOperand(2);
----------------
Not your fault but I believe this `BITCAST` is a bug waiting to hit big endian targets.  I recommend ensuring your new ISD node can be selected for all packed legal scalable vector types so the need for the bitcast is removed from the DUPQ path.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128503/new/

https://reviews.llvm.org/D128503