[PATCH] D129758: [AArch64][SVE] Lower DUPLANE128 to LD1RQD

Fri Jul 15 08:29:42 PDT 2022

paulwalker-arm added a comment.

Sorry, I rushed to suggest a code improvements without first verifying the correctness of the patches intent.

================
Comment at: llvm/include/llvm/Target/TargetSelectionDAG.td:709
+def vector_insert_subvec : SDNode<"ISD::INSERT_SUBVECTOR",
+    SDTypeProfile<1, 3, [SDTCisVec<0>, SDTCisVec<1>, SDTCisVec<2>, SDTCisInt<3>]>,
+    []>;
----------------
Can this be `SDTCisSameAs<0, 1>`

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19183
+  EVT VT = N->getValueType(0);
+  EVT NewVT = VT.changeTypeToInteger();
+
----------------
The goal here is to replace `duplane128(insert_subvector(x, bitcast(y), idx1), idx2)` with `bitcast(duplane128(insert_subvector(new_x,y,idx),idx)` but that is only safe under specific instances of `insert_subvector`.

It's critical that `y` is a 128bit fixed length vector and `idx1==idx2`.  Given the use of `DAG.getUNDEF` you also require `x` to be `undef`.

With these requirements in place I think `NewVT` becomes `getPackedSVEVectorVT(y->getValueType()->getVectorElementType())`.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19191-19192
+  SDValue Load = Bitcast.getOperand(0);
+  if (Load.getOpcode() != ISD::LOAD)
+    return SDValue();
+
----------------
I still don't know why this matters?

================
Comment at: llvm/lib/Target/AArch64/SVEInstrFormats.td:6909-6910
+
+  def : Pat<(vt1 (AArch64duplane128 (vt1 (vector_insert_subvec (vt1 undef), (vt2 (load GPR64sp:$Xn)), (i64 0))), (i64 0))),
+            (!cast<Instruction>(NAME) (pred 31), GPR64sp:$Xn, 0)>;
 }
----------------
Loads and store are the exception to the rule when it comes to adding patterns to the multiclass.  You'll see this with the scalar versions of ld1r.  The reason being the B,H,S,D forms are not hidden with the multiclass like they are for say the arithmetic instructions.  Having the patterns outside (i.e. within AArch64InstrInfo.td) means we can handle the floating point operations as well as make optimal use of the addressing modes.

================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll:592-594
   %1 = tail call fast <vscale x 2 x double> @llvm.vector.insert.nxv2f64.v2f64(<vscale x 2 x double> undef, <2 x double> <double 1.000000e+00, double 2.000000e+00>, i64 0)
   %2 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.dupq.lane.nxv2f64(<vscale x 2 x double> %1, i64 0)
   ret <vscale x 2 x double> %2
----------------
Based on the patch I think you should simplify all `ld1rq#` tests by removing the constants and just have the test load the data explicitly.  This will also help in the future if there turns out to be a better way to compute the constants vectors these tests are doing.  So for example:
```
define dso_local <vscale x 2 x double> @dupq_ld1rqd_f64(ptr %a) {
  %1 = load <2 x double>, ptr %a
  %2 = tail call fast <vscale x 2 x double> @llvm.vector.insert.nxv2f64.v2f64(<vscale x 2 x double> undef, <2 x double> %1, i64 0)
  %3 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.dupq.lane.nxv2f64(<vscale x 2 x double> %2, i64 0)
  ret <vscale x 2 x double> %3
}
``` 
I also believe the tests are better placed in `sve-ld1r.ll`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129758/new/

https://reviews.llvm.org/D129758