[PATCH] D133116: [AArch64][SVE] Optimise repeated floating-point complex patterns to splat

Fri Sep 23 10:18:04 PDT 2022

paulwalker-arm added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10786-10787
+// Simplify patterns like
+// INSERT_SUBVECTOR(BUILD_VECTOR(f32 a, f32 b, f32 a, f32 b))
+// to SPLAT_VECTOR(f64(a, b))
+SDValue useWideSplatForBuildVectorRepeatedComplexPattern(SDValue Op,
----------------
As mentioned below I don't believe this transform is save outside of the context of how it's used in the `AArch64ISD::DUPLANE128` case you care about. Hence my do it as a DAG combine suggestion.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10802-10808
+  unsigned NumOperands = BuildVector.getNumOperands();
+  if (NumOperands < 4 || NumOperands % 2 != 0)
+    return SDValue();
+  for (unsigned i = 0; i < NumOperands / 2; i++) {
+    if (BuildVector.getOperand(i) != BuildVector.getOperand(i + 2))
+      return SDValue();
+  }
----------------
`ISD::BUILD_VECTOR` has its own class `BuildVectorSDNode` that provides `getRepeatedSequence()`, which can probably help here and perhaps mean we can cover more cases?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10817-10828
+  SDLoc DL(Op);
+  SDValue InsertVectorElt =
+      DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, VecTy, DAG.getUNDEF(VecTy),
+                  BuildVector.getOperand(1), DAG.getConstant(1, DL, MVT::i64));
+  SDValue WideVecBitcast =
+      DAG.getNode(ISD::BITCAST, DL, WideVecTy, InsertVectorElt);
+  SDValue ExtractWideVectorElt =
----------------
Not sure if I'm sending you down a dead end but I'm wondering if instead of this logic we've reach the stage where it's preferable to enable the other  `AArch64ISD::DUPLANE##` nodes for scalable vectors.  That way you're essentially doing:
```
AArch64ISD::DUPLANE128(INSERT_SUBVECTOR(BUILD_VECTOR(f32 a, f32 b, f32 a, f32 b))) ->
AArch64ISD::DUPLANE64(INSERT_SUBVECTOR(BUILD_VECTOR(f32 a, f32 b, f32 a, f32 b)))
```
And then if necessary we add combines to remove the unused parts of the subvector.

The problem with `ISD::SPLAT_VECTOR` is that is does what you want for floating point types but if you had integer tests you may well see FPR<->GPR transitions you don't want.  Of course you can use bitcast to ensure everything looks like a floating point value but that might not be the cleanest approach.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10851-10853
+      if (SDValue WideSplat =
+              useWideSplatForBuildVectorRepeatedComplexPattern(Op, DAG);
+          CIdxVal == 0 && WideSplat)
----------------
This is not a good place for this transformation as it's really an optimisation rather than anything related to lowering.  I think you want a DAG combine for `AArch64ISD::DUPLANE128` because `useWideSplatForBuildVectorRepeatedComplexPattern` looks like a helper function rather than a correct optimisation in its own right.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133116/new/

https://reviews.llvm.org/D133116