[PATCH] D133116: [AArch64][SVE] Optimise repeated floating-point complex patterns to splat
Paul Walker via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 23 10:18:04 PDT 2022
paulwalker-arm added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10786-10787
+// Simplify patterns like
+// INSERT_SUBVECTOR(BUILD_VECTOR(f32 a, f32 b, f32 a, f32 b))
+// to SPLAT_VECTOR(f64(a, b))
+SDValue useWideSplatForBuildVectorRepeatedComplexPattern(SDValue Op,
----------------
As mentioned below I don't believe this transform is save outside of the context of how it's used in the `AArch64ISD::DUPLANE128` case you care about. Hence my do it as a DAG combine suggestion.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10802-10808
+ unsigned NumOperands = BuildVector.getNumOperands();
+ if (NumOperands < 4 || NumOperands % 2 != 0)
+ return SDValue();
+ for (unsigned i = 0; i < NumOperands / 2; i++) {
+ if (BuildVector.getOperand(i) != BuildVector.getOperand(i + 2))
+ return SDValue();
+ }
----------------
`ISD::BUILD_VECTOR` has its own class `BuildVectorSDNode` that provides `getRepeatedSequence()`, which can probably help here and perhaps mean we can cover more cases?
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10817-10828
+ SDLoc DL(Op);
+ SDValue InsertVectorElt =
+ DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, VecTy, DAG.getUNDEF(VecTy),
+ BuildVector.getOperand(1), DAG.getConstant(1, DL, MVT::i64));
+ SDValue WideVecBitcast =
+ DAG.getNode(ISD::BITCAST, DL, WideVecTy, InsertVectorElt);
+ SDValue ExtractWideVectorElt =
----------------
Not sure if I'm sending you down a dead end but I'm wondering if instead of this logic we've reach the stage where it's preferable to enable the other `AArch64ISD::DUPLANE##` nodes for scalable vectors. That way you're essentially doing:
```
AArch64ISD::DUPLANE128(INSERT_SUBVECTOR(BUILD_VECTOR(f32 a, f32 b, f32 a, f32 b))) ->
AArch64ISD::DUPLANE64(INSERT_SUBVECTOR(BUILD_VECTOR(f32 a, f32 b, f32 a, f32 b)))
```
And then if necessary we add combines to remove the unused parts of the subvector.
The problem with `ISD::SPLAT_VECTOR` is that is does what you want for floating point types but if you had integer tests you may well see FPR<->GPR transitions you don't want. Of course you can use bitcast to ensure everything looks like a floating point value but that might not be the cleanest approach.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10851-10853
+ if (SDValue WideSplat =
+ useWideSplatForBuildVectorRepeatedComplexPattern(Op, DAG);
+ CIdxVal == 0 && WideSplat)
----------------
This is not a good place for this transformation as it's really an optimisation rather than anything related to lowering. I think you want a DAG combine for `AArch64ISD::DUPLANE128` because `useWideSplatForBuildVectorRepeatedComplexPattern` looks like a helper function rather than a correct optimisation in its own right.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D133116/new/
https://reviews.llvm.org/D133116
More information about the llvm-commits
mailing list