[llvm] [AArch64] Lower scalable i1 vector add reduction to cntp (PR #99031)
Sander de Smalen via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 18 09:20:08 PDT 2024
================
@@ -27469,6 +27469,21 @@ SDValue AArch64TargetLowering::LowerReductionToSVE(unsigned Opcode,
     VecOp = convertToScalableVector(DAG, ContainerVT, VecOp);
   }
+  // Lower VECREDUCE_ADD of nxv2i1-nxv16i1 to CNTP rather than UADDV.
+  if (ScalarOp.getOpcode() == ISD::VECREDUCE_ADD &&
+      VecOp.getOpcode() == ISD::ZERO_EXTEND) {
+    SDValue Vec = VecOp.getOperand(0);
+    EVT VecVT = Vec.getValueType();
+    if (VecVT.getVectorElementType() == MVT::i1) {
----------------
sdesmalen-arm wrote:
Ah of course, I didn't realise this was in a Lower* function, rather than a DAG combine.
After splitting `<vscale x 32 x i1>`, I guess it could also use two `cntp` instructions and add the results together. This is probably more relevant for the extend from `<vscale x 16 x i1>` to `<vscale x 16 x i64>`, which generates a lot of `punpklo`/`punpkhi` instructions, but it all depends on what code the LoopVectorizer generates.
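
To illustrate the equivalence being relied on here, below is a minimal scalar sketch (not LLVM code) of why an add reduction over a zero-extended i1 vector equals a count of active lanes, and why a split predicate can be handled by adding two counts. The `Predicate` alias and the function names are hypothetical, and `countActive` only approximates what a `cntp` under an all-true governing predicate would compute.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a predicate register: one bool per lane.
using Predicate = std::vector<bool>;

// Models VECREDUCE_ADD(ZERO_EXTEND(pred)): widen every i1 lane to i64,
// then sum all lanes.
std::uint64_t reduceAddOfZext(const Predicate &pred) {
  std::uint64_t sum = 0;
  for (bool lane : pred)
    sum += static_cast<std::uint64_t>(lane);
  return sum;
}

// Models a single count of active lanes in [begin, end), i.e. roughly what
// one cntp would return (assumption: all-true governing predicate).
std::uint64_t countActive(const Predicate &pred, std::size_t begin,
                          std::size_t end) {
  std::uint64_t n = 0;
  for (std::size_t i = begin; i < end; ++i)
    if (pred[i])
      ++n;
  return n;
}

// Models the split case: a predicate too wide for one register is split
// into two halves, each half is counted, and the counts are added.
std::uint64_t countActiveSplit(const Predicate &pred) {
  assert(pred.size() % 2 == 0 && "sketch assumes an even lane count");
  const std::size_t half = pred.size() / 2;
  return countActive(pred, 0, half) + countActive(pred, half, pred.size());
}
```

For any input with an even number of lanes, `reduceAddOfZext(p)`, `countActive(p, 0, p.size())` and `countActiveSplit(p)` return the same value, which is the property that would let the widening (and its unpack instructions) be skipped.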
https://github.com/llvm/llvm-project/pull/99031