[llvm] [AArch64] Improve non-SVE popcount for 32bit and 64 bit using udot (PR #95881)
Tim Gymnich via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 18 06:13:54 PDT 2024
================
@@ -9740,6 +9740,26 @@ SDValue AArch64TargetLowering::LowerCTPOP_PARITY(SDValue Op,
   Val = DAG.getBitcast(VT8Bit, Val);
   Val = DAG.getNode(ISD::CTPOP, DL, VT8Bit, Val);
+  if (Subtarget->hasDotProd() && VT.getScalarSizeInBits() != 16) {
+    EVT DT = VT == MVT::v2i64 ? MVT::v4i32 : VT;
----------------
tgymnich wrote:
I excluded the v1i64 case. I wonder why this case was not handled before, since the generated code seems worse this way (compare `popcount1x64` with `popcount64`):
```
popcount64:                             // @popcount64
        fmov    d0, x0
        cnt     v0.8b, v0.8b
        uaddlv  h0, v0.8b
        fmov    w0, s0
        ret
popcount1x64:                           // @popcount1x64
        cnt     v0.8b, v0.8b
        uaddlp  v0.4h, v0.8b
        uaddlp  v0.2s, v0.4h
        uaddlp  v0.1d, v0.2s
        ret
```
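For anyone comparing the two sequences, here is a rough scalar C++ model of what each one computes. It is only a sketch: the `model_*` names and `check_paths_agree` are made up for illustration and are not part of the patch or of LLVM. It assumes the usual semantics of `cnt` (per-byte population count), `uaddlv` (widening sum across all lanes) and `uaddlp` (widening add of adjacent lane pairs).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes8 = std::array<std::uint8_t, 8>;

// Models `cnt v0.8b, v0.8b`: population count of each byte lane.
// (Hypothetical helper, for illustration only.)
Bytes8 model_cnt_8b(std::uint64_t x) {
  Bytes8 lanes{};
  for (int i = 0; i < 8; ++i) {
    std::uint8_t byte = static_cast<std::uint8_t>(x >> (8 * i));
    std::uint8_t bits = 0;
    for (; byte != 0; byte >>= 1)
      bits += byte & 1u;
    lanes[i] = bits;
  }
  return lanes;
}

// Models the scalar path: `cnt` followed by a single `uaddlv`,
// i.e. one widening add across all eight byte lanes.
std::uint64_t model_popcount64(std::uint64_t x) {
  std::uint16_t sum = 0;
  for (std::uint8_t lane : model_cnt_8b(x))
    sum = static_cast<std::uint16_t>(sum + lane);
  return sum;
}

// Models the v1i64 path: `cnt` followed by three `uaddlp` steps,
// each adding adjacent lanes pairwise: 8b -> 4h -> 2s -> 1d.
std::uint64_t model_popcount1x64(std::uint64_t x) {
  const Bytes8 b = model_cnt_8b(x);
  std::array<std::uint16_t, 4> h{};
  for (int i = 0; i < 4; ++i)
    h[i] = static_cast<std::uint16_t>(b[2 * i] + b[2 * i + 1]);
  std::array<std::uint32_t, 2> s{};
  for (int i = 0; i < 2; ++i)
    s[i] = static_cast<std::uint32_t>(h[2 * i]) + h[2 * i + 1];
  return static_cast<std::uint64_t>(s[0]) + s[1];
}

// Both paths should agree for any input; only the reduction differs.
void check_paths_agree(std::uint64_t x) {
  assert(model_popcount64(x) == model_popcount1x64(x));
}
```

Both models return the same count; the difference is that the v1i64 sequence needs three dependent pairwise adds where the scalar sequence needs one across-lanes add.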
https://github.com/llvm/llvm-project/pull/95881