[llvm] [AArch64] Improve non-SVE popcount for 32bit and 64 bit using udot (PR #95881)

Tim Gymnich via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 18 06:13:54 PDT 2024


================
@@ -9740,6 +9740,26 @@ SDValue AArch64TargetLowering::LowerCTPOP_PARITY(SDValue Op,
   Val = DAG.getBitcast(VT8Bit, Val);
   Val = DAG.getNode(ISD::CTPOP, DL, VT8Bit, Val);
 
+  if (Subtarget->hasDotProd() && VT.getScalarSizeInBits() != 16) {
+    EVT DT = VT == MVT::v2i64 ? MVT::v4i32 : VT;
----------------
tgymnich wrote:

I excluded the v1i64 case. I wonder why this case was not handled before; the generated code seems worse this way:

```
popcount64:                             // @popcount64
        fmov    d0, x0
        cnt     v0.8b, v0.8b
        uaddlv  h0, v0.8b
        fmov    w0, s0
        ret
popcount1x64:                           // @popcount1x64
        cnt     v0.8b, v0.8b
        uaddlp  v0.4h, v0.8b
        uaddlp  v0.2s, v0.4h
        uaddlp  v0.1d, v0.2s
        ret
```
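
For readers comparing the two reduction strategies, here is a minimal scalar sketch in C++ of what the instruction sequences compute. It is an illustration only, not the lowering in the patch: the helper names (`cnt8`, `udot_ones`, `popcount_udot`, `popcount_uaddlp`) are hypothetical, and it assumes the udot variant multiplies the per-byte counts by an all-ones vector, sums each group of four bytes into a 32-bit lane, and then needs one pairwise add to reach 64 bits.

```cpp
#include <array>
#include <cstdint>

// Models `cnt v0.8b, v0.8b`: population count of each of the 8 bytes.
std::array<uint8_t, 8> cnt8(uint64_t x) {
    std::array<uint8_t, 8> out{};
    for (int i = 0; i < 8; ++i) {
        uint8_t byte = static_cast<uint8_t>(x >> (8 * i));
        uint8_t n = 0;
        while (byte != 0) {
            n = static_cast<uint8_t>(n + (byte & 1u));
            byte = static_cast<uint8_t>(byte >> 1);
        }
        out[i] = n;
    }
    return out;
}

// Models a udot of 8 bytes against an all-ones vector with a zero
// accumulator: each 32-bit lane receives the sum of four byte products.
std::array<uint32_t, 2> udot_ones(const std::array<uint8_t, 8>& a) {
    std::array<uint32_t, 2> acc{};
    for (int lane = 0; lane < 2; ++lane)
        for (int j = 0; j < 4; ++j)
            acc[lane] += static_cast<uint32_t>(a[4 * lane + j]) * 1u;
    return acc;
}

// Hypothetical udot-based reduction: cnt, udot, then one pairwise add
// of the two 32-bit lanes into a single 64-bit result.
uint64_t popcount_udot(uint64_t x) {
    const auto lanes = udot_ones(cnt8(x));
    return static_cast<uint64_t>(lanes[0]) + lanes[1];
}

// Models the popcount1x64 listing: cnt followed by three uaddlp steps
// (8 x i8 -> 4 x i16 -> 2 x i32 -> 1 x i64).
uint64_t popcount_uaddlp(uint64_t x) {
    const auto b = cnt8(x);
    std::array<uint16_t, 4> h{};
    for (int i = 0; i < 4; ++i)
        h[i] = static_cast<uint16_t>(b[2 * i] + b[2 * i + 1]);
    std::array<uint32_t, 2> s{};
    for (int i = 0; i < 2; ++i)
        s[i] = static_cast<uint32_t>(h[2 * i]) + h[2 * i + 1];
    return static_cast<uint64_t>(s[0]) + s[1];
}
```

Both functions return the same value for every input; the difference is only in how many widening steps follow the per-byte count, which is what the instruction counts in the listing above reflect.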

https://github.com/llvm/llvm-project/pull/95881

