[llvm] [AArch64] Improve scalar and Neon popcount with SVE CNT. (PR #143870)
Ricardo Jesus via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 26 08:36:08 PDT 2025
================
@@ -577,11 +670,25 @@ define <8 x i16> @popcount8x16(<8 x i16> %0) {
; CHECKO0-NEXT: uaddlp v0.8h, v0.16b
; CHECKO0-NEXT: ret
;
-; CHECK-LABEL: popcount8x16:
-; CHECK: // %bb.0: // %Entry
-; CHECK-NEXT: cnt v0.16b, v0.16b
-; CHECK-NEXT: uaddlp v0.8h, v0.16b
-; CHECK-NEXT: ret
+; NEON-LABEL: popcount8x16:
+; NEON: // %bb.0: // %Entry
+; NEON-NEXT: cnt v0.16b, v0.16b
+; NEON-NEXT: uaddlp v0.8h, v0.16b
+; NEON-NEXT: ret
+;
+; DOT-LABEL: popcount8x16:
+; DOT: // %bb.0: // %Entry
+; DOT-NEXT: cnt v0.16b, v0.16b
+; DOT-NEXT: uaddlp v0.8h, v0.16b
+; DOT-NEXT: ret
+;
+; SVE-LABEL: popcount8x16:
+; SVE: // %bb.0: // %Entry
+; SVE-NEXT: ptrue p0.h, vl8
+; SVE-NEXT: // kill: def $q0 killed $q0 def $z0
+; SVE-NEXT: cnt z0.h, p0/m, z0.h
+; SVE-NEXT: // kill: def $q0 killed $q0 killed $z0
+; SVE-NEXT: ret
----------------
rj-jesus wrote:
I believe that in most real-world scenarios the cost of the PTRUE should be negligible, either because it's materialised well in advance or because it gets pipelined with other instructions along the critical path.
In somewhat unrealistic loops such as
```gas
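neon:
        cnt     v0.16b, v0.16b      // per-byte bit counts
        uaddlp  v0.8h, v0.16b       // add adjacent byte pairs into halfwords (depends on cnt)
        subs    x0, x0, 1           // decrement loop counter
        b.ne    neon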
```
and
```gas
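sve:
        ptrue   p0.h, vl8           // activate the first 8 halfword lanes
        cnt     z0.h, p0/m, z0.h    // per-halfword bit counts in a single instruction
        subs    x0, x0, 1           // decrement loop counter
        b.ne    sve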
```
the SVE version is 2x faster than the Neon version (on Neoverse V2) due to the shorter critical path.
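As a rough illustration of why the Neon sequence has the longer dependency chain, the two loop bodies above can be modelled in scalar C++. This is only a sketch of the lane semantics, not of timing; the function names `cnt_byte`, `popcount_neon_model` and `popcount_sve_model` are hypothetical, and `__builtin_popcount` is assumed to be available (GCC or Clang).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: bit count of one byte, i.e. what Neon CNT does per .16b lane.
static inline std::uint8_t cnt_byte(std::uint8_t b) {
    return static_cast<std::uint8_t>(__builtin_popcount(b));
}

// Scalar model of `cnt v0.16b, v0.16b` followed by `uaddlp v0.8h, v0.16b`.
// The second step consumes the result of the first, so the two are serialised.
std::array<std::uint16_t, 8> popcount_neon_model(const std::array<std::uint16_t, 8>& in) {
    std::array<std::uint8_t, 16> bytes{};
    std::memcpy(bytes.data(), in.data(), sizeof(bytes));

    // Step 1 (cnt): count the set bits of every byte.
    for (auto& b : bytes)
        b = cnt_byte(b);

    // Step 2 (uaddlp): add each adjacent pair of bytes into one 16-bit lane.
    std::array<std::uint16_t, 8> out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<std::uint16_t>(bytes[2 * i] + bytes[2 * i + 1]);
    return out;
}

// Scalar model of `cnt z0.h, p0/m, z0.h` with all 8 halfword lanes active:
// a single step per lane, with no intermediate byte-wise result.
std::array<std::uint16_t, 8> popcount_sve_model(const std::array<std::uint16_t, 8>& in) {
    std::array<std::uint16_t, 8> out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<std::uint16_t>(__builtin_popcount(in[i]));
    return out;
}
```

Both models return the same values for any input; the difference is that the Neon model needs two dependent steps per iteration where the SVE model needs one.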
In loops such as
```cpp
for (size_t i = 0; i < N; ++i)
    x[i] = __builtin_popcountg(x[i]);
```
I see no difference between the two versions, since the popcount isn't on the critical path. Presumably the SVE version would still be preferable in real-world cases, because it uses the V pipes fewer times and "shortens" the latency of the popcount.
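For anyone wanting to try this kind of loop, a minimal self-contained version might look like the sketch below. It assumes 16-bit elements (to match the `popcount8x16` test in the hunk above) and swaps in `__builtin_popcount` for the type-generic `__builtin_popcountg` so that older GCC and Clang releases accept it; the function name `popcount_inplace` is hypothetical. Which instruction sequence the compiler emits will depend on its version and on target flags such as `-march=armv8-a+sve`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel: replace each 16-bit element with its number of set bits.
// The iterations are independent of each other, so the popcount latency is not
// on a loop-carried critical path.
void popcount_inplace(std::uint16_t* x, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        x[i] = static_cast<std::uint16_t>(__builtin_popcount(x[i]));
}
```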
Do you have a specific case in mind that you're worried about? For what it's worth, GCC implemented a similar lowering a few months ago ([Neon](https://github.com/gcc-mirror/gcc/commit/e4b8db26de35239bd621aad9c0361f25d957122b) and [scalar](https://github.com/gcc-mirror/gcc/commit/9ffcf1f193b477f417a4c1960cd32696a23b99b4)).
https://github.com/llvm/llvm-project/pull/143870