[llvm] [AArch64] Lower scalable i1 vector add reduction to cntp (PR #99031)

Wed Jul 17 09:16:47 PDT 2024

================
@@ -0,0 +1,132 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+define i8 @uaddv_zexti8_nxv16i1(<vscale x 16 x i1> %v) {
+; CHECK-LABEL: uaddv_zexti8_nxv16i1:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    cntp x0, p0, p0.b
+; CHECK-NEXT:    // kill: def $w0 killed $w0 killed $x0
+; CHECK-NEXT:    ret
+entry:
+  %3 = zext <vscale x 16 x i1> %v to <vscale x 16 x i8>
----------------
DevM-uk wrote:

Extending to `<vscale x 16 x i64>` causes the vector to be split into multiple `nxv2i64` vectors, which this patch does not match, so the reduction is lowered to a `UADDV_PRED` instead. I have only covered the simple cases so far. Maybe I should make this clearer in the PR description/commit message?
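
For illustration, the unmatched case might look like the sketch below. The function name is hypothetical and not taken from the patch, and the reduction call is assumed to be the generic `llvm.vector.reduce.add` intrinsic; no expected assembly is shown because it is not quoted in this thread.

```llvm
; Hypothetical test case, not part of the patch: the function name is an
; assumption chosen to mirror the existing uaddv_zexti8_nxv16i1 test.
declare i64 @llvm.vector.reduce.add.nxv16i64(<vscale x 16 x i64>)

define i64 @uaddv_zexti64_nxv16i1(<vscale x 16 x i1> %v) {
entry:
  ; Widen every i1 lane to i64. As described above, the resulting
  ; nxv16i64 vector is split into multiple nxv2i64 parts.
  %ext = zext <vscale x 16 x i1> %v to <vscale x 16 x i64>
  ; Add reduction over the widened lanes; per the comment above, this is
  ; expected to lower to UADDV_PRED rather than a single cntp.
  %sum = call i64 @llvm.vector.reduce.add.nxv16i64(<vscale x 16 x i64> %ext)
  ret i64 %sum
}
```

Running this through the same `llc -mtriple=aarch64-linux-gnu -mattr=+sve` command as the test file would show which lowering is actually chosen.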

https://github.com/llvm/llvm-project/pull/99031