[llvm] [AArch64] Prevent unnecessary truncation in bool vector reduce code generation (PR #120096)

Tue Dec 17 15:52:26 PST 2024

================
@@ -174,12 +174,12 @@ define i64 @extract_last_i64(<2 x i64> %data, <2 x i64> %mask, i64 %passthru) {
 ; SVE-FIXED-NEXT:    sub sp, sp, #16
 ; SVE-FIXED-NEXT:    .cfi_def_cfa_offset 16
 ; SVE-FIXED-NEXT:    cmtst v1.2d, v1.2d, v1.2d
-; SVE-FIXED-NEXT:    index z2.s, #0, #1
+; SVE-FIXED-NEXT:    index z3.s, #0, #1
 ; SVE-FIXED-NEXT:    mov x9, sp
 ; SVE-FIXED-NEXT:    str q0, [sp]
-; SVE-FIXED-NEXT:    xtn v1.2s, v1.2d
-; SVE-FIXED-NEXT:    and v2.8b, v1.8b, v2.8b
-; SVE-FIXED-NEXT:    umaxp v1.2s, v1.2s, v1.2s
+; SVE-FIXED-NEXT:    xtn v2.2s, v1.2d
+; SVE-FIXED-NEXT:    umaxv s1, v1.4s
----------------
efriedma-quic wrote:

Oh, I see, we got lucky with instruction reuse before: the xtn result fed both the and and the umaxp. And if we don't get that reuse, S-form umaxv is faster than xtn+umaxp? In that case, I guess this change is okay.
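
For reference, a minimal sketch of why the two sequences should agree on the reduced value, assuming each 64-bit mask lane is either all-zeros or all-ones (which is what cmtst produces here). The helper names are hypothetical and only model the lane arithmetic in Python; they say nothing about relative latency or throughput.

```python
from itertools import product

ALL_ONES_64 = 0xFFFFFFFFFFFFFFFF
MASK_32 = 0xFFFFFFFF


def xtn_then_umaxp(mask_2d):
    """Model of the old sequence: narrow each 64-bit lane to its low
    32 bits (xtn), then take the max of the two narrowed lanes (umaxp)."""
    narrowed = [lane & MASK_32 for lane in mask_2d]
    return max(narrowed)


def umaxv_4s(mask_2d):
    """Model of the new sequence: view the two 64-bit lanes as four
    32-bit lanes and take the max across all of them (umaxv, .4s)."""
    lanes_4s = []
    for lane in mask_2d:
        lanes_4s.append(lane & MASK_32)          # low half of the lane
        lanes_4s.append((lane >> 32) & MASK_32)  # high half of the lane
    return max(lanes_4s)


def check_all_masks():
    """Compare both models over every all-zeros/all-ones 2-lane mask."""
    for mask in product([0, ALL_ONES_64], repeat=2):
        mask = list(mask)
        assert xtn_then_umaxp(mask) == umaxv_4s(mask)
        # The reduced value is nonzero exactly when some lane was set.
        assert (umaxv_4s(mask) != 0) == any(mask)
    return True
```

Calling check_all_masks() walks the four possible masks. The equivalence relies on the all-zeros/all-ones assumption; for arbitrary 64-bit lane values the high halves could differ from the low halves and the two models would diverge.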

https://github.com/llvm/llvm-project/pull/120096