[llvm] [AArch64] Optimise test of the LSB of a paired whileCC instruction (PR #81141)

Wed Jun 19 02:00:44 PDT 2024

================
@@ -0,0 +1,97 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc                < %s | FileCheck %s
+; RUN: llc -mattr=+sve2p1 < %s | FileCheck %s --check-prefix=CHECK-SVE2p1
+target triple = "aarch64-linux"
+
+define void @f_while(i32 %i, i32 %n) #0 {
+; CHECK-LABEL: f_while:
+; CHECK:       // %bb.0: // %E
+; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT:    whilelo p0.b, w0, w1
+; CHECK-NEXT:    b.pl .LBB0_2
+; CHECK-NEXT:  // %bb.1: // %A
+; CHECK-NEXT:    bl g0
+; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB0_2: // %B
+; CHECK-NEXT:    bl g1
+; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT:    ret
+;
+; CHECK-SVE2p1-LABEL: f_while:
+; CHECK-SVE2p1:       // %bb.0: // %E
+; CHECK-SVE2p1-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-SVE2p1-NEXT:    whilelo p0.b, w0, w1
+; CHECK-SVE2p1-NEXT:    b.pl .LBB0_2
+; CHECK-SVE2p1-NEXT:  // %bb.1: // %A
+; CHECK-SVE2p1-NEXT:    bl g0
+; CHECK-SVE2p1-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-SVE2p1-NEXT:    ret
+; CHECK-SVE2p1-NEXT:  .LBB0_2: // %B
+; CHECK-SVE2p1-NEXT:    bl g1
+; CHECK-SVE2p1-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-SVE2p1-NEXT:    ret
+E:
+  %wide.mask = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i32 %i, i32 %n)
+  %mask = call <vscale x 8 x i1> @llvm.vector.extract.nxv8i1.nxv16i1(<vscale x 16 x i1> %wide.mask, i64 0)
+  %elt = extractelement <vscale x 8 x i1> %mask, i64 0
----------------
sdesmalen-arm wrote:

If we reduce the whole `%wide.mask` into an `i1` and branch on that, we already seem to fold away the `ptest`; see this [example](https://godbolt.org/z/fK7817xKK). What is the use-case for extracting the first element as opposed to reducing the whole vector?

(Folding away the `ptest` when reducing the [partial vector](https://godbolt.org/z/qc8qWWKvr) is not yet handled.)
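
For reference, the "reduce the whole vector" form could look roughly like the sketch below. It is an illustration only and is not taken from the linked examples: the function name `@f_while_reduce`, the use of `llvm.vector.reduce.or`, the `.i32` intrinsic suffix, and the contents of attribute group `#0` are all assumptions. Because `llvm.get.active.lane.mask` produces a prefix mask (lane `k` is active iff `%i + k < %n`, unsigned), "any lane active" and "lane 0 active" describe the same condition here.

```llvm
; Hypothetical variant of @f_while: branch on an OR-reduction of the whole
; mask rather than on lane 0 of an extracted subvector.
target triple = "aarch64-linux"

define void @f_while_reduce(i32 %i, i32 %n) #0 {
E:
  %wide.mask = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32 %i, i32 %n)
  ; Assumed reduction: true iff at least one lane of the mask is active.
  %any = call i1 @llvm.vector.reduce.or.nxv16i1(<vscale x 16 x i1> %wide.mask)
  br i1 %any, label %A, label %B
A:
  call void @g0()
  ret void
B:
  call void @g1()
  ret void
}

declare void @g0()
declare void @g1()
declare <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32, i32)
declare i1 @llvm.vector.reduce.or.nxv16i1(<vscale x 16 x i1>)

; Assumed attributes; the test's actual #0 group is not shown in this hunk.
attributes #0 = { nounwind "target-features"="+sve" }
```

Running this through `llc` and comparing against the `CHECK` lines above would show whether the two forms end up with the same `whilelo` + `b.pl` sequence.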

https://github.com/llvm/llvm-project/pull/81141