[llvm] [AArch64] Optimise test of the LSB of a paired whileCC instruction (PR #81141)
Sander de Smalen via llvm-commits
llvm-commits at lists.llvm.org
Wed Jun 19 02:00:44 PDT 2024
================
@@ -0,0 +1,97 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s | FileCheck %s
+; RUN: llc -mattr=+sve2p1 < %s | FileCheck %s --check-prefix=CHECK-SVE2p1
+target triple = "aarch64-linux"
+
+define void @f_while(i32 %i, i32 %n) #0 {
+; CHECK-LABEL: f_while:
+; CHECK: // %bb.0: // %E
+; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT: whilelo p0.b, w0, w1
+; CHECK-NEXT: b.pl .LBB0_2
+; CHECK-NEXT: // %bb.1: // %A
+; CHECK-NEXT: bl g0
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB0_2: // %B
+; CHECK-NEXT: bl g1
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ret
+;
+; CHECK-SVE2p1-LABEL: f_while:
+; CHECK-SVE2p1: // %bb.0: // %E
+; CHECK-SVE2p1-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-SVE2p1-NEXT: whilelo p0.b, w0, w1
+; CHECK-SVE2p1-NEXT: b.pl .LBB0_2
+; CHECK-SVE2p1-NEXT: // %bb.1: // %A
+; CHECK-SVE2p1-NEXT: bl g0
+; CHECK-SVE2p1-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-SVE2p1-NEXT: ret
+; CHECK-SVE2p1-NEXT: .LBB0_2: // %B
+; CHECK-SVE2p1-NEXT: bl g1
+; CHECK-SVE2p1-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-SVE2p1-NEXT: ret
+E:
+ %wide.mask = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i32 %i, i32 %n)
+ %mask = call <vscale x 8 x i1> @llvm.vector.extract.nxv8i1.nxv16i1(<vscale x 16 x i1> %wide.mask, i64 0)
+ %elt = extractelement <vscale x 8 x i1> %mask, i64 0
----------------
sdesmalen-arm wrote:
If we reduce the whole `%wide.mask` into an `i1` and branch based on that, we already seem to fold away the `ptest`; see this [example](https://godbolt.org/z/fK7817xKK). What is the use case for extracting the first element as opposed to reducing the whole vector?
(The case of folding away the `ptest` when reducing the [partial vector](https://godbolt.org/z/qc8qWWKvr) is not yet handled.)
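
For illustration, a minimal sketch of the two forms being compared might look like the IR below. It is not copied from the linked examples: the function names `@reduce_whole` and `@extract_first` are hypothetical, `llvm.vector.reduce.or` is assumed to be the reduction meant by "reduce the whole `%wide.mask` into an `i1`", and the `#0` attribute group is assumed to enable only `+sve`, since the test's own attributes are not shown in the hunk above.

```llvm
; Sketch under stated assumptions; not the contents of the linked examples.
target triple = "aarch64-linux"

declare <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32, i32)
declare i1 @llvm.vector.reduce.or.nxv16i1(<vscale x 16 x i1>)
declare void @g0()
declare void @g1()

; Form 1 (hypothetical name): reduce the whole mask to a single i1,
; i.e. "is any lane active?", and branch on the result.
define void @reduce_whole(i32 %i, i32 %n) #0 {
E:
  %wide.mask = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32 %i, i32 %n)
  %any = call i1 @llvm.vector.reduce.or.nxv16i1(<vscale x 16 x i1> %wide.mask)
  br i1 %any, label %A, label %B
A:
  call void @g0()
  ret void
B:
  call void @g1()
  ret void
}

; Form 2 (hypothetical name): test only lane 0 of the mask and branch on it.
define void @extract_first(i32 %i, i32 %n) #0 {
E:
  %wide.mask = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32 %i, i32 %n)
  %elt = extractelement <vscale x 16 x i1> %wide.mask, i64 0
  br i1 %elt, label %A, label %B
A:
  call void @g0()
  ret void
B:
  call void @g1()
  ret void
}

; Assumed attribute group; the real test's attributes are not shown above.
attributes #0 = { "target-features"="+sve" }
```

Running both functions through `llc` and comparing the output should show whether each form still emits a `ptest` before the branch. The test in this patch differs from Form 2 in that it first takes the low `<vscale x 8 x i1>` half with `llvm.vector.extract` before extracting element 0.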
https://github.com/llvm/llvm-project/pull/81141