[llvm] [AArch64] Extend performActiveLaneMaskCombine for more than two extracts (PR #146725)
David Sherwood via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 3 06:54:55 PDT 2025
================
@@ -86,6 +86,64 @@ define void @test_boring_case_2x2bit_mask(i64 %i, i64 %n) #0 {
ret void
}
+define void @test_legal_4x2bit_mask(i64 %i, i64 %n) #0 {
+; CHECK-SVE-LABEL: test_legal_4x2bit_mask:
+; CHECK-SVE: // %bb.0:
+; CHECK-SVE-NEXT: whilelo p0.h, x0, x1
+; CHECK-SVE-NEXT: punpkhi p1.h, p0.b
+; CHECK-SVE-NEXT: punpklo p4.h, p0.b
+; CHECK-SVE-NEXT: punpkhi p3.h, p1.b
+; CHECK-SVE-NEXT: punpklo p2.h, p1.b
+; CHECK-SVE-NEXT: punpklo p0.h, p4.b
+; CHECK-SVE-NEXT: punpkhi p1.h, p4.b
+; CHECK-SVE-NEXT: b use
+;
+; CHECK-SVE2p1-SME2-LABEL: test_legal_4x2bit_mask:
+; CHECK-SVE2p1-SME2: // %bb.0:
+; CHECK-SVE2p1-SME2-NEXT: cntd x8
+; CHECK-SVE2p1-SME2-NEXT: adds x8, x0, x8
+; CHECK-SVE2p1-SME2-NEXT: csinv x8, x8, xzr, lo
+; CHECK-SVE2p1-SME2-NEXT: whilelo { p0.d, p1.d }, x0, x1
+; CHECK-SVE2p1-SME2-NEXT: whilelo { p2.d, p3.d }, x8, x1
+; CHECK-SVE2p1-SME2-NEXT: b use
+ %r = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 %i, i64 %n)
+ %v0 = call <vscale x 2 x i1> @llvm.vector.extract.nxv2i1.nxv8i1.i64(<vscale x 8 x i1> %r, i64 6)
+ %v1 = call <vscale x 2 x i1> @llvm.vector.extract.nxv2i1.nxv8i1.i64(<vscale x 8 x i1> %r, i64 4)
+ %v2 = call <vscale x 2 x i1> @llvm.vector.extract.nxv2i1.nxv8i1.i64(<vscale x 8 x i1> %r, i64 2)
+ %v3 = call <vscale x 2 x i1> @llvm.vector.extract.nxv2i1.nxv8i1.i64(<vscale x 8 x i1> %r, i64 0)
+ tail call void @use(<vscale x 2 x i1> %v3, <vscale x 2 x i1> %v2, <vscale x 2 x i1> %v1, <vscale x 2 x i1> %v0)
+ ret void
+}
+
+; Negative test where the extract types are correct but we are not extracting all parts of the mask
----------------
david-arm wrote:
I don't think anything fundamental stops us from using a while pair here. For example, we could do something like
whilelo { p0.d, p1.d }, x0, x1
for extracts at indices 0 and 2, followed by a normal whilelo for the last part:
adds x8, ...
whilelo p2.d, x8, x1
Not suggesting you do anything here, but perhaps explain in the comment that, although we could generate a while pair here, we don't really see a use case for it yet?
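To illustrate why the split should be safe, here is a rough Python model of the idea. It is a sketch and not part of the patch. It assumes the negative test extracts three of the four nxv2i1 parts (indices 0, 2 and 4, which is a guess at the test's shape), fixes vscale to a small constant, and treats the adds/csinv sequence in the CHECK lines above as a saturating 64-bit add. The helper names are hypothetical.

```python
U64_MAX = (1 << 64) - 1

def active_lane_mask(base, n, lanes):
    """Lane k is active when base + k < n, evaluated without wrap-around."""
    return [base + k < n for k in range(lanes)]

def sat_add_u64(a, b):
    """Unsigned 64-bit saturating add (assumed model of adds + csinv ..., lo)."""
    return min(a + b, U64_MAX)

def extracts_from_full_mask(i, n, vscale):
    """Build the wide nxv8i1 mask, then slice it into four nxv2i1 parts."""
    lanes = 2 * vscale  # lanes per <vscale x 2 x i1> part
    full = active_lane_mask(i, n, 4 * lanes)
    return [full[p * lanes:(p + 1) * lanes] for p in range(4)]

def pair_plus_single(i, n, vscale):
    """Hypothetical lowering: a while pair for parts 0 and 1, then a single
    whilelo for part 2 starting from a saturated base."""
    lanes = 2 * vscale
    p0 = active_lane_mask(i, n, lanes)                  # first half of the pair
    p1 = active_lane_mask(sat_add_u64(i, lanes), n, lanes)      # second half
    p2 = active_lane_mask(sat_add_u64(i, 2 * lanes), n, lanes)  # single whilelo
    return [p0, p1, p2]

def agrees(i, n, vscale):
    """True when the split lowering matches the first three sliced parts."""
    return pair_plus_single(i, n, vscale) == extracts_from_full_mask(i, n, vscale)[:3]
```

The saturation matters near the top of the index range: with a wrapping add, a base of U64_MAX - 1 plus the lane offset would wrap to a small value and wrongly activate lanes in the later parts.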
https://github.com/llvm/llvm-project/pull/146725