[llvm] [AArch64] Improve lowering for scalable masked deinterleaving loads (PR #154338)
Cullen Rhodes via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 28 04:03:00 PDT 2025
================
@@ -0,0 +1,460 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+define <vscale x 16 x i8> @foo_ld2_nxv16i8(<vscale x 16 x i1> %mask, ptr %p) {
+; CHECK-LABEL: foo_ld2_nxv16i8:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ld2b { z0.b, z1.b }, p0/z, [x0]
+; CHECK-NEXT: add z0.b, z0.b, z1.b
+; CHECK-NEXT: ret
+ %interleaved.mask = call <vscale x 32 x i1> @llvm.vector.interleave2.nxv32i1(<vscale x 16 x i1> %mask, <vscale x 16 x i1> %mask)
+ %wide.masked.vec = call <vscale x 32 x i8> @llvm.masked.load.nxv32i8(ptr %p, i32 1, <vscale x 32 x i1> %interleaved.mask, <vscale x 32 x i8> poison)
+ %deinterleaved.vec = call { <vscale x 16 x i8>, <vscale x 16 x i8> } @llvm.vector.deinterleave2.nxv32i8(<vscale x 32 x i8> %wide.masked.vec)
+ %part1 = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } %deinterleaved.vec, 0
+ %part2 = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } %deinterleaved.vec, 1
+ %add = add <vscale x 16 x i8> %part1, %part2
+ ret <vscale x 16 x i8> %add
----------------
c-rhodes wrote:
The extracts + add aren't necessary for these tests; I think you should just return the tuple, like we do in `llvm/test/CodeGen/AArch64/sve-vector-deinterleave.ll`.
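A sketch of the suggested simplification, assuming the deinterleaved tuple can be returned directly as in `sve-vector-deinterleave.ll` (the function name and CHECK lines would of course be regenerated by `update_llc_test_checks.py`):

```llvm
; Simplified form of the test above: return the { a, b } tuple directly
; instead of extracting both parts and adding them.
define { <vscale x 16 x i8>, <vscale x 16 x i8> } @foo_ld2_nxv16i8(<vscale x 16 x i1> %mask, ptr %p) {
  %interleaved.mask = call <vscale x 32 x i1> @llvm.vector.interleave2.nxv32i1(<vscale x 16 x i1> %mask, <vscale x 16 x i1> %mask)
  %wide.masked.vec = call <vscale x 32 x i8> @llvm.masked.load.nxv32i8(ptr %p, i32 1, <vscale x 32 x i1> %interleaved.mask, <vscale x 32 x i8> poison)
  %deinterleaved.vec = call { <vscale x 16 x i8>, <vscale x 16 x i8> } @llvm.vector.deinterleave2.nxv32i8(<vscale x 32 x i8> %wide.masked.vec)
  ret { <vscale x 16 x i8>, <vscale x 16 x i8> } %deinterleaved.vec
}
```

Returning the tuple keeps the test focused on the `ld2b` lowering itself, without the extra `add` instruction in the CHECK lines.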
https://github.com/llvm/llvm-project/pull/154338