[llvm] [LLVM][CodeGen][AArch64] Lower vector-(de)interleave to multi-register uzp/zip instructions. (PR #143128)
Benjamin Maxwell via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 9 02:06:41 PDT 2025
================
@@ -29210,6 +29210,28 @@ AArch64TargetLowering::LowerVECTOR_DEINTERLEAVE(SDValue Op,
assert(OpVT.isScalableVector() &&
"Expected scalable vector in LowerVECTOR_DEINTERLEAVE.");
+ // Are multi-register uzp instructions available?
+ if (Subtarget->hasSME2() && Subtarget->isStreaming() &&
+ OpVT.getVectorElementType() != MVT::i1) {
+ Intrinsic::ID IntID;
+ switch (Op->getNumOperands()) {
+ default:
+ return SDValue();
+ case 2:
+ IntID = Intrinsic::aarch64_sve_uzp_x2;
+ break;
+ case 4:
+ IntID = Intrinsic::aarch64_sve_uzp_x4;
+ break;
----------------
MacDue wrote:
I think the x4 uzp/zip requires a vector length of at least 256 bits for i64/double elements.
See: https://developer.arm.com/documentation/ddi0602/2025-03/SME-Instructions/UZP--four-registers---Concatenate-elements-from-four-vectors-
> if size == '11' && [MaxImplementedSVL](https://developer.arm.com/documentation/ddi0602/2025-03/Shared-Pseudocode/aarch64-functions-sve?lang=en#impl-aarch64.MaxImplementedSVL.0)() < 256 then [EndOfDecode](https://developer.arm.com/documentation/ddi0602/2025-03/Shared-Pseudocode/shared-functions-decode?lang=en#impl-shared.EndOfDecode.1)([Decode_UNDEF](https://developer.arm.com/documentation/ddi0602/2025-03/Shared-Pseudocode/shared-functions-decode?lang=en#Decode_UNDEF));
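
A minimal standalone sketch of the kind of guard this implies, not the actual change in the PR. `UzpKind`, `selectMultiRegUzp`, and the `MinSVLBits` parameter are hypothetical names for illustration; in particular, how the lowering would learn the smallest streaming vector length it may assume is left open here.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the choice between the x2 and x4 intrinsics.
enum class UzpKind { X2, X4 };

// Returns which multi-register uzp form could be used, or std::nullopt to
// fall back to the existing lowering.
// MinSVLBits is an assumed input: the smallest streaming vector length the
// compiler is allowed to rely on.
std::optional<UzpKind> selectMultiRegUzp(unsigned NumOperands,
                                         unsigned ElementBits,
                                         unsigned MinSVLBits) {
  switch (NumOperands) {
  case 2:
    return UzpKind::X2;
  case 4:
    // Quoted decode rule: size == '11' (64-bit elements) is UNDEFINED when
    // the maximum implemented SVL is below 256 bits.
    if (ElementBits == 64 && MinSVLBits < 256)
      return std::nullopt;
    return UzpKind::X4;
  default:
    return std::nullopt;
  }
}
```

With this shape, the i64/double x4 case falls back to the existing lowering unless a 256-bit minimum is known, while the x2 case and narrower element types are unaffected.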
https://github.com/llvm/llvm-project/pull/143128