[llvm] [AArch64] Optimize splat of extending loads to avoid GPR->FPR transfer (PR #163067)
Guy David via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 22 06:13:23 PDT 2025
================
@@ -4375,6 +4375,26 @@ def : Pat <(v1i64 (scalar_to_vector (i64
(load (ro64.Xpat GPR64sp:$Rn, GPR64:$Rm, ro64.Xext:$extend))))),
(LDRDroX GPR64sp:$Rn, GPR64:$Rm, ro64.Xext:$extend)>;
+// Patterns for scalar_to_vector with zero-extended loads.
+// Enables direct SIMD register loads for small integer types (i8/i16) that are
+// naturally zero-extended to i32/i64.
+multiclass ScalarToVectorExtLoad<ValueType VecTy, ValueType ScalarTy> {
+ def : Pat<(VecTy (scalar_to_vector (ScalarTy (zextloadi8 (am_indexed8 GPR64sp:$Rn, uimm12s1:$offset))))),
+ (SUBREG_TO_REG (i64 0), (LDRBui GPR64sp:$Rn, uimm12s1:$offset), bsub)>;
----------------
guy-david wrote:
Created a templated form that accepts both `scalar_to_vector` and `bitconvert` operations.
https://github.com/llvm/llvm-project/pull/163067