[llvm] [AArch64] Optimize DUP of extending loads to avoid GPR->FPR transfer (PR #163067)

David Green via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 22 00:17:44 PDT 2025


================
@@ -4375,6 +4375,26 @@ def : Pat <(v1i64 (scalar_to_vector (i64
                (load (ro64.Xpat GPR64sp:$Rn, GPR64:$Rm, ro64.Xext:$extend))))),
            (LDRDroX GPR64sp:$Rn, GPR64:$Rm, ro64.Xext:$extend)>;
 
+// Patterns for scalar_to_vector with zero-extended loads.
+// Enables direct SIMD register loads for small integer types (i8/i16) that are
+// naturally zero-extended to i32/i64.
+multiclass ScalarToVectorExtLoad<ValueType VecTy, ValueType ScalarTy> {
+  def : Pat<(VecTy (scalar_to_vector (ScalarTy (zextloadi8 (am_indexed8 GPR64sp:$Rn, uimm12s1:$offset))))),
+            (SUBREG_TO_REG (i64 0), (LDRBui GPR64sp:$Rn, uimm12s1:$offset), bsub)>;
----------------
davemgreen wrote:

The problem with load patterns is that there are quite a few addressing modes and type combinations that we should support but too often do not add patterns for, and covering the full cross product of types gets out of hand. Some patterns should be considered "canonical" though, with the others built on top of them.

Is there another basic form of load we can base these on? If you try to use the extload+bitcast patterns we added recently, those are incomplete (and look wrong to me; I'll make a patch). We could also consider scalar_to_vec(extload) as a base form. If so, can you think of a nice templated way to make sure we add all the different addressing forms needed?
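One hedged sketch of such a template, assuming scalar_to_vec(extload) is taken as the base form: a single multiclass parameterized over the load node, addressing-mode helpers, and instructions, instantiated once per element type so that every type automatically gets all four addressing forms. The helper and instruction names below (am_indexed8, am_unscaled8, ro8, uimm12s1, simm9, LDRBui, LDRBroW/X, LDURBi, bsub) follow the existing conventions in AArch64InstrInfo.td, but the class signatures here are illustrative, not a tested patch:

```tablegen
// Sketch: cover all four addressing forms for scalar_to_vector(extload)
// in one place, so no form is forgotten when a new type is added.
multiclass ScalarToVecExtLoadPats<ValueType VecTy, ValueType ScalarTy,
                                  SDPatternOperator loadop,
                                  ComplexPattern indexed, Operand uimm,
                                  ComplexPattern unscaled,
                                  ROAddrMode ro, Instruction INSTui,
                                  Instruction INSTroW, Instruction INSTroX,
                                  Instruction INSTur, SubRegIndex sub> {
  // Unsigned scaled immediate: [Rn, #uimm]
  def : Pat<(VecTy (scalar_to_vector (ScalarTy (loadop (indexed GPR64sp:$Rn, uimm:$offset))))),
            (SUBREG_TO_REG (i64 0), (INSTui GPR64sp:$Rn, uimm:$offset), sub)>;
  // Register offset, W register with extend: [Rn, Wm, ext]
  def : Pat<(VecTy (scalar_to_vector (ScalarTy (loadop (ro.Wpat GPR64sp:$Rn, GPR32:$Rm, ro.Wext:$ext))))),
            (SUBREG_TO_REG (i64 0), (INSTroW GPR64sp:$Rn, GPR32:$Rm, ro.Wext:$ext), sub)>;
  // Register offset, X register with extend: [Rn, Xm, ext]
  def : Pat<(VecTy (scalar_to_vector (ScalarTy (loadop (ro.Xpat GPR64sp:$Rn, GPR64:$Rm, ro.Xext:$ext))))),
            (SUBREG_TO_REG (i64 0), (INSTroX GPR64sp:$Rn, GPR64:$Rm, ro.Xext:$ext), sub)>;
  // Unscaled signed 9-bit immediate: [Rn, #simm]
  def : Pat<(VecTy (scalar_to_vector (ScalarTy (loadop (unscaled GPR64sp:$Rn, simm9:$offset))))),
            (SUBREG_TO_REG (i64 0), (INSTur GPR64sp:$Rn, simm9:$offset), sub)>;
}

// Example instantiation for an i8 zero-extending load into a v8i16 lane 0:
defm : ScalarToVecExtLoadPats<v8i16, i32, zextloadi8, am_indexed8, uimm12s1,
                              am_unscaled8, ro8, LDRBui, LDRBroW, LDRBroX,
                              LDURBi, bsub>;
```

The point of the shape is that each new (VecTy, ScalarTy, loadop) combination is one `defm` line, and any missing addressing form shows up as a gap in the multiclass rather than a pattern someone forgot to write.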

https://github.com/llvm/llvm-project/pull/163067
