[llvm] [AArch64] Optimize splat of extending loads to avoid GPR->FPR transfer (PR #163067)

Thu Oct 23 12:10:39 PDT 2025

================
@@ -0,0 +1,139 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=aarch64-none-linux-gnu < %s | FileCheck %s
+
+; Test optimization of DUP with extended narrow loads
+; This should avoid GPR->SIMD transfers by loading directly into vector registers
+
+define <4 x i32> @test_dup_zextload_i8_v4i32(ptr %p) {
----------------
guy-david wrote:

No, and apparently the implementation didn't optimize that case either, because it excluded `i32` as a source operand.
Added handling for `i32` -> `v2i64` and a test.
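
For illustration, the `i32` -> `v2i64` case could be exercised with IR along these lines. This is only a sketch, not the test added in the PR: the function name is hypothetical (modeled on `test_dup_zextload_i8_v4i32`), and the CHECK lines are omitted because they would be generated by `utils/update_llc_test_checks.py`.

```llvm
; Hypothetical test: splat of a zero-extended i32 load into <2 x i64>.
; The name and exact shape are assumptions, not taken from the PR.
define <2 x i64> @test_dup_zextload_i32_v2i64(ptr %p) {
  ; Narrow scalar load followed by a zero-extension to the element type.
  %load = load i32, ptr %p, align 4
  %ext = zext i32 %load to i64
  ; Splat idiom: insert into lane 0, then broadcast with an all-zero mask.
  %vec = insertelement <2 x i64> poison, i64 %ext, i32 0
  %dup = shufflevector <2 x i64> %vec, <2 x i64> poison, <2 x i32> zeroinitializer
  ret <2 x i64> %dup
}
```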

https://github.com/llvm/llvm-project/pull/163067