[llvm] [X86] X86FixupVectorConstants - load+sign-extend vector constants that can be stored in a truncated form (PR #79815)
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 31 04:58:06 PST 2024
================
@@ -750,7 +750,7 @@ define void @vec128_i16_widen_to_i32_factor2_broadcast_to_v4i32_factor4(ptr %in.
; AVX512BW-SLOW-LABEL: vec128_i16_widen_to_i32_factor2_broadcast_to_v4i32_factor4:
; AVX512BW-SLOW: # %bb.0:
; AVX512BW-SLOW-NEXT: vmovdqa64 (%rdi), %zmm0
-; AVX512BW-SLOW-NEXT: vmovdqa {{.*#+}} xmm1 = [0,9,0,11,0,13,0,15]
+; AVX512BW-SLOW-NEXT: vpmovsxbw {{.*#+}} xmm1 = [0,9,0,11,0,13,0,15]
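In the hunk above, the 16-byte `vmovdqa` constant of eight i16 elements [0,9,0,11,0,13,0,15] is replaced by `vpmovsxbw`, which loads 8 bytes and sign-extends each byte to 16 bits. That works because every element fits in a signed i8. The helper below is a minimal sketch of that width check under stated assumptions. The names `fitsSignExtended` and `minSignExtendBits` are hypothetical and are not the X86FixupVectorConstants code. The caller is assumed to pass each element already interpreted as a signed value of the element width (e.g. i16 0xFFFF as -1).

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical helper: true if `v` survives a truncate to `bits` followed by
// a sign-extension back, i.e. v lies in [-2^(bits-1), 2^(bits-1) - 1].
bool fitsSignExtended(int64_t v, unsigned bits) {
  assert(bits >= 1 && bits < 64 && "bit width out of range");
  const int64_t lo = -(int64_t(1) << (bits - 1));
  const int64_t hi = (int64_t(1) << (bits - 1)) - 1;
  return v >= lo && v <= hi;
}

// Hypothetical helper: returns the narrowest source width (8, 16 or 32 bits),
// narrower than `eltBits`, from which every element can be rebuilt by a
// vpmovsx*-style sign-extending load. Returns `eltBits` if none works.
// Assumes `elts` already holds the signed value of each eltBits-wide element.
unsigned minSignExtendBits(const std::vector<int64_t>& elts, unsigned eltBits) {
  assert((eltBits == 16 || eltBits == 32 || eltBits == 64) &&
         "unsupported element width");
  for (unsigned bits : {8u, 16u, 32u}) {
    if (bits >= eltBits)
      break;  // Only strictly narrower source widths save constant-pool space.
    bool allFit = true;
    for (int64_t e : elts)
      allFit = allFit && fitsSignExtended(e, bits);
    if (allFit)
      return bits;
  }
  return eltBits;
}
```

For the constant in this hunk, `minSignExtendBits({0, 9, 0, 11, 0, 13, 0, 15}, 16)` returns 8, so the constant-pool entry can shrink from 16 bytes to 8.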
----------------
RKSimon wrote:
This is how I see the preference order between vzload, broadcast and vextload:
1. vzload shouldn't ever need a shuffle port to zero the upper elements, and both fp and int domain versions are available, so we don't introduce a domain-crossing penalty.
2. broadcast sometimes needs a shuffle port (especially the 8/16-bit variants). AVX1 only has fp domain broadcasts, but AVX2+ has good fp/int domain equivalents.
3. vextload always needs a shuffle port and is only ever int domain.
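The preference list above can be sketched as a small selection routine. Everything here is a hypothetical model, not LLVM's API. `traitsOf` records the reasons given for each form (shuffle-port use and domain availability). It uses the simplifying assumption that only 8/16-bit broadcasts need a shuffle port, which is one reading of "sometimes". `pickLoad` applies the 1-2-3 ordering directly, breaking ties by the smaller constant-pool entry.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of the three constant-load forms; enumerator values
// encode the preference order (lower is preferred).
enum class LoadKind { VZLoad = 0, Broadcast = 1, VExtLoad = 2 };
enum class Domain { Int, FP };

struct Candidate {
  LoadKind kind;
  unsigned scalarBits;     // width of the scalar/element read from memory
  unsigned constantBytes;  // size of the constant-pool entry
};

struct Traits {
  bool needsShufflePort;
  bool crossesDomain;
};

// Encodes points 1-3 of the comment. Simplifying assumption: only 8/16-bit
// broadcasts need a shuffle port.
Traits traitsOf(const Candidate& c, Domain d, bool hasAVX2) {
  switch (c.kind) {
  case LoadKind::VZLoad:
    return {false, false};  // fp and int forms both exist
  case LoadKind::Broadcast:
    // AVX1 only has fp domain broadcasts, so int users cross domains.
    return {c.scalarBits <= 16, !hasAVX2 && d == Domain::Int};
  case LoadKind::VExtLoad:
    return {true, d == Domain::FP};  // always shuffles, int domain only
  }
  assert(false && "unknown LoadKind");
  return {true, true};
}

// Picks a candidate by the stated preference order (vzload, then broadcast,
// then vextload), breaking ties by the smaller constant-pool entry.
std::optional<Candidate> pickLoad(const std::vector<Candidate>& cands) {
  if (cands.empty())
    return std::nullopt;
  return *std::min_element(
      cands.begin(), cands.end(), [](const Candidate& a, const Candidate& b) {
        if (a.kind != b.kind)
          return static_cast<int>(a.kind) < static_cast<int>(b.kind);
        return a.constantBytes < b.constantBytes;
      });
}
```

A cost-based variant could instead rank candidates by the `traitsOf` results, for example preferring a vextload over an AVX1 int-domain broadcast to avoid the domain crossing. The comment lists a fixed order, so this sketch keeps that order as the primary key.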
https://github.com/llvm/llvm-project/pull/79815