[llvm] [X86] Fold (v4i32 (scalar_to_vector (i32 (zext (bitcast (f16)))))) -> (v4i32 bitcast (shuffle (v8f16 scalar_to_vector))) (PR #126033)

Phoebe Wang via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 7 01:11:29 PST 2025


================
@@ -43,13 +43,11 @@ define void @v_test_canonicalize__half(half addrspace(1)* %out) nounwind {
 ;
 ; AVX512-LABEL: v_test_canonicalize__half:
 ; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    movzwl (%rdi), %eax
-; AVX512-NEXT:    movzwl {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ecx
-; AVX512-NEXT:    vmovd %ecx, %xmm0
+; AVX512-NEXT:    vpinsrw $0, (%rdi), %xmm0, %xmm0
+; AVX512-NEXT:    vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
 ; AVX512-NEXT:    vcvtph2ps %xmm0, %xmm0
-; AVX512-NEXT:    vmovd %eax, %xmm1
-; AVX512-NEXT:    vcvtph2ps %xmm1, %xmm1
-; AVX512-NEXT:    vmulss %xmm0, %xmm1, %xmm0
+; AVX512-NEXT:    vcvtph2ps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
----------------
phoebewang wrote:

The memory fold is greet, but I think `movzwl + vmovd` is better than `vpinsrw + vpmovzxwq`.

https://github.com/llvm/llvm-project/pull/126033


More information about the llvm-commits mailing list