[libc-commits] [clang] [flang] [compiler-rt] [clang-tools-extra] [libc] [libcxx] [lldb] [llvm] [X86][BF16] Try to use `f16` for lowering (PR #76901)

Thu Jan 4 22:35:54 PST 2024

================
@@ -22,10 +22,7 @@ define void @add(ptr %pa, ptr %pb, ptr %pc) nounwind {
 ; X86-NEXT:    vaddss %xmm0, %xmm1, %xmm0
 ; X86-NEXT:    vmovss %xmm0, (%esp)
 ; X86-NEXT:    calll __truncsfbf2
-; X86-NEXT:    fstps {{[0-9]+}}(%esp)
-; X86-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; X86-NEXT:    vmovd %xmm0, %eax
-; X86-NEXT:    movw %ax, (%esi)
+; X86-NEXT:    vmovsh %xmm0, (%esi)
----------------
phoebewang wrote:

`vmovsh` can store the low 16 bits to memory directly.
The original code has an ABI mistake: it treats the `bf16` result as `f32`. Since `f32` uses X87 registers on 32-bit targets, the value has to be stored to a temporary memory slot first and then reloaded before it can be stored again.
The patch puts the result in XMM0, so one `vmovsh` is enough.
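
For readers less familiar with `bf16`, here is a rough Python sketch of why a single 16-bit store is sufficient once the value sits in the low lanes of a register. It is only an illustration: `f32_to_bf16_bits` and `store_low16` are hypothetical helper names, the rounding shown is a common round-to-nearest-even scheme and not necessarily what `__truncsfbf2` does internally, and NaN handling is left out.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bf16 pattern for x (hypothetical helper, no NaN handling)."""
    # Reinterpret the value as its 32-bit IEEE-754 single-precision pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even: bias by 0x7FFF plus the lowest bit that is kept.
    bias = 0x7FFF + ((bits >> 16) & 1)
    # bf16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits.
    return ((bits + bias) >> 16) & 0xFFFF


def store_low16(buf: bytearray, offset: int, reg_bits: int) -> None:
    """Write only the low 16 bits of a register-like integer (little-endian)."""
    struct.pack_into("<H", buf, offset, reg_bits & 0xFFFF)


# Model a register whose low 16 bits hold the bf16 result; upper bits are ignored.
memory = bytearray(4)
store_low16(memory, 0, f32_to_bf16_bits(1.0))
```

The point of the sketch is that the whole `bf16` value fits in the 16 bits being written, so no intermediate `f32`-sized spill and reload is needed.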

https://github.com/llvm/llvm-project/pull/76901