[libc-commits] [clang] [flang] [compiler-rt] [clang-tools-extra] [libc] [libcxx] [lldb] [llvm] [X86][BF16] Try to use `f16` for lowering (PR #76901)
Phoebe Wang via libc-commits
libc-commits at lists.llvm.org
Thu Jan 4 22:35:54 PST 2024
================
@@ -22,10 +22,7 @@ define void @add(ptr %pa, ptr %pb, ptr %pc) nounwind {
; X86-NEXT: vaddss %xmm0, %xmm1, %xmm0
; X86-NEXT: vmovss %xmm0, (%esp)
; X86-NEXT: calll __truncsfbf2
-; X86-NEXT: fstps {{[0-9]+}}(%esp)
-; X86-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; X86-NEXT: vmovd %xmm0, %eax
-; X86-NEXT: movw %ax, (%esi)
+; X86-NEXT: vmovsh %xmm0, (%esi)
----------------
phoebewang wrote:
`vmovsh` can store the low 16 bits to memory directly.
The original code has an ABI mistake: it handles the `bf16` result as `f32`. Since `f32` uses X87 registers on 32-bit targets, the value needs to be stored to another memory location first and then reloaded before it can be stored again.
The patch puts the result in XMM0, so a single `vmovsh` is enough.
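
As a rough illustration of why a single 16-bit store suffices, here is a minimal C sketch. It assumes `bf16` is represented as the upper 16 bits of an IEEE `f32`, rounded to nearest-even. The helper names `f32_to_bf16_bits` and `store_bf16` are hypothetical, and this is not the compiler-rt implementation of `__truncsfbf2`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert an f32 to bf16 bits (round to nearest-even).
   Sketch only; not the actual compiler-rt __truncsfbf2 code. */
static uint16_t f32_to_bf16_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);              /* reinterpret the float's bits */
    if ((u & 0x7fffffffu) > 0x7f800000u)   /* NaN: keep it a quiet NaN */
        return (uint16_t)((u >> 16) | 0x0040u);
    uint32_t bias = 0x7fffu + ((u >> 16) & 1u);
    return (uint16_t)((u + bias) >> 16);   /* keep only the upper 16 bits */
}

/* Hypothetical helper: once the conversion result is available as a
   16-bit payload, one 2-byte store is all that is needed. */
static void store_bf16(void *dst, float f) {
    uint16_t bits = f32_to_bf16_bits(f);
    memcpy(dst, &bits, sizeof bits);       /* writes exactly two bytes */
}
```

Under these assumptions, `store_bf16` touches only two bytes of the destination, which is the same effect the single `vmovsh` store has in the updated check lines.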
https://github.com/llvm/llvm-project/pull/76901