[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

LuoYuanke via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 8 07:33:19 PDT 2022


LuoYuanke added inline comments.


================
Comment at: llvm/test/Analysis/CostModel/X86/fptoi_sat.ll:852
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %f16u1 = call i1 @llvm.fptoui.sat.i1.f16(half undef)
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %f16s8 = call i8 @llvm.fptosi.sat.i8.f16(half undef)
+; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %f16u8 = call i8 @llvm.fptoui.sat.i8.f16(half undef)
----------------
The cost seems to be reduced in general. Is that because we now pass and return f16 in XMM registers?


================
Comment at: llvm/test/CodeGen/MIR/X86/inline-asm-registers.mir:31
   ; CHECK-LABEL: name: test
-  ; CHECK: INLINEASM &foo, 0 /* attdialect */, 4390922 /* regdef:GR64 */, def $rsi, 4390922 /* regdef:GR64 */, def dead $rdi,
-    INLINEASM &foo, 0, 4390922, def $rsi, 4390922, def dead $rdi, 2147549193, killed $rdi, 2147483657, killed $rsi, 12, implicit-def dead early-clobber $eflags
+  ; CHECK: INLINEASM &foo, 0 /* attdialect */, 4456458 /* regdef:GR64 */, def $rsi, 4456458 /* regdef:GR64 */, def dead $rdi,
+    INLINEASM &foo, 0, 4456458, def $rsi, 4456458, def dead $rdi, 2147549193, killed $rdi, 2147483657, killed $rsi, 12, implicit-def dead early-clobber $eflags
----------------
Why does the f16 patch affect this test case? There is no FP instruction in it.
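
One way to read the changed constants is to decode them. The sketch below assumes the usual INLINEASM flag-word layout: operand kind in the low 3 bits, register count in bits 3-15, and register class ID plus one in bits 16-30. That layout is an assumption, not something stated in this review. `decode_flag` and `KINDS` are hypothetical helper names, and matching-operand flags (bit 31 set) are deliberately not handled.

```python
# Assumed layout of an INLINEASM operand flag word (not taken from this review):
#   bits 0-2   operand kind (1 = reguse, 2 = regdef, 4 = clobber, ...)
#   bits 3-15  number of registers that follow
#   bits 16-30 register class ID plus one (0 means no class constraint)
# Flags with bit 31 set (matching operands) use bits 16-30 differently and
# are out of scope for this sketch.
KINDS = {1: "reguse", 2: "regdef", 3: "regdef-ec", 4: "clobber", 5: "imm", 6: "mem"}


def decode_flag(flag):
    """Split a flag word into (kind name, register count, register class ID)."""
    kind = flag & 0x7
    num_regs = (flag >> 3) & 0x1FFF
    rc_plus_one = (flag >> 16) & 0x7FFF
    rc_id = rc_plus_one - 1 if rc_plus_one else None
    return KINDS.get(kind, "unknown"), num_regs, rc_id


if __name__ == "__main__":
    for value in (4390922, 4456458):
        print(value, decode_flag(value))
```

Under that assumption, the old and new values differ only in the register class ID (66 versus 67), with the kind (regdef) and register count unchanged. That would fit a renumbering of register classes caused by the patch adding a new class, rather than any change to FP code in this test, but the review does not confirm it.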


================
Comment at: llvm/test/CodeGen/X86/atomic-non-integer.ll:253
+; X64-SSE-NEXT:    movzwl (%rdi), %eax
+; X64-SSE-NEXT:    pinsrw $0, %eax, %xmm0
+; X64-SSE-NEXT:    retq
----------------
I notice that X86-SSE1 returns the value in a GPR. Should X64-SSE also return it in a GPR?


================
Comment at: llvm/test/CodeGen/X86/avx512-insert-extract.ll:2307
+; SKX-NEXT:    vmovd %ecx, %xmm0
+; SKX-NEXT:    vcvtph2ps %xmm0, %xmm0
+; SKX-NEXT:    vmovss %xmm0, %xmm0, %xmm0 {%k2} {z}
----------------
Is this code less efficient than the previous code? Why did the previous code work without converting half to float?


================
Comment at: llvm/test/CodeGen/X86/avx512-masked_memop-16-8.ll:156
 ; Make sure we scalarize masked loads of f16.
 define <16 x half> @test_mask_load_16xf16(<16 x i1> %mask, <16 x half>* %addr, <16 x half> %val) {
 ; CHECK-LABEL: test_mask_load_16xf16:
----------------
The parameter %val seems to be unused.


================
Comment at: llvm/test/CodeGen/X86/callbr-asm-bb-exports.ll:20
 ; CHECK-NEXT: t22: ch,glue = CopyToReg t17, Register:i32 %5, t8
-; CHECK-NEXT: t30: ch,glue = inlineasm_br t22, TargetExternalSymbol:i64'xorl $0, $0; jmp ${1:l}', MDNode:ch<null>, TargetConstant:i64<8>, TargetConstant:i32<2293769>, Register:i32 %5, TargetConstant:i64<13>, TargetBlockAddress:i64<@test, %fail> 0, TargetConstant:i32<12>, Register:i32 $df, TargetConstant:i32<12>, Register:i16 $fpsw, TargetConstant:i32<12>, Register:i32 $eflags, t22:1
+; CHECK-NEXT: t30: ch,glue = inlineasm_br t22, TargetExternalSymbol:i64'xorl $0, $0; jmp ${1:l}', MDNode:ch<null>, TargetConstant:i64<8>, TargetConstant:i32<2359305>, Register:i32 %5, TargetConstant:i64<13>, TargetBlockAddress:i64<@test, %fail> 0, TargetConstant:i32<12>, Register:i32 $df, TargetConstant:i32<12>, Register:i16 $fpsw, TargetConstant:i32<12>, Register:i32 $eflags, t22:1
 
----------------
Why is this test affected? Is it caused by the calling convention change?


================
Comment at: llvm/test/CodeGen/X86/fmf-flags.ll:115
-; X64-NEXT:    movzwl %di, %edi
-; X64-NEXT:    callq __gnu_h2f_ieee@PLT
 ; X64-NEXT:    mulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
----------------
Does __gnu_h2f_ieee return its result in an XMM register?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107082/new/

https://reviews.llvm.org/D107082


