[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

Wed Jun 8 07:15:17 PDT 2022

pengfei added inline comments.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:616
+    setOperationAction(ISD::FROUNDEVEN, MVT::f16, Promote);
+    setOperationAction(ISD::FP_ROUND, MVT::f16, Expand);
+    setOperationAction(ISD::FP_EXTEND, MVT::f32, Expand);
----------------
LuoYuanke wrote:
> Just confused how to expand it. Will the expand fail and finally turns to libcall?
Yeah, we can use `LibCall` instead.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:763
+    if (isTypeLegal(MVT::f16)) {
+      setOperationAction(ISD::FP_EXTEND, MVT::f80, Custom);
+      setOperationAction(ISD::STRICT_FP_EXTEND, MVT::f80, Custom);
----------------
LuoYuanke wrote:
> Why f16 emulation affect f80 type? Are we checking isTypeLegal(MVT::f80)?
It's in the scope of `if (UseX87)`. And we need to lower `fpext half %0 to x86_fp80`.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22448
+  if (SrcVT == MVT::f16)
+    return SDValue();
+
----------------
LuoYuanke wrote:
> Why we don't extent to f32 here?
Return `SDValue()` will extent later. This can save the code.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22522
+  if (!isScalarFPTypeInSSEReg(SrcVT) ||
+      (SrcVT == MVT::f16 && !Subtarget.hasFP16()))
     return SDValue();
----------------
LuoYuanke wrote:
> Why we don't extent to f32 here? Will it be promoted finally?
Yes.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22765
+                        DAG.getIntPtrConstant(0, DL));
+      Res = DAG.getNode(X86ISD::STRICT_CVTPS2PH, DL, {MVT::v8i16, MVT::Other},
+                        {Chain, Res, DAG.getTargetConstant(4, DL, MVT::i32)});
----------------
LuoYuanke wrote:
> Should MVT::v8i16 be MVT::v8f16?
No. We use `MVT::v8i16` when we enabled F16C instructions.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22775
+
+    Res = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i16, Res,
+                      DAG.getIntPtrConstant(0, DL));
----------------
LuoYuanke wrote:
> MVT::f16 and delete the bitcast?
I don't think we have pattern to extract `f16` from `v8i16`. Besides, I think keeping the bitcast makes the flow clear.

================
Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:1476
 }
-let Predicates = [HasFP16] in {
+let Predicates = [HasBWI] in {
   def : Pat<(v32f16 (X86VBroadcastld16 addr:$src)),
----------------
LuoYuanke wrote:
> If target don't have avx512bw feature. There is some other pattern to lower the node or fp16 broadcast node is invalid?
Good catch. Added in X86InstrSSE.td

================
Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4107
              _.ExeDomain>, EVEX_4V, Sched<[SchedWriteFShuffle.XMM]>;
+  let Predicates = [prd] in {
   def rrkz : AVX512PI<0x10, MRMSrcReg, (outs _.RC:$dst),
----------------
LuoYuanke wrote:
> Previous prd only apply to "def rr"? Is it a bug for previous code?
No. previous code works well because no mask variants before AVX512 and no f16 before FP16. The latter is not true now.

================
Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4352
+defm : avx512_move_scalar_lowering<"VMOVSHZ", X86Movsh, fp16imm0, v8f16x_info>;
+defm : avx512_store_scalar_lowering<"VMOVSHZ", avx512vl_f16_info,
+                   (v32i1 (bitconvert (and GR32:$mask, (i32 1)))), GR32>;
----------------
LuoYuanke wrote:
> Why previous code don't have predicates?
Because no legal `f16` previously.

================
Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:11657

+let Predicates = [HasBWI], AddedComplexity = -10 in {
+  def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (VPINSRWZrm (v8i16 (IMPLICIT_DEF)), addr:$src, 0), FR16X)>;
----------------
LuoYuanke wrote:
> Why set AddedComplexity to -10? There no such addtional complexity in previous code. Add comments for it? 
We used it before, but very little. We need to make sure select FP16 instructions first if available.

================
Comment at: llvm/lib/Target/X86/X86InstrSSE.td:3970

+let Predicates = [UseSSE2], AddedComplexity = -10 in {
+  def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (PINSRWrm (v8i16 (IMPLICIT_DEF)), addr:$src, 0), FR16)>;
----------------
LuoYuanke wrote:
> Why  AddedComplexity = -10? Add comments for it?
This is to avoid FP16 instructions been overridden.

================
Comment at: llvm/lib/Target/X86/X86InstrSSE.td:3978
+let Predicates = [HasAVX, NoBWI], AddedComplexity = -10 in {
+  def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (VPINSRWrm (v8i16 (IMPLICIT_DEF)), addr:$src, 0), FR16)>;
+  def : Pat<(i16 (bitconvert f16:$src)), (EXTRACT_SUBREG (VPEXTRWrr (v8i16 (COPY_TO_REGCLASS FR16:$src, VR128)), 0), sub_16bit)>;
----------------
LuoYuanke wrote:
> Miss pattern for store?
It's in line 5214.

================
Comment at: llvm/lib/Target/X86/X86InstrSSE.td:5214
+
+let Predicates = [HasAVX, NoBWI] in
+  def : Pat<(store f16:$src, addr:$dst), (VPEXTRWmr addr:$dst, (v8i16 (COPY_TO_REGCLASS FR16:$src, VR128)), 0)>;
----------------
LuoYuanke wrote:
> Why no AddedComplexity for it?
We don't need it if no BWI.

================
Comment at: llvm/lib/Target/X86/X86RegisterInfo.td:540

+def FR16 : RegisterClass<"X86", [f16], 16, (add FR32)> {let Size = 32;}
+
----------------
LuoYuanke wrote:
> pengfei wrote:
> > skan wrote:
> > > The alignment is not same as the size?
> > No. This is spill size instead of alignment.
> When there is avx512fp16 feature, is the spill size still 32?
Yes, it's more efficient to use movss that insert/extrct. And we also use `FR32X` for AVX512 targets without FP16.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107082/new/

https://reviews.llvm.org/D107082