[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

Wed Jun 8 02:01:24 PDT 2022

LuoYuanke added inline comments.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:616
+    setOperationAction(ISD::FROUNDEVEN, MVT::f16, Promote);
+    setOperationAction(ISD::FP_ROUND, MVT::f16, Expand);
+    setOperationAction(ISD::FP_EXTEND, MVT::f32, Expand);
----------------
Just confused how to expand it. Will the expand fail and finally turns to libcall?

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:763
+    if (isTypeLegal(MVT::f16)) {
+      setOperationAction(ISD::FP_EXTEND, MVT::f80, Custom);
+      setOperationAction(ISD::STRICT_FP_EXTEND, MVT::f80, Custom);
----------------
Why f16 emulation affect f80 type? Are we checking isTypeLegal(MVT::f80)?

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22100
   SDValue Res;
+  if (SrcVT == MVT::f16 && !Subtarget.hasFP16()) {
+    if (IsStrict)
----------------
Not sure if it is better to wrapper it into a readable function (e.g., isSoftF16).

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22448
+  if (SrcVT == MVT::f16)
+    return SDValue();
+
----------------
Why we don't extent to f32 here?

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22522
+  if (!isScalarFPTypeInSSEReg(SrcVT) ||
+      (SrcVT == MVT::f16 && !Subtarget.hasFP16()))
     return SDValue();
----------------
Why we don't extent to f32 here? Will it be promoted finally?

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22765
+                        DAG.getIntPtrConstant(0, DL));
+      Res = DAG.getNode(X86ISD::STRICT_CVTPS2PH, DL, {MVT::v8i16, MVT::Other},
+                        {Chain, Res, DAG.getTargetConstant(4, DL, MVT::i32)});
----------------
Should MVT::v8i16 be MVT::v8f16?

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22766
+      Res = DAG.getNode(X86ISD::STRICT_CVTPS2PH, DL, {MVT::v8i16, MVT::Other},
+                        {Chain, Res, DAG.getTargetConstant(4, DL, MVT::i32)});
+      Chain = Res.getValue(1);
----------------
Is it rounding control? Can we use a macro or add comments for what is the rounding control?

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22775
+
+    Res = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i16, Res,
+                      DAG.getIntPtrConstant(0, DL));
----------------
MVT::f16 and delete the bitcast?

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:44211
       VT != MVT::f80 && VT != MVT::f128 &&
+      !(VT.getScalarType() == MVT::f16 && !Subtarget.hasFP16()) &&
       (TLI.isTypeLegal(VT) || VT == MVT::v2f32) &&
----------------
Not sure if it is better to wrapper it into a readable function (e.g., isSoftF16).

================
Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:1476
 }
-let Predicates = [HasFP16] in {
+let Predicates = [HasBWI] in {
   def : Pat<(v32f16 (X86VBroadcastld16 addr:$src)),
----------------
If target don't have avx512bw feature. There is some other pattern to lower the node or fp16 broadcast node is invalid?

================
Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4107
              _.ExeDomain>, EVEX_4V, Sched<[SchedWriteFShuffle.XMM]>;
+  let Predicates = [prd] in {
   def rrkz : AVX512PI<0x10, MRMSrcReg, (outs _.RC:$dst),
----------------
Previous prd only apply to "def rr"? Is it a bug for previous code?

================
Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4352
+defm : avx512_move_scalar_lowering<"VMOVSHZ", X86Movsh, fp16imm0, v8f16x_info>;
+defm : avx512_store_scalar_lowering<"VMOVSHZ", avx512vl_f16_info,
+                   (v32i1 (bitconvert (and GR32:$mask, (i32 1)))), GR32>;
----------------
Why previous code don't have predicates?

================
Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:11657

+let Predicates = [HasBWI], AddedComplexity = -10 in {
+  def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (VPINSRWZrm (v8i16 (IMPLICIT_DEF)), addr:$src, 0), FR16X)>;
----------------
Why set AddedComplexity to -10? There no such addtional complexity in previous code. Add comments for it? 

================
Comment at: llvm/lib/Target/X86/X86InstrSSE.td:3970

+let Predicates = [UseSSE2], AddedComplexity = -10 in {
+  def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (PINSRWrm (v8i16 (IMPLICIT_DEF)), addr:$src, 0), FR16)>;
----------------
Why  AddedComplexity = -10? Add comments for it?

================
Comment at: llvm/lib/Target/X86/X86InstrSSE.td:3978
+let Predicates = [HasAVX, NoBWI], AddedComplexity = -10 in {
+  def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (VPINSRWrm (v8i16 (IMPLICIT_DEF)), addr:$src, 0), FR16)>;
+  def : Pat<(i16 (bitconvert f16:$src)), (EXTRACT_SUBREG (VPEXTRWrr (v8i16 (COPY_TO_REGCLASS FR16:$src, VR128)), 0), sub_16bit)>;
----------------
Miss pattern for store?

================
Comment at: llvm/lib/Target/X86/X86InstrSSE.td:5214
+
+let Predicates = [HasAVX, NoBWI] in
+  def : Pat<(store f16:$src, addr:$dst), (VPEXTRWmr addr:$dst, (v8i16 (COPY_TO_REGCLASS FR16:$src, VR128)), 0)>;
----------------
Why no AddedComplexity for it?

================
Comment at: llvm/lib/Target/X86/X86RegisterInfo.td:540

+def FR16 : RegisterClass<"X86", [f16], 16, (add FR32)> {let Size = 32;}
+
----------------
pengfei wrote:
> skan wrote:
> > The alignment is not same as the size?
> No. This is spill size instead of alignment.
When there is avx512fp16 feature, is the spill size still 32?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107082/new/

https://reviews.llvm.org/D107082