[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI
Phoebe Wang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jun 8 07:15:17 PDT 2022
pengfei added inline comments.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:616
+ setOperationAction(ISD::FROUNDEVEN, MVT::f16, Promote);
+ setOperationAction(ISD::FP_ROUND, MVT::f16, Expand);
+ setOperationAction(ISD::FP_EXTEND, MVT::f32, Expand);
----------------
LuoYuanke wrote:
> Just confused how to expand it. Will the expand fail and finally turns to libcall?
Yeah, we can use `LibCall` instead.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:763
+ if (isTypeLegal(MVT::f16)) {
+ setOperationAction(ISD::FP_EXTEND, MVT::f80, Custom);
+ setOperationAction(ISD::STRICT_FP_EXTEND, MVT::f80, Custom);
----------------
LuoYuanke wrote:
> Why f16 emulation affect f80 type? Are we checking isTypeLegal(MVT::f80)?
It's in the scope of `if (UseX87)`. And we need to lower `fpext half %0 to x86_fp80`.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22448
+ if (SrcVT == MVT::f16)
+ return SDValue();
+
----------------
LuoYuanke wrote:
> Why we don't extent to f32 here?
Return `SDValue()` will extent later. This can save the code.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22522
+ if (!isScalarFPTypeInSSEReg(SrcVT) ||
+ (SrcVT == MVT::f16 && !Subtarget.hasFP16()))
return SDValue();
----------------
LuoYuanke wrote:
> Why we don't extent to f32 here? Will it be promoted finally?
Yes.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22765
+ DAG.getIntPtrConstant(0, DL));
+ Res = DAG.getNode(X86ISD::STRICT_CVTPS2PH, DL, {MVT::v8i16, MVT::Other},
+ {Chain, Res, DAG.getTargetConstant(4, DL, MVT::i32)});
----------------
LuoYuanke wrote:
> Should MVT::v8i16 be MVT::v8f16?
No. We use `MVT::v8i16` when we enabled F16C instructions.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22775
+
+ Res = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i16, Res,
+ DAG.getIntPtrConstant(0, DL));
----------------
LuoYuanke wrote:
> MVT::f16 and delete the bitcast?
I don't think we have pattern to extract `f16` from `v8i16`. Besides, I think keeping the bitcast makes the flow clear.
================
Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:1476
}
-let Predicates = [HasFP16] in {
+let Predicates = [HasBWI] in {
def : Pat<(v32f16 (X86VBroadcastld16 addr:$src)),
----------------
LuoYuanke wrote:
> If target don't have avx512bw feature. There is some other pattern to lower the node or fp16 broadcast node is invalid?
Good catch. Added in X86InstrSSE.td
================
Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4107
_.ExeDomain>, EVEX_4V, Sched<[SchedWriteFShuffle.XMM]>;
+ let Predicates = [prd] in {
def rrkz : AVX512PI<0x10, MRMSrcReg, (outs _.RC:$dst),
----------------
LuoYuanke wrote:
> Previous prd only apply to "def rr"? Is it a bug for previous code?
No. previous code works well because no mask variants before AVX512 and no f16 before FP16. The latter is not true now.
================
Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4352
+defm : avx512_move_scalar_lowering<"VMOVSHZ", X86Movsh, fp16imm0, v8f16x_info>;
+defm : avx512_store_scalar_lowering<"VMOVSHZ", avx512vl_f16_info,
+ (v32i1 (bitconvert (and GR32:$mask, (i32 1)))), GR32>;
----------------
LuoYuanke wrote:
> Why previous code don't have predicates?
Because no legal `f16` previously.
================
Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:11657
+let Predicates = [HasBWI], AddedComplexity = -10 in {
+ def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (VPINSRWZrm (v8i16 (IMPLICIT_DEF)), addr:$src, 0), FR16X)>;
----------------
LuoYuanke wrote:
> Why set AddedComplexity to -10? There no such addtional complexity in previous code. Add comments for it?
We used it before, but very little. We need to make sure select FP16 instructions first if available.
================
Comment at: llvm/lib/Target/X86/X86InstrSSE.td:3970
+let Predicates = [UseSSE2], AddedComplexity = -10 in {
+ def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (PINSRWrm (v8i16 (IMPLICIT_DEF)), addr:$src, 0), FR16)>;
----------------
LuoYuanke wrote:
> Why AddedComplexity = -10? Add comments for it?
This is to avoid FP16 instructions been overridden.
================
Comment at: llvm/lib/Target/X86/X86InstrSSE.td:3978
+let Predicates = [HasAVX, NoBWI], AddedComplexity = -10 in {
+ def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (VPINSRWrm (v8i16 (IMPLICIT_DEF)), addr:$src, 0), FR16)>;
+ def : Pat<(i16 (bitconvert f16:$src)), (EXTRACT_SUBREG (VPEXTRWrr (v8i16 (COPY_TO_REGCLASS FR16:$src, VR128)), 0), sub_16bit)>;
----------------
LuoYuanke wrote:
> Miss pattern for store?
It's in line 5214.
================
Comment at: llvm/lib/Target/X86/X86InstrSSE.td:5214
+
+let Predicates = [HasAVX, NoBWI] in
+ def : Pat<(store f16:$src, addr:$dst), (VPEXTRWmr addr:$dst, (v8i16 (COPY_TO_REGCLASS FR16:$src, VR128)), 0)>;
----------------
LuoYuanke wrote:
> Why no AddedComplexity for it?
We don't need it if no BWI.
================
Comment at: llvm/lib/Target/X86/X86RegisterInfo.td:540
+def FR16 : RegisterClass<"X86", [f16], 16, (add FR32)> {let Size = 32;}
+
----------------
LuoYuanke wrote:
> pengfei wrote:
> > skan wrote:
> > > The alignment is not same as the size?
> > No. This is spill size instead of alignment.
> When there is avx512fp16 feature, is the spill size still 32?
Yes, it's more efficient to use movss that insert/extrct. And we also use `FR32X` for AVX512 targets without FP16.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D107082/new/
https://reviews.llvm.org/D107082
More information about the llvm-commits
mailing list