[PATCH] D30057: [NVPTX] Added support for v2f16 operations.
Justin Lebar via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 16 18:37:08 PST 2017
jlebar added a comment.
I really like this patch.
================
Comment at: lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:869
+ // using two f16 elements packed into an i32.
+ fromTypeWidth *= 2;
+ if (NumElts == 2)
----------------
I'm confused how this works without additional changes elsewhere. We increase fromTypeWidth, but this doesn't affect the selection of Opcode below. So how do we know that this is a packed-i32 load rather than a regular f16 load?
================
Comment at: lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:2410
+ }
+ }
+
----------------
Should this be unified with the similar code above?
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:176
uint64_t Off = TempOffsets[i];
- if (VT.isVector())
- for (unsigned j = 0, je = VT.getVectorNumElements(); j != je; ++j) {
- ValueVTs.push_back(VT.getVectorElementType());
+ // Split vectors into individual elements, except for v2f16 which
+ // we will pass as a single scalar.
----------------
, which
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:181
+ EVT EltVT = VT.getVectorElementType();
+ // Vector with even number of f16 elements will be passed to us
+ // as an array of v2f16 elements. We must match this so we stay
----------------
Vectors with an even number
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:188
+ }
+ for (unsigned j = 0, je = NumElts; j != je; ++j) {
+ ValueVTs.push_back(EltVT);
----------------
j != NumElts?
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:546
+ // These map to corresponding instructions for f32/f64. f16 must be
+ // promoted to f32. v2f16 is expandef to f16 which is then promoted to f32.
+ for (const auto &Op : {ISD::FDIV, ISD::FREM, ISD::FSQRT, ISD::FSIN, ISD::FCOS,
----------------
expandef
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:546
+ // These map to corresponding instructions for f32/f64. f16 must be
+ // promoted to f32. v2f16 is expandef to f16 which is then promoted to f32.
+ for (const auto &Op : {ISD::FDIV, ISD::FREM, ISD::FSQRT, ISD::FSIN, ISD::FCOS,
----------------
jlebar wrote:
> expandef
, which
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:2332
+ // vector of v2f16 elements, so we must load using i32
+ // here and then bitcast it back.
+ // TODO: consider switching to getMemIntrinsicNode() which can
----------------
getLoad needs a vector type, but it can't handle vectors which contain v2f16 elements. So we must load using i32 here and then bitcast back.
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:2333
+ // here and then bitcast it back.
+ // TODO: consider switching to getMemIntrinsicNode() which can
+ // deal with v2f16 better.
----------------
, which
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:2356
Elt = DAG.getNode(ISD::TRUNCATE, dl, MVT::i1, Elt);
+ // v2f16 was loaded as i32 and we must bitcast it back.
+ else if (EltVT == MVT::v2f16)
----------------
", and", or even better, write as two sentences.
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:4225
+ SDLoc DL(N);
+ // setp.f16x2 returns two scalar predicates which we need to convert
+ // back to v2i1. Returned result will be scalarized by legalizer,
----------------
, which :)
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:4227
+ // back to v2i1. Returned result will be scalarized by legalizer,
+ // but comparison will remain a single vector instruction.
+ SDValue CCNode = DCI.DAG.getNode(NVPTXISD::SETP_F16X2, DL,
----------------
The returned result will be scalarized by the legalizer, but the comparison will remain a single vector instruction.
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:4233
+ CCNode.getValue(0), CCNode.getValue(1));
+ return Result;
+}
----------------
Nit, return DCI.DAG.getNode(...)?
================
Comment at: lib/Target/NVPTX/NVPTXInstrInfo.td:2856
+ return N->getSExtValue() == 1;
+}]>;
+
----------------
Could we give these slightly louder names, maybe ImmediateZero and ImmediateOne or immZero/immOne? Otherwise it's not really clear that "i1imm" and "imm0" are completely different things -- they look too similar.
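A sketch of the suggested rename applied to the quoted definitions (the names immZero/immOne are suggestions, not identifiers from the patch; the predicate bodies follow the one quoted above):

```tablegen
// Hypothetical louder names -- only the names change, the predicates
// are the same immediate-value checks as in the patch.
def immZero : PatLeaf<(imm), [{ return N->getSExtValue() == 0; }]>;
def immOne  : PatLeaf<(imm), [{ return N->getSExtValue() == 1; }]>;
```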
================
Comment at: lib/Target/NVPTX/NVPTXInstrInfo.td:2863
+ def F16x2toF16_0 : NVPTXInst<(outs Float16Regs:$dst),
+ (ins Float16x2Regs:$src, i1imm:$c),
+ "{{ .reg .b16 \t%tmp_hi;\n\t"
----------------
Nit, indentation, also on the one below.
================
Comment at: lib/Target/NVPTX/NVPTXInstrInfo.td:2873
+ [(set Float16Regs:$dst,
+ (extractelt (v2f16 Float16x2Regs:$src), imm1:$c))]>;
+
----------------
What if the vector element index is not an immediate? Do we just fail to select? I'd think we should be able to handle that.
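For what it's worth, since a v2f16 occupies a single b32 register, a non-immediate index could in principle lower to a variable shift on the underlying bits. A minimal sketch of that bit-level operation (illustration only, not the patch's code; assumes element 0 sits in the low half, which matches the little-endian packing the patch uses):

```cpp
#include <cstdint>

// A v2f16 lives in one 32-bit register: element 0 in the low 16 bits,
// element 1 in the high 16 bits. With a non-immediate index Idx, the
// raw f16 bits can still be selected with a variable shift.
static uint16_t extractF16Bits(uint32_t Packed, unsigned Idx) {
  return static_cast<uint16_t>(Packed >> (Idx * 16));
}
```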
================
Comment at: lib/Target/NVPTX/NVPTXInstrInfo.td:2882
+
+ // Directly initializing underlying b32 register is one less SASS
+ // instruction than than vector-packing move.
----------------
the b32 register
https://reviews.llvm.org/D30057