[PATCH] D30057: [NVPTX] Added support for v2f16 operations.
Justin Lebar via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 16 18:37:08 PST 2017
jlebar added a comment.
I really like this patch.
================
Comment at: lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:869
+ // using two f16 elements packed into an i32.
+ fromTypeWidth *= 2;
+ if (NumElts == 2)
----------------
I'm confused how this works without additional changes elsewhere. We increase fromTypeWidth, but this doesn't affect the selection of Opcode below. So how do we know that this is a packed-i32 load rather than a regular f16 load?
================
Comment at: lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:2410
+ }
+ }
+
----------------
Should this be unified with the similar code above?
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:176
uint64_t Off = TempOffsets[i];
- if (VT.isVector())
- for (unsigned j = 0, je = VT.getVectorNumElements(); j != je; ++j) {
- ValueVTs.push_back(VT.getVectorElementType());
+ // Split vectors into individual elements, except for v2f16 which
+ // we will pass as a single scalar.
----------------
, which
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:181
+ EVT EltVT = VT.getVectorElementType();
+ // Vector with even number of f16 elements will be passed to us
+ // as an array of v2f16 elements. We must match this so we stay
----------------
Vectors with an even number
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:188
+ }
+ for (unsigned j = 0, je = NumElts; j != je; ++j) {
+ ValueVTs.push_back(EltVT);
----------------
j != NumElts?
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:546
+ // These map to corresponding instructions for f32/f64. f16 must be
+ // promoted to f32. v2f16 is expandef to f16 which is then promoted to f32.
+ for (const auto &Op : {ISD::FDIV, ISD::FREM, ISD::FSQRT, ISD::FSIN, ISD::FCOS,
----------------
expandef
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:546
+ // These map to corresponding instructions for f32/f64. f16 must be
+ // promoted to f32. v2f16 is expandef to f16 which is then promoted to f32.
+ for (const auto &Op : {ISD::FDIV, ISD::FREM, ISD::FSQRT, ISD::FSIN, ISD::FCOS,
----------------
jlebar wrote:
> expandef
, which
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:2332
+ // vector of v2f16 elements, so we must load using i32
+ // here and then bitcast it back.
+ // TODO: consider switching to getMemIntrinsicNode() which can
----------------
getLoad needs a vector type, but it can't handle vectors which contain v2f16 elements. So we must load using i32 here and then bitcast back.
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:2333
+ // here and then bitcast it back.
+ // TODO: consider switching to getMemIntrinsicNode() which can
+ // deal with v2f16 better.
----------------
, which
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:2356
Elt = DAG.getNode(ISD::TRUNCATE, dl, MVT::i1, Elt);
+ // v2f16 was loaded as i32 and we must bitcast it back.
+ else if (EltVT == MVT::v2f16)
----------------
", and", or even better, write as two sentences.
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:4225
+ SDLoc DL(N);
+ // setp.f16x2 returns two scalar predicates which we need to convert
+ // back to v2i1. Returned result will be scalarized by legalizer,
----------------
, which :)
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:4227
+ // back to v2i1. Returned result will be scalarized by legalizer,
+ // but comparison will remain a single vector instruction.
+ SDValue CCNode = DCI.DAG.getNode(NVPTXISD::SETP_F16X2, DL,
----------------
The returned result will be scalarized by the legalizer, but the comparison will remain a single vector instruction.
================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:4233
+ CCNode.getValue(0), CCNode.getValue(1));
+ return Result;
+}
----------------
Nit, return DCI.DAG.getNode(...)?
================
Comment at: lib/Target/NVPTX/NVPTXInstrInfo.td:2856
+ return N->getSExtValue() == 1;
+}]>;
+
----------------
Could we give these slightly louder names, maybe ImmediateZero and ImmediateOne or immZero/immOne? Otherwise it's not really clear that "i1imm" and "imm0" are completely different things -- they look too similar.
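A sketch of the suggested rename applied to the quoted definitions (the names immZero/immOne are suggestions, not identifiers from the patch; the predicate bodies follow the one quoted above):

```tablegen
// Hypothetical louder names -- only the names change, the predicates
// are the same immediate-value checks as in the patch.
def immZero : PatLeaf<(imm), [{ return N->getSExtValue() == 0; }]>;
def immOne  : PatLeaf<(imm), [{ return N->getSExtValue() == 1; }]>;
```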
================
Comment at: lib/Target/NVPTX/NVPTXInstrInfo.td:2863
+ def F16x2toF16_0 : NVPTXInst<(outs Float16Regs:$dst),
+ (ins Float16x2Regs:$src, i1imm:$c),
+ "{{ .reg .b16 \t%tmp_hi;\n\t"
----------------
Nit, indentation, also on the one below.
================
Comment at: lib/Target/NVPTX/NVPTXInstrInfo.td:2873
+ [(set Float16Regs:$dst,
+ (extractelt (v2f16 Float16x2Regs:$src), imm1:$c))]>;
+
----------------
What if the vector element index is not an immediate? Do we just fail to select? I'd think we should be able to handle that.
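For what it's worth, since a v2f16 occupies a single b32 register, a non-immediate index could in principle lower to a variable shift on the underlying bits. A minimal sketch of that bit-level operation (illustration only, not the patch's code; assumes element 0 sits in the low half, which matches the little-endian packing the patch uses):

```cpp
#include <cstdint>

// A v2f16 lives in one 32-bit register: element 0 in the low 16 bits,
// element 1 in the high 16 bits. With a non-immediate index Idx, the
// raw f16 bits can still be selected with a variable shift.
static uint16_t extractF16Bits(uint32_t Packed, unsigned Idx) {
  return static_cast<uint16_t>(Packed >> (Idx * 16));
}
```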
================
Comment at: lib/Target/NVPTX/NVPTXInstrInfo.td:2882
+
+ // Directly initializing underlying b32 register is one less SASS
+ // instruction than than vector-packing move.
----------------
the b32 register
https://reviews.llvm.org/D30057