[PATCH] D28540: [NVPTX] Added support for half-precision floating point.

Artem Belevich via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 11 13:52:37 PST 2017


tra added inline comments.


================
Comment at: lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:1007
+    FromType = ScalarVT.SimpleTy == MVT::f16 ? NVPTX::PTXLdStInstCode::Untyped
+                                             : NVPTX::PTXLdStInstCode::Float;
   else
----------------
jlebar wrote:
> ibid
Removed this change as we're not handling f16 vectors here yet.
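For the scalar case that does remain, the idea is that f16 moves through memory as untyped 16-bit data rather than as a "Float" access. A rough sketch (the surrounding control flow is paraphrased, not quoted from the patch):

    // f16 scalars use the untyped ld.b16/st.b16 flavor instead of the
    // Float one; f16 vectors are deliberately not handled here yet.
    if (ScalarVT.SimpleTy == MVT::f16)
      FromType = NVPTX::PTXLdStInstCode::Untyped;
    else if (ScalarVT.isFloatingPoint())
      FromType = NVPTX::PTXLdStInstCode::Float;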


================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:149
+  setOperationAction(ISD::SELECT_CC, MVT::f16,
+                     STI.allowFP16Math() ? Expand : Promote);
   setOperationAction(ISD::SELECT_CC, MVT::f32, Expand);
----------------
jlebar wrote:
> The comment above doesn't seem to apply to these two new lines.
Moved `SETCC` out. The comment still applies to `{SELECT,BR}_CC` below.
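Roughly, the reshuffle looks like this (the exact legalize actions here are an assumption for illustration; the point is only the placement relative to the comment):

    setOperationAction(ISD::SETCC, MVT::f16,
                       STI.allowFP16Math() ? Legal : Promote);

    // Operations not directly supported by NVPTX.
    setOperationAction(ISD::SELECT_CC, MVT::f16, Expand);
    setOperationAction(ISD::BR_CC, MVT::f16, Expand);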


================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:320
   setOperationAction(ISD::FROUND, MVT::f64, Legal);
+  setOperationAction(ISD::FTRUNC, MVT::f16, Legal);
   setOperationAction(ISD::FTRUNC, MVT::f32, Legal);
----------------
jlebar wrote:
> This is all legal even on sm_20 with ptxas from CUDA 7?  (I tried looking this up in the docs and failed, sorry.)
Yes. Conversion instructions support f16 on all sm_20+ GPUs, according to the CUDA 7.0+ docs.
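Since these all lower to a single cvt with an integer-rounding modifier (rzi for ftrunc, rmi/rpi for floor/ceil, and so on), they can be marked Legal for f16 without any feature guard. A sketch (the exact set of ops marked Legal in the patch is an assumption):

    // cvt.<rni|rzi|rmi|rpi>.f16.f16 covers all of these, and ptxas
    // accepts the f16 forms on any sm_20+ target.
    for (auto Op : {ISD::FCEIL, ISD::FFLOOR, ISD::FROUND, ISD::FTRUNC})
      setOperationAction(Op, MVT::f16, Legal);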


================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:2120
+      DAG.getTruncStore(Tmp1, dl, Tmp3, Tmp2, ST->getPointerInfo(), MVT::i16,
+                        ST->getAlignment(), ST->getMemOperand()->getFlags());
+  return Result;
----------------
jlebar wrote:
> Can we explain what we're doing differently here?
I don't think I need it any more -- LLVM does all f16 loads/stores using .b16 ops and registers now, so I don't have to explicitly store it as MVT::i16.
I've removed the function.
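For reference, the removed code forced f16 values out through an i16 truncating store; a paraphrased before/after sketch:

    // Before (removed): coerce the f16 store into an i16 truncating
    // store so it would be emitted as a 16-bit integer store.
    SDValue Result =
        DAG.getTruncStore(Tmp1, dl, Tmp3, Tmp2, ST->getPointerInfo(),
                          MVT::i16, ST->getAlignment(),
                          ST->getMemOperand()->getFlags());
    // Now: no custom hook at all; a plain f16 store selects st.b16
    // because f16 values live in untyped 16-bit registers.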


================
Comment at: lib/Target/NVPTX/NVPTXSubtarget.h:104
   bool hasImageHandles() const;
+  bool hasFP16() const { return SmVersion >= 53; }
+  bool allowFP16Math() const;
----------------
jlebar wrote:
> Do we need this function at all?  It's kind of confusing because pre-sm_53 we had fp16 loads/stores, so you could say we "had" fp16.  But also someone might accidentally call this instead of allowFP16Math().
Renamed to hasFP16Math.
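So the pair now reads roughly like this (the body of allowFP16Math() lives in the .cpp; the exact override it checks is an assumption for illustration):

    bool hasFP16Math() const { return SmVersion >= 53; } // HW has f16 math
    // True only when the HW has f16 math and it hasn't been disabled by
    // a command-line override.
    bool allowFP16Math() const;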


================
Comment at: test/CodeGen/NVPTX/f16-instructions.ll:764
+; CHECK:      cvt.f32.f16     [[AF:%f[0-9]+]], [[A]];
+; CHECK:      sin.approx.f32  [[RF:%f[0-9]+]], [[AF]];
+; CHECK:      cvt.rn.f16.f32  [[R:%h[0-9]+]], [[RF]];
----------------
jlebar wrote:
> jlebar wrote:
> > How do we know it's correct to lower this as `cvt.to.f16(sin.approx.f32(x))`?  That only works if we're guaranteed that the error of sin.approx.f32 is too small to be noticed in fp16.  But that doesn't seem guaranteed.  All the ISA says about precision is
> > 
> > > The maximum absolute error is 2^-20.9 in quadrant 00.
> > 
> > This error is too small to be represented in an fp16, which would normally mean we're good.  But because it qualifies with "in quadrant 00", that suggests that all bets are off if we're not in...whatever is quadrant 00.  (I presume it's the first quadrant?)
> > 
> > Same for cosine.
> Actually, I take it back about 2^-20.9 being too small to fit in an fp16.  I forgot about denormals.  See https://en.wikipedia.org/wiki/Half-precision_floating-point_format#Precision_limitations_on_decimal_values_in_.5B0.2C_1.5D
I don't have a good answer. In general, given that fp16 has fewer bits of mantissa, whatever error sin.approx.f32 produces is likely to be lost, because the low bits will be discarded during the conversion to fp16. We'll know more once I have the FP test suite running on a GPU.
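For concreteness, the numbers under discussion (assuming fp16's 10 fraction bits, so one ULP near 1.0 is 2^-11, and subnormals reaching down to 2^-24):

    // Back-of-the-envelope check of the precision argument above.
    #include <cmath>
    #include <cstdio>

    int main() {
      std::printf("fp16 ULP near 1.0:        %g\n", std::ldexp(1.0, -11)); // ~4.88e-4
      std::printf("smallest fp16 subnormal:  %g\n", std::ldexp(1.0, -24)); // ~5.96e-8
      std::printf("sin.approx max abs error: %g\n", std::exp2(-20.9));     // ~5.10e-7
      // The documented error is well below one ULP near 1.0, but above
      // the smallest subnormal, so it can show up in fp16 near zero --
      // the denormals caveat raised above.
      return 0;
    }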


https://reviews.llvm.org/D28540




