[PATCH] D28540: [NVPTX] Added support for half-precision floating point.

Artem Belevich via Phabricator via llvm-commits <llvm-commits at lists.llvm.org>
Fri Jan 13 11:56:19 PST 2017


tra added inline comments.


================
Comment at: lib/Target/NVPTX/NVPTXSubtarget.h:104
   bool hasImageHandles() const;
+  bool hasFP16() const { return SmVersion >= 53; }
+  bool allowFP16Math() const;
----------------
jlebar wrote:
> jlebar wrote:
> > tra wrote:
> > > jlebar wrote:
> > > > Do we need this function at all?  It's kind of confusing because pre-sm_53 we had fp16 loads/stores, so you could say we "had" fp16.  But also someone might accidentally call this instead of allowFP16Math().
> > > Renamed to hasFP16Math.
> > Yeah, but still...do we need it?  It's confusing that the predicates in the .td are called "hasFP16Math" but they actually correspond to allowFP16Math, and I'm afraid in general someone will call hasFP16Math when they mean to call allowFP16Math.  It seems that nobody needs this function today, so can we remove it and inline the SM version check into allowFP16Math?
> > 
> > We might also want to change the TableGen predicate to match this function name.
> ^ This is my only remaining nontrivial concern about the patch.  Otherwise, looks great to me.
I'd rather keep both functions here, so that one can tell "the hardware has this feature" apart from "we are allowed to use this feature".

You do have a point about the predicate name in the .td file. I've renamed it to useFP16Math to avoid confusion with this function.
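For illustration, here's a minimal sketch of how the two queries stay distinct (an assumption about the shape of the code, not the committed patch; the NoF16Math field and the option that would set it are hypothetical):

  // Sketch only -- the NoF16Math field and its option are hypothetical.
  class NVPTXSubtarget {
    unsigned SmVersion;   // target SM architecture, e.g. 53 for sm_53
    bool NoF16Math;       // hypothetically set by an fp16-disabling option

  public:
    // The hardware supports fp16 arithmetic (sm_53 and newer).
    bool hasFP16Math() const { return SmVersion >= 53; }
    // The compiler is actually allowed to emit fp16 arithmetic.
    bool allowFP16Math() const { return hasFP16Math() && !NoF16Math; }
  };

With that split, the .td predicate (now useFP16Math) can map onto allowFP16Math() without ambiguity.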


================
Comment at: test/CodeGen/NVPTX/f16-instructions.ll:764
+; CHECK:      cvt.f32.f16     [[AF:%f[0-9]+]], [[A]];
+; CHECK:      sin.approx.f32  [[RF:%f[0-9]+]], [[AF]];
+; CHECK:      cvt.rn.f16.f32  [[R:%h[0-9]+]], [[RF]];
----------------
jlebar wrote:
> tra wrote:
> > jlebar wrote:
> > > jlebar wrote:
> > > > How do we know it's correct to lower this as `cvt.to.f16(sin.approx.f32(x))`?  That only works if we're guaranteed that the error of sin.approx.f32 is too small to be noticed in fp16.  But that doesn't seem guaranteed.  All the ISA says about precision is
> > > > 
> > > > > The maximum absolute error is 2^-20.9 in quadrant 00.
> > > > 
> > > > This error is too small to be represented in an fp16, which would normally mean we're good.  But because it qualifies with "in quadrant 00", that suggests all bets are off if we're not in... whatever quadrant 00 is.  (I presume it's the first quadrant?)
> > > > 
> > > > Same for cosine.
> > > Actually, I take it back about 2^-20.9 being too small to fit in an fp16.  I forgot about denormals.  See https://en.wikipedia.org/wiki/Half-precision_floating-point_format#Precision_limitations_on_decimal_values_in_.5B0.2C_1.5D
> > I don't have a good answer. In general, given that fp16 has fewer bits of mantissa, whatever error sin.approx.f32 produces will likely disappear when the low bits are dropped during conversion to fp16. We'll know more once I have the FP test suite running on the GPU.
> I really think that, in the absence of evidence that it is safe, we need to be conservative and disable this (except under fast-math, if you like).  We should not assume that sin.approx.f32 is sufficiently accurate outside of the first quadrant (and I am not even sure it's sufficiently accurate within the first quadrant).
Done in D28619/r291936
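For context, here's a minimal sketch of the kind of gating that change applies (an illustration of the idea, not the committed D28619 diff). Incidentally, the 2^-20.9 bound (~5.1e-7) is representable in fp16 near zero, since fp16 subnormals reach down to 2^-24 (~6.0e-8), even though near 1.0 the fp16 ULP of 2^-10 would swallow it entirely:

  // Illustrative sketch only, somewhere like the NVPTXTargetLowering
  // constructor: select the approximate PTX instructions only under
  // unsafe-fp-math, and expand to a precise library call otherwise.
  if (TM.Options.UnsafeFPMath) {
    setOperationAction(ISD::FSIN, MVT::f32, Legal);   // sin.approx.f32
    setOperationAction(ISD::FCOS, MVT::f32, Legal);   // cos.approx.f32
  } else {
    setOperationAction(ISD::FSIN, MVT::f32, Expand);
    setOperationAction(ISD::FCOS, MVT::f32, Expand);
  }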


https://reviews.llvm.org/D28540