[PATCH] D28540: [NVPTX] Added support for half-precision floating point.

Justin Lebar via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 12 21:21:48 PST 2017


jlebar added inline comments.


================
Comment at: lib/Target/NVPTX/NVPTXAsmPrinter.cpp:370
+      // bits in size.  fp16 is returned as an integer, so this size
+      // adjustment applies to floats too.
+      if (size < 32)
----------------
Everywhere that we talk about f16s being stored/returned as/loaded as "integers" or "untyped integers", I think we should just say "a b16" or "an untyped value".  An "untyped integer" might mean something like llvm's i16, which is definitely an integer, but is "untyped" inasmuch as it might represent a signed or an unsigned int.  That's conceptually different from .b16, which is just a bag of bits.

I've noted places that would need to change with the comment "b16", but you don't have to change all of them to read "b16", so long as we don't say "untyped integer".  :)


================
Comment at: lib/Target/NVPTX/NVPTXAsmPrinter.cpp:1586
+        // PTX ABI requires all scalars to be at least 32 bits in size.
+        // fp16 is returned as an integer and must follow this rule, too.
+        sz = 32;
----------------
b16


================
Comment at: lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:754
   else if (ScalarVT.isFloatingPoint())
-    fromType = NVPTX::PTXLdStInstCode::Float;
+    // f16 uses untyped integer load, as there's no ld.f16 instruction.
+    fromType = ScalarVT.SimpleTy == MVT::f16 ? NVPTX::PTXLdStInstCode::Untyped
----------------
b16



================
Comment at: lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:2207
   if (ScalarVT.isFloatingPoint())
-    toType = NVPTX::PTXLdStInstCode::Float;
+    // f16 uses untyped integer store (st.b16), as there's no st.f16
+    // instruction.
----------------
b16



================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:996
       }
+      // PTX ISA requires all scalars to be at least 32 bits in size.
+      // fp16 is returned as an integer and must follow this rule, too.
----------------
All scalar return values, function parameters, etc., but not all scalars in general.  Can we clarify?


================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:997
+      // PTX ISA requires all scalars to be at least 32 bits in size.
+      // fp16 is returned as an integer and must follow this rule, too.
+      if (size < 32)
----------------
b16


================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:1056
+      } else if (Ty->isHalfTy())
+        // PTX ISA requires all scalars to be at least 32 bits in size.
+        // fp16 is returned as an integer and must follow this rule, too.
----------------
Same here about "all scalars"


================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:1057
+        // PTX ISA requires all scalars to be at least 32 bits in size.
+        // fp16 is returned as an integer and must follow this rule, too.
+        sz = 32;
----------------
b16


================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:1370
+      } else if (VT.isFloatingPoint() && sz < 32)
+        // PTX ISA requires all scalars to be at least 32 bits in size.
+        // fp16 is returned as an integer and must follow this rule, too.
----------------
Same about all scalars, and same b16 comment.



================
Comment at: lib/Target/NVPTX/NVPTXInstrInfo.td:299
+               [(set Float16Regs:$dst, (OpNode Float16Regs:$a, Float16Regs:$b))]>,
+               Requires<[hasFP16Math, allowFMA, doF32FTZ]>;
+   def f16rr :
----------------
Heh, we should do a big rename of "doF32FTZ" in a separate patch.


================
Comment at: lib/Target/NVPTX/NVPTXSubtarget.h:104
   bool hasImageHandles() const;
+  bool hasFP16() const { return SmVersion >= 53; }
+  bool allowFP16Math() const;
----------------
tra wrote:
> jlebar wrote:
> > Do we need this function at all?  It's kind of confusing because pre-sm_53 we had fp16 loads/stores, so you could say we "had" fp16.  But also someone might accidentally call this instead of allowFP16Math().
> Renamed to hasFP16Math.
Yeah, but still...do we need it?  It's confusing that the predicates in the .td are called "hasFP16Math" but they actually correspond to allowFP16Math, and I'm afraid in general someone will call hasFP16Math when they mean to call allowFP16Math.  It seems that nobody needs this function today, so can we remove it and inline the SM version check into allowFP16Math?

We might also want to change the tablegen predicate to match this function name.


================
Comment at: test/CodeGen/NVPTX/f16-instructions.ll:764
+; CHECK:      cvt.f32.f16     [[AF:%f[0-9]+]], [[A]];
+; CHECK:      sin.approx.f32  [[RF:%f[0-9]+]], [[AF]];
+; CHECK:      cvt.rn.f16.f32  [[R:%h[0-9]+]], [[RF]];
----------------
tra wrote:
> jlebar wrote:
> > jlebar wrote:
> > > How do we know it's correct to lower this as `cvt.to.f16(sin.approx.f32(x))`?  That only works if we're guaranteed that the error of sin.approx.f32 is too small to be noticed in fp16.  But that doesn't seem guaranteed.  All the ISA says about precision is
> > > 
> > > > The maximum absolute error is 2^-20.9 in quadrant 00.
> > > 
> > > This error is too small to be represented in an fp16, which would normally mean we're good.  But because it qualifies with "in quadrant 00", that suggests that all bets are off if we're not in...whatever is quadrant 00.  (I presume it's the first quadrant?)
> > > 
> > > Same for cosine.
> > Actually, I take it back about 2^-20.9 being too small to fit in an fp16.  I forgot about denormals.  See https://en.wikipedia.org/wiki/Half-precision_floating-point_format#Precision_limitations_on_decimal_values_in_.5B0.2C_1.5D
> I don't have a good answer. In general, given that fp16 has fewer bits of mantissa, whatever error sin.approx.f32 produces is likely to be lost because the low bits will be lost during conversion to fp16. We'll know more once I have FP test suite running on GPU.
I really think, in the absence of evidence that it is safe, we need to be conservative and disable this (except for fastmath, if you like).  We should not assume that sin.approx.f32 is sufficiently accurate outside of the first quadrant (and I am not even sure it's sufficiently accurate within the first quadrant).


https://reviews.llvm.org/D28540




