[PATCH] D28540: [NVPTX] Added support for half-precision floating point.

Justin Lebar via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 11 11:16:32 PST 2017


jlebar added a comment.

Well, this is heroic.  Aside from my question about sin.approx, I just have nits, mostly about comments.

Do you think we should add some tests to the test-suite so we can cover this e2e?



================
Comment at: lib/Target/NVPTX/NVPTXAsmPrinter.cpp:1352
+  case Type::HalfTyID:
+    return "b16";
   case Type::FloatTyID:
----------------
Can we add a comment explaining why this is not f16?


================
Comment at: lib/Target/NVPTX/NVPTXAsmPrinter.cpp:1581
+      else if (Ty->isHalfTy())
+        sz = 32;
       else
----------------
Can we add a comment explaining why this is not sz = 16?


================
Comment at: lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:542
 
+// There's no way to specify FP16 constants in .f16 ops, so we have to
+// load them into an .f16 register first.
----------------
s/constants/immediates/


================
Comment at: lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:754
   else if (ScalarVT.isFloatingPoint())
-    fromType = NVPTX::PTXLdStInstCode::Float;
+    fromType = ScalarVT.SimpleTy == MVT::f16 ? NVPTX::PTXLdStInstCode::Untyped
+                                             : NVPTX::PTXLdStInstCode::Float;
----------------
Can we add a comment here?


================
Comment at: lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:1007
+    FromType = ScalarVT.SimpleTy == MVT::f16 ? NVPTX::PTXLdStInstCode::Untyped
+                                             : NVPTX::PTXLdStInstCode::Float;
   else
----------------
ibid


================
Comment at: lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp:2208
+    toType = ScalarVT.SimpleTy == MVT::f16 ? NVPTX::PTXLdStInstCode::Untyped
+                                           : NVPTX::PTXLdStInstCode::Float;
   else
----------------
ibid


================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:149
+  setOperationAction(ISD::SELECT_CC, MVT::f16,
+                     STI.allowFP16Math() ? Expand : Promote);
   setOperationAction(ISD::SELECT_CC, MVT::f32, Expand);
----------------
The comment above doesn't seem to apply to these two new lines.


================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:158
+  setOperationAction(ISD::BR_CC, MVT::f16,
+                     STI.allowFP16Math() ? Expand : Promote);
   setOperationAction(ISD::BR_CC, MVT::f32, Expand);
----------------
ibid


================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:297
+    // token amount of hardware and are likely to run faster by using
+    // fp32 units instead.
+    setOperationAction(ISD::FADD, MVT::f16, Promote);
----------------
Suggest being a tad more explicit:

> Promote fp16 arithmetic if fp16 hardware isn't available or the user passed --flag-name.  The flag is useful because, although sm_53+ GPUs ...


================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:320
   setOperationAction(ISD::FROUND, MVT::f64, Legal);
+  setOperationAction(ISD::FTRUNC, MVT::f16, Legal);
   setOperationAction(ISD::FTRUNC, MVT::f32, Legal);
----------------
This is all legal even on sm_20 with ptxas from CUDA 7?  (I tried looking this up in the docs and failed, sorry.)


================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:328
 
+  // 'Expand' implements FCOPYSIGN w/o need for the library.
+  setOperationAction(ISD::FCOPYSIGN, MVT::f16, Expand);
----------------
Which library -- libm?  Maybe "without calling an external library" or perhaps "without calling libm" (although maybe someone will come here and tell me it's not technically libm, it's libgcc or somesuch).
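(As a side note on what "Expand" buys us here: copysign on a binary16 value reduces to masking the sign bit, so no library call of any kind is needed.  A rough Python illustration of the bit-level operation, not the actual legalization code:)

```python
import struct

def fp16_copysign(mag: float, sgn: float) -> float:
    """copysign for IEEE 754 binary16, done purely with bit masks (no libm)."""
    (m,) = struct.unpack("<H", struct.pack("<e", mag))  # '<e' = binary16
    (s,) = struct.unpack("<H", struct.pack("<e", sgn))
    # Keep the magnitude bits of `m`; take bit 15 (the sign) from `s`.
    bits = (m & 0x7FFF) | (s & 0x8000)
    (out,) = struct.unpack("<e", struct.pack("<H", bits))
    return out

print(fp16_copysign(2.5, -1.0))  # -2.5
```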


================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:2120
+      DAG.getTruncStore(Tmp1, dl, Tmp3, Tmp2, ST->getPointerInfo(), MVT::i16,
+                        ST->getAlignment(), ST->getMemOperand()->getFlags());
+  return Result;
----------------
Can we explain what we're doing differently here?


================
Comment at: lib/Target/NVPTX/NVPTXInstrInfo.td:248
 // Template for instructions which take three fp64 or fp32 args.  The
 // instructions are named "<OpcStr>.f<Width>" (e.g. "add.f64").
 //
----------------
Update comment.


================
Comment at: lib/Target/NVPTX/NVPTXInstrInfo.td:782
+// Loads FP16 constant into a register.
+//   ptxas does not have hex representation for fp16, so we can't use fp16
+//   immediate values in .f16 instructions and have to load the
----------------
Please just make a separate paragraph, instead of indenting like this.


================
Comment at: lib/Target/NVPTX/NVPTXInstrInfo.td:783
+//   ptxas does not have hex representation for fp16, so we can't use fp16
+//   immediate values in .f16 instructions and have to load the
+//   constant into a register using mov.b16.
----------------
Run-on sentence.  Suggest

> instructions.  Instead, we have to


================
Comment at: lib/Target/NVPTX/NVPTXRegisterInfo.cpp:80
+  if (RC == &NVPTX::Float16RegsRegClass)
+    return "%h";
   if (RC == &NVPTX::Float64RegsRegClass) {
----------------
Nit, can we write this function either using if-no-else or if-else, instead of mixing the two?  (I realize that this patch is not the original sin...)


================
Comment at: lib/Target/NVPTX/NVPTXSubtarget.h:104
   bool hasImageHandles() const;
+  bool hasFP16() const { return SmVersion >= 53; }
+  bool allowFP16Math() const;
----------------
Do we need this function at all?  It's kind of confusing because pre-sm_53 we had fp16 loads/stores, so you could say we "had" fp16.  But also someone might accidentally call this instead of allowFP16Math().


================
Comment at: test/CodeGen/NVPTX/f16-instructions.ll:764
+; CHECK:      cvt.f32.f16     [[AF:%f[0-9]+]], [[A]];
+; CHECK:      sin.approx.f32  [[RF:%f[0-9]+]], [[AF]];
+; CHECK:      cvt.rn.f16.f32  [[R:%h[0-9]+]], [[RF]];
----------------
How do we know it's correct to lower this as `cvt.to.f16(sin.approx.f32(x))`?  That only works if we're guaranteed that the error of sin.approx.f32 is too small to be noticed in fp16.  But that doesn't seem guaranteed.  All the ISA says about precision is

> The maximum absolute error is 2^-20.9 in quadrant 00.

This error is too small to be represented in an fp16, which would normally mean we're good.  But because it qualifies with "in quadrant 00", that suggests that all bets are off if we're not in...whatever is quadrant 00.  (I presume it's the first quadrant?)

Same for cosine.
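(To put numbers on it: within quadrant 00 the stated bound really is invisible to fp16, since binary16 has a 10-bit mantissa and thus a half-ulp of 2^-11 at 1.0.  A quick sanity check, assuming the quoted 2^-20.9 figure; the concern is entirely about the other quadrants, where no bound is given:)

```python
max_err = 2.0 ** -20.9  # stated sin.approx.f32 bound (quadrant 00 only), ~5.1e-7
# binary16 has a 10-bit mantissa, so a half-ulp at 1.0 is 2^-11, ~4.9e-4.
half_ulp_at_one = 2.0 ** -11

# The approximation error is orders of magnitude below fp16's own rounding
# error, so cvt.rn.f16.f32(sin.approx.f32(x)) would be fine *if* the bound
# held everywhere -- which the ISA does not promise.
print(max_err < half_ulp_at_one)  # True
```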


https://reviews.llvm.org/D28540




