[PATCH] D141863: [llvm][APFloat] Add NaN-in-negative-zero formats by AMD and GraphCore

Reed Wanderman-Milne via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 1 21:12:27 PST 2023


reedwm added a comment.

Looks good, just have some minor comments.



================
Comment at: llvm/include/llvm/ADT/APFloat.h:165
+    // NaN is represnted as negative zero. (FN -> Finite, UZ -> unsigned zero)
+    S_Float8E5M2FNUZ,
     // 8-bit floating point number mostly following IEEE-754 conventions with
----------------
You should document that the exponent bias is 16, since it differs from the IEEE standard exponent bias of  2**(num_exponent_bits - 1) - 1. And same for S_Float8E4M3FNUZ.


================
Comment at: llvm/lib/Support/APFloat.cpp:62
 
     // Only the Float8E5M2 has this behavior. There is no Inf representation. A
     // value is NaN if the exponent field and the mantissa field are all 1s.
----------------
This comment should be updated to mention the new dtypes, and also the NaN description no longer is accurate for the new dtypes.

Also, "Float8E5M2" should be replaced with "Float8E4M3FN". I accidentally put the wrong dtype in the comment when adding Float8E4M3FN.


================
Comment at: llvm/lib/Support/APFloat.cpp:77
+    // Represents the behavior in some 8-bit floating ponit types where NaN is
+    // represnted by the all-1s value
+    AllOnes,
----------------
mehdi_amini wrote:
> typo: represented
The sign bit can still be zero. I would clarify that for NaNs the exponent and mantissa are all 1s.


================
Comment at: llvm/lib/Support/APFloat.cpp:81
+    // Represents the behaviro in some 8-bit floating point types where NaN is
+    // represented by a sign bit of 1 and all 0s in the exponent (i.e. the
+    // negative zero encoding in a IEEE float).
----------------
Also mention that for NaNs, the mantissa is also all 0s (you only mentioned the exponent is all 0s).


================
Comment at: llvm/lib/Support/APFloat.cpp:840-841
   if (semantics->nonFiniteBehavior == fltNonfiniteBehavior::NanOnly) {
     // The only NaN representation is where the mantissa is all 1s, which is
     // non-signalling.
     SNaN = false;
----------------
This comment should be moved to the "else" branch you added (or just delete the comment).


================
Comment at: llvm/lib/Support/APFloat.cpp:2506
+             semantics->nanEncoding == fltNanEncoding::NegativeZero) {
+    *losesInfo = fromSemantics.nanEncoding != fltNanEncoding::NegativeZero;
+    fs = *losesInfo ? opInexact : opOK;
----------------
Don't we only lose info if the source of the conversion is -0?


================
Comment at: llvm/unittests/ADT/APFloatTest.cpp:1871
     const unsigned long long bitPattern[2];
-    const unsigned bitPatternLength;
+    const unsigned bitPatternLength; // todo preserve sign flag
   } const GetZeroTest[] = {
----------------
What does this TODO mean?


================
Comment at: llvm/unittests/ADT/APFloatTest.cpp:2094
+
+  // Except in castss between ourselves
+  losesInfo = true;
----------------
Typo: castss -> casts


================
Comment at: llvm/unittests/ADT/APFloatTest.cpp:5377
+
+  // 2. NextUp of smallest denormal is +0
+  test = APFloat::getSmallest(APFloat::Float8E5M2FNUZ(), true);
----------------
Add the word "negative". And similarly in Float8E4M3FNUZNext


================
Comment at: llvm/unittests/ADT/APFloatTest.cpp:5439-5440
+            APFloat(APFloat::Float8E5M2FNUZ(), "59392").convertToDouble());
+  // Round up, causing overflow to NaN
+  EXPECT_TRUE(APFloat(APFloat::Float8E5M2FNUZ(), "131072").isNaN());
+  // Overflow without rounding
----------------
This is the same as the next test case


================
Comment at: llvm/unittests/ADT/APFloatTest.cpp:5473
+
+  // Dividing the negatize float_min by anything hives +0
+  test = APFloat::getSmallest(APFloat::Float8E4M3FNUZ(), true);
----------------
typo: hives -> gives


================
Comment at: llvm/unittests/ADT/APFloatTest.cpp:5650
+
+TEST(APFloatTest, Float8E4M3FNUZExhaustive) {
+  // Test each of the 256 Float8E4M3FNUZ values.
----------------
This looks pretty similar to Float8E5M2FNUZExhaustive. Maybe merge them.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141863/new/

https://reviews.llvm.org/D141863



More information about the llvm-commits mailing list