[libc-commits] [PATCH] D136799: [libc] Implement a high-precision floating point class.

Tue Dec 13 12:41:19 PST 2022

sivachandra accepted this revision.
sivachandra added inline comments.
This revision is now accepted and ready to land.

================
Comment at: libc/src/__support/FPUtil/dyadic_float.h:47
+    exponent = x_bits.get_exponent() - FloatProperties<T>::MANTISSA_WIDTH;
+    mantissa = MantissaType(x_bits.get_explicit_mantissa());
+    normalize();
----------------
This can lead to truncation if `Bits` is less than `MANTISSA_WIDTH`. Should we add a static check here, maybe by extending the above `enable_if`:

```
 template <typename T, cpp::enable_if_t<cpp::is_floating_point_v<T> && FloatProperties<T>::MANTISSA_WIDTH <= Bits, int> = 0>
```
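
For illustration, here is a minimal standalone sketch of how such a constraint behaves. It uses `std::` type traits rather than the libc `cpp::` ones, and `ToyFloatProperties` and `ToyDyadicFloat` are hypothetical stand-ins, not the classes in the patch. The widths 23 and 52 are assumed to be the stored fraction bits of IEEE 754 binary32 and binary64.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for FloatProperties<T>; only MANTISSA_WIDTH is modeled.
template <typename T> struct ToyFloatProperties;
template <> struct ToyFloatProperties<float> {
  static constexpr std::size_t MANTISSA_WIDTH = 23; // assumed binary32 width
};
template <> struct ToyFloatProperties<double> {
  static constexpr std::size_t MANTISSA_WIDTH = 52; // assumed binary64 width
};

// Hypothetical stand-in for the class under review.
template <std::size_t Bits> struct ToyDyadicFloat {
  // This constructor is removed from overload resolution unless T is a
  // floating point type whose MANTISSA_WIDTH is at most Bits.
  template <typename T,
            std::enable_if_t<std::is_floating_point_v<T> &&
                                 (ToyFloatProperties<T>::MANTISSA_WIDTH <= Bits),
                             int> = 0>
  explicit ToyDyadicFloat(T) {}
};

// A 32-bit mantissa accepts float but rejects double at compile time.
static_assert(std::is_constructible_v<ToyDyadicFloat<32>, float>);
static_assert(!std::is_constructible_v<ToyDyadicFloat<32>, double>);
```

With this shape, an attempt to construct `ToyDyadicFloat<32>` from a `double` fails to compile instead of silently dropping mantissa bits.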

================
Comment at: libc/src/__support/FPUtil/dyadic_float.h:121
+// output.  The absolute errors compared to the mathematical sum is bounded by:
+//   | quick_add(a, b) - (a + b) | < MSB(a + b) * 2^(-Bits + 2),
+// i.e., errors are up to 2 ULPs.
----------------
Add a mathematical expression which illustrates what `quick_add` is actually doing. Something like:

```
quick_add.exponent = max(...)
// aligning exponents - explain why
quick_add.mantissa = a.mantissa + b.mantissa;
```
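
To make the alignment step concrete, here is a toy model of the idea, not the implementation in the patch. `ToyDyadic` and `toy_quick_add` are hypothetical names; the value represented is assumed to be `mantissa * 2^exponent`, and signs, carry out of the top bit, and normalization are all ignored.

```cpp
#include <cstdint>

// Hypothetical toy type: value = mantissa * 2^exponent, non-negative only.
struct ToyDyadic {
  int exponent;
  std::uint64_t mantissa;
};

// Aligns the operand with the smaller exponent to the larger exponent by
// shifting its mantissa right, then adds the mantissas. Bits shifted out
// are dropped, which is where the truncation error comes from.
constexpr ToyDyadic toy_quick_add(ToyDyadic a, ToyDyadic b) {
  if (a.exponent < b.exponent)
    return toy_quick_add(b, a); // ensure a has the larger exponent
  const int shift = a.exponent - b.exponent;
  // Shifting a 64-bit value by 64 or more is undefined, so handle it here.
  const std::uint64_t aligned = shift >= 64 ? 0 : (b.mantissa >> shift);
  return {a.exponent, a.mantissa + aligned};
}

// 12 * 2^0 + 9 * 2^-2 = 14.25; the toy result is 14 * 2^0 because the two
// low bits of 9 are dropped during alignment.
static_assert(toy_quick_add(ToyDyadic{0, 12}, ToyDyadic{-2, 9}).mantissa == 14);
```

The example shows why the exponents have to be aligned before adding: the mantissas are only comparable once they are scaled by the same power of two.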

================
Comment at: libc/src/__support/FPUtil/dyadic_float.h:174
+// compared to the mathematical product is bounded by:
+//   2 * errors of quick_mul_hi = 2 * (UInt<Bits>::WordCount - 1) in ULPs.
+// Assume inputs are normalized (by constructors or other functions) so that we
----------------
Please add the same kind of mathematical explanation here as suggested for `quick_add`.
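
As a rough picture of what such an explanation could describe, here is a toy model that keeps only the high half of the mantissa product. It is a sketch under assumptions, not the patch's `quick_mul_hi`, which works on `UInt<Bits>` words. `ToyDyadic32` and `toy_quick_mul` are hypothetical names, the value is assumed to be `mantissa * 2^exponent`, and signs and renormalization are ignored.

```cpp
#include <cstdint>

// Hypothetical toy type: value = mantissa * 2^exponent, non-negative only.
struct ToyDyadic32 {
  int exponent;
  std::uint32_t mantissa;
};

// Multiplies the mantissas into a 64-bit product and keeps the high 32 bits.
// The exponents add, plus 32 to account for the discarded low half.
constexpr ToyDyadic32 toy_quick_mul(ToyDyadic32 a, ToyDyadic32 b) {
  const std::uint64_t full = static_cast<std::uint64_t>(a.mantissa) * b.mantissa;
  return {a.exponent + b.exponent + 32,
          static_cast<std::uint32_t>(full >> 32)};
}

// 1.5 * 1.0: (0xC0000000 * 2^-31) * (0x80000000 * 2^-31) = 0x60000000 * 2^-30.
static_assert(toy_quick_mul(ToyDyadic32{-31, 0xC0000000u},
                            ToyDyadic32{-31, 0x80000000u})
                  .mantissa == 0x60000000u);
```

Note that the result mantissa here no longer has its top bit set, so a real implementation would still need a normalization step afterwards.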

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136799/new/

https://reviews.llvm.org/D136799