[libc-commits] [libc] [libc][math] Fix incorrect logic in fputil::generic::add_or_sub (PR #116129)
via libc-commits
libc-commits at lists.llvm.org
Wed Nov 13 16:34:17 PST 2024
================
@@ -160,20 +160,21 @@ add_or_sub(InType x, InType y) {
} else {
InStorageType max_mant = max_bits.get_explicit_mantissa() << GUARD_BITS_LEN;
InStorageType min_mant = min_bits.get_explicit_mantissa() << GUARD_BITS_LEN;
- int alignment =
- max_bits.get_biased_exponent() - min_bits.get_biased_exponent();
+
+ int alignment = (max_bits.get_biased_exponent() - max_bits.is_normal()) -
+ (min_bits.get_biased_exponent() - min_bits.is_normal());
----------------
overmighty wrote:
The formula given in section 9.2.3.2 of *Handbook of Floating-Point Arithmetic* is $\delta = (E_x - n_x) - (E_y - n_y)$. When I implemented `fputil::generic::add_or_sub`, I asked myself why it wasn't just $\delta = E_x - E_y$, and ended up using that instead of the formula given in the book. Today I remembered asking myself that question, so I thought about it again and now it's obvious.
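
One way to see why the $n$ terms matter: a subnormal number has biased exponent 0 but is scaled by the same power of two as a normal number with biased exponent 1, so $E_x - E_y$ is off by one whenever exactly one operand is subnormal. The sketch below illustrates this for IEEE 754 binary32 `float`. The helpers `biased_exponent`, `normal_bit`, `naive_alignment`, and `corrected_alignment` are hypothetical names for illustration only; they are not the `FPBits` API used in the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float is IEEE 754 binary32 (1 sign, 8 exponent, 23 mantissa bits).
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes binary32 floats");

// Hypothetical helper: extract the 8-bit biased exponent field.
inline int biased_exponent(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return static_cast<int>((bits >> 23) & 0xFF);
}

// Hypothetical helper: the n term of the formula.
// 1 for normal numbers, 0 for subnormals and zero.
inline int normal_bit(float f) {
  assert(std::isfinite(f));
  return biased_exponent(f) != 0 ? 1 : 0;
}

// delta = E_x - E_y, the form used before the fix.
inline int naive_alignment(float x, float y) {
  return biased_exponent(x) - biased_exponent(y);
}

// delta = (E_x - n_x) - (E_y - n_y), the form from the book.
inline int corrected_alignment(float x, float y) {
  return (biased_exponent(x) - normal_bit(x)) -
         (biased_exponent(y) - normal_bit(y));
}
```

With `x = std::numeric_limits<float>::min()` (the smallest normal, biased exponent 1) and `y = std::numeric_limits<float>::denorm_min()` (a subnormal, biased exponent 0), both values are scaled by $2^{-126}$, so their explicit mantissas need no relative shift. The naive form nevertheless reports an alignment of 1, while the corrected form reports 0. When both operands are normal, or both are subnormal, the two forms agree.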
https://github.com/llvm/llvm-project/pull/116129