[PATCH] D127140: [APFloat] Fix truncation of certain subnormal numbers

Tue Jun 7 07:42:55 PDT 2022

danilaml added a comment.

In D127140#3561467 <https://reviews.llvm.org/D127140#3561467>, @efriedma wrote:

> I'm not sure it's safe to assume that the lost fraction is lfLessThanHalf, in general.  At least, it isn't obvious to me, particularly for cases involving bfloat.
>
> We already have a bunch of code to adjust the shift amount for truncations: the "If this is a truncation of a denormal number [...]" block.  Can we adjust that code to reduce the shift amount in this case?

If I understood correctly (which is not a given, of course), passing `lfLessThanHalf` would just prevent rounding up whatever number results from the right shift, which is exactly the expected behavior for something like bfloat (which just truncates).
On the other hand, I'm not sure it's safe to adjust the shift amount for truncations either. Is `normalize` expected to handle arbitrary input and "normalize" it to the current semantics, with the only exception being the "zero significand, non-zero lost fraction" case?
I tried adding an `assert` against that case and quickly found a place where the code does exactly this:

  /* Underflow to zero and round.  */
  category = fcNormal;
  zeroSignificand();
  fs = normalize(rounding_mode, lfLessThanHalf);

This is at APFloat.cpp:2770, in `IEEEFloat::convertFromDecimalString` (hit by the `EXPECT_TRUE(APFloat(APFloat::IEEEdouble(), "1e-99999").isPosZero());` test).
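
To make the rounding argument above concrete, here is a minimal standalone sketch of how a lost-fraction tag can drive the round-up decision after a right shift. It is not the APFloat code: `LostFraction`, `Rounding`, `classifyLostBits`, `roundsUp` and `shiftAndRound` are hypothetical names, only three rounding modes are modeled, and the directed-mode behavior reflects my understanding of how such a scheme usually works rather than anything confirmed in this review.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a lost-fraction tag: how the discarded bits compare
// to half a unit in the last place of the kept result. (Hypothetical names.)
enum class LostFraction { ExactlyZero, LessThanHalf, ExactlyHalf, MoreThanHalf };

// Only three rounding modes are modeled in this sketch.
enum class Rounding { NearestTiesToEven, TowardZero, TowardPositive };

// Classify the low `bits` bits of `sig` that a right shift would discard.
LostFraction classifyLostBits(std::uint64_t sig, unsigned bits) {
  assert(bits < 64 && "shift amount must be smaller than the word size");
  if (bits == 0)
    return LostFraction::ExactlyZero;
  const std::uint64_t half = std::uint64_t{1} << (bits - 1);
  const std::uint64_t lost = sig & ((half << 1) - 1);
  if (lost == 0)
    return LostFraction::ExactlyZero;
  if (lost < half)
    return LostFraction::LessThanHalf;
  if (lost == half)
    return LostFraction::ExactlyHalf;
  return LostFraction::MoreThanHalf;
}

// Decide whether the kept magnitude is bumped by one unit.
bool roundsUp(Rounding mode, LostFraction lf, bool lsbIsOdd, bool negative) {
  if (lf == LostFraction::ExactlyZero)
    return false; // Nothing was lost, so the result is exact.
  switch (mode) {
  case Rounding::NearestTiesToEven:
    if (lf == LostFraction::MoreThanHalf)
      return true;
    return lf == LostFraction::ExactlyHalf && lsbIsOdd;
  case Rounding::TowardZero:
    return false;
  case Rounding::TowardPositive:
    return !negative; // Any non-zero loss pushes a positive value up.
  }
  return false;
}

// Shift right by `bits`, then round using the lost fraction the caller passes
// in. Passing a hard-coded tag models the "assume lfLessThanHalf" approach.
std::uint64_t shiftAndRound(std::uint64_t sig, unsigned bits, Rounding mode,
                            LostFraction lf, bool negative = false) {
  assert(bits < 64 && "shift amount must be smaller than the word size");
  const std::uint64_t kept = sig >> bits;
  return kept + (roundsUp(mode, lf, (kept & 1) != 0, negative) ? 1 : 0);
}
```

In this model, `shiftAndRound(0xB, 2, Rounding::NearestTiesToEven, LostFraction::LessThanHalf)` keeps the truncated value, whereas passing the tag computed by `classifyLostBits(0xB, 2)` rounds it up. A hard-coded "less than half" also still rounds up under `Rounding::TowardPositive`, including for a zero significand, which is the shape of the "underflow to zero and round" call quoted above.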

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D127140