[PATCH] D85031: [builtins] Unify the softfloat division implementation
Serge Pavlov via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Aug 20 09:17:55 PDT 2020
sepavloff added inline comments.
================
Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:99
+ // 0 < x0(b) < 1
+ // abs(x0(b) - 1/b) <= 3/4 - 1/sqrt(2)
+
----------------
This estimation is absent from the original comment. Do you have reference where it came from? Also the original comment states `This is accurate to about 3.5 binary digits`. Is still true? If yes, it could be worth copying here.
================
Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:101-102
+
+ // Then, refine the reciprocal estimate using a Newton-Raphson iteration:
+ // x_{n+1} = x_n * (2 - x_n * b)
+ // Let e_n := x_n - 1/b_hw
----------------
The original comment states:
```
// This doubles the number of correct binary digits in the approximation
// with each iteration.
```
It is true in this implementation? If yes, it could be worth copying here.
================
Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:109
+
+#if NUMBER_OF_HALF_ITERATIONS > 0
+ // Starting with (n-1) half-width iterations
----------------
It is good optimization. Could you please put a comment shortly describing the idea of using half-sized temporaries?
================
Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:114
+ // C is (3/4 + 1/sqrt(2)) - 1 truncated to W0 fractional bits as UQ0.HW
+#if defined(SINGLE_PRECISION)
+ const half_rep_t C_hw = HALF_REP_C(0x7504) << (HW - 16);
----------------
In what cases 16-bit temporaries are used? `NUMBER_OF_HALF_ITERATIONS` is set to zero in `divsf3.c`.
================
Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:130
+ // Cannot overflow by construction and is larger than 2 - b*x by at most 1*Ulp.
+ half_rep_t corr_UQ1_hw = 0 /* = 2 */ - ((rep_t)x_UQ0_hw * b_UQ1_hw >> HW);
+ // Would be at most [1.]00000 after overflow if rounding (0 - x_UQ0_hw * b_UQ1_hw) down.
----------------
It would be better to put short comment to explain using 0 instead of 2.
================
Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:184
+ // x_UQ0 * b_UQ1 = (x_UQ0_hw * 2^HW) * (b_UQ1_hw * 2^HW + blo) - b_UQ1
+ rep_t corr_UQ1 = 0 - ( (rep_t)x_UQ0_hw * b_UQ1_hw
+ + ((rep_t)x_UQ0_hw * blo >> HW)
----------------
`x_UQ0_hw` and `b_UQ1_hw` are declared inside the conditional block `#if NUMBER_OF_HALF_ITERATIONS > 0`. Does `NUMBER_OF_FULL_ITERATIONS != 1` always imply `NUMBER_OF_HALF_ITERATIONS > 0 `?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D85031/new/
https://reviews.llvm.org/D85031
More information about the cfe-commits
mailing list