[PATCH] D85031: [builtins] Unify the softfloat division implementation

Thu Aug 20 09:17:55 PDT 2020

sepavloff added inline comments.

================
Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:99
+  //   0 < x0(b) < 1
+  //   abs(x0(b) - 1/b) <= 3/4 - 1/sqrt(2)
+
----------------
This estimation is absent from the original comment. Do you have reference where it came from? Also the original comment states `This is accurate to about 3.5 binary digits`. Is still true? If yes, it could be worth copying here.

================
Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:101-102
+
+  // Then, refine the reciprocal estimate using a Newton-Raphson iteration:
+  //     x_{n+1} = x_n * (2 - x_n * b)
+  // Let e_n := x_n - 1/b_hw
----------------
The original comment states:
```
  // This doubles the number of correct binary digits in the approximation
  // with each iteration.
```
It is true in this implementation? If yes, it could be worth copying here.

================
Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:109
+
+#if NUMBER_OF_HALF_ITERATIONS > 0
+  // Starting with (n-1) half-width iterations
----------------
It is good optimization. Could you please put a comment shortly describing the idea of using half-sized temporaries?

================
Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:114
+  // C is (3/4 + 1/sqrt(2)) - 1 truncated to W0 fractional bits as UQ0.HW
+#if defined(SINGLE_PRECISION)
+  const half_rep_t C_hw = HALF_REP_C(0x7504) << (HW - 16);
----------------
In what cases 16-bit temporaries are used?  `NUMBER_OF_HALF_ITERATIONS` is set to zero in `divsf3.c`.

================
Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:130
+    // Cannot overflow by construction and is larger than 2 - b*x by at most 1*Ulp.
+    half_rep_t corr_UQ1_hw = 0 /* = 2 */ - ((rep_t)x_UQ0_hw * b_UQ1_hw >> HW);
+    // Would be at most [1.]00000 after overflow if rounding (0 - x_UQ0_hw * b_UQ1_hw) down.
----------------
It would be better to put short comment to explain using 0 instead of 2.

================
Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:184
+  // x_UQ0 * b_UQ1 = (x_UQ0_hw * 2^HW) * (b_UQ1_hw * 2^HW + blo) - b_UQ1
+  rep_t corr_UQ1 = 0 - (   (rep_t)x_UQ0_hw * b_UQ1_hw
+                        + ((rep_t)x_UQ0_hw * blo >> HW)
----------------
`x_UQ0_hw` and `b_UQ1_hw` are declared inside the conditional block `#if NUMBER_OF_HALF_ITERATIONS > 0`. Does `NUMBER_OF_FULL_ITERATIONS != 1` always imply `NUMBER_OF_HALF_ITERATIONS > 0 `?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85031/new/

https://reviews.llvm.org/D85031