[PATCH] D85031: [builtins] Unify the softfloat division implementation

Mon Aug 31 07:58:46 PDT 2020

sepavloff added inline comments.

================
Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:109
+
+#if NUMBER_OF_HALF_ITERATIONS > 0
+  // Starting with (n-1) half-width iterations
----------------
scanon wrote:
> sepavloff wrote:
> > atrosinenko wrote:
> > > sepavloff wrote:
> > > > It is a good optimization. Could you please add a short comment describing the idea of using half-sized temporaries?
> > > The idea is just "I guess this takes less CPU time and I have managed to prove error bounds for it". :) Specifically, for float128 the rep_t * rep_t multiplication is emulated with lots of CPU instructions, while the lower half contains only some noise at that point. This particular optimization already existed in the original implementation for float64 and float128. For float32 it did not make much sense, I guess. Still, error estimates were also calculated for float32 with half-size iterations, as that may be useful for MSP430 and other 16-bit targets.
> > The idea is clear, but it requires some study of the sources. I would propose adding a comment saying:
> > ```
During the first iterations the number of significant digits is small, so we may use values of a shorter type. Operations on them are usually faster.
> > ```
> > or something like that.
> This is absolutely standard in HW construction of pipelined iterative dividers and square root units, so I'm not sure how much explanation is really needed =)

I think the code now has enough explanation to be easily understood by mere mortals too :)
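For readers who want the half-width idea in concrete form, here is a minimal C sketch under stated assumptions. It is not the code from fp_div_impl.inc. The helper `recip_q31` is hypothetical: it approximates 1/b for a divisor significand b in [1, 2) given in Q1.31 fixed point, and it starts from the linear estimate 1.5 - b/2 (an assumption; compiler-rt uses its own constants and proven error bounds). Each Newton-Raphson step x' = x * (2 - b*x) squares the error 1 - b*x. The first two steps use 16-bit operands and 32-bit products, because the estimate has only a few correct bits and the low half of the divisor would add nothing but noise. The last two steps switch to 32-bit operands and 64-bit products against the full divisor. The 2 + 2 iteration split is also an assumption.

```c
#include <stdint.h>

/* Hypothetical helper (not part of compiler-rt): approximates 1/b for a
 * divisor significand b in [1, 2).
 *   input:  b_q31 = b * 2^31 (Q1.31, 0x80000000 <= b_q31 <= 0xFFFFFFFF)
 *   output: x * 2^31 (Q1.31), x ~= 1/b, 0.5 < x <= 1
 * Each Newton-Raphson step x' = x * (2 - b*x) squares the error 1 - b*x. */
uint32_t recip_q31(uint32_t b_q31) {
  /* Half-width stage: the estimate has few correct bits, so only the top
   * 16 bits of b are used, with 16-bit operands and 32-bit products. */
  uint16_t b_q15 = (uint16_t)(b_q31 >> 16);              /* b in Q1.15 */
  /* Assumed initial estimate x0 = 1.5 - b/2 (|1 - b*x0| is about 1/8 at most). */
  uint16_t x_q15 = (uint16_t)(0xC000u - (b_q15 >> 1));   /* x in Q1.15 */
  for (int i = 0; i < 2; ++i) {
    uint32_t bx = (uint32_t)b_q15 * x_q15;               /* b*x, Q2.30 */
    uint32_t corr = (0x80000000u - bx) >> 15;            /* 2 - b*x, Q1.15 */
    x_q15 = (uint16_t)(((uint32_t)x_q15 * corr) >> 15);  /* x', Q1.15 */
  }

  /* Full-width stage: widen the estimate and finish with 32-bit operands
   * and 64-bit products against the complete divisor. */
  uint32_t x_q31 = (uint32_t)x_q15 << 16;                /* x in Q1.31 */
  for (int i = 0; i < 2; ++i) {
    uint64_t bx = (uint64_t)b_q31 * x_q31;               /* b*x, Q2.62 */
    uint32_t corr =
        (uint32_t)((0x8000000000000000ull - bx) >> 31);  /* 2 - b*x, Q1.31 */
    x_q31 = (uint32_t)(((uint64_t)x_q31 * corr) >> 31);  /* x', Q1.31 */
  }
  return x_q31;
}
```

For example, `recip_q31(0x80000000u)` (b = 1) returns `0x80000000u` (x = 1), and `recip_q31(0xC0000000u)` (b = 1.5) returns `0x55555555u`, which is 2/3 in Q1.31 truncated to 31 fractional bits. Replacing the 16/32-bit pair with 32/64 or 64/128 bits gives the float64 and float128 analogues, where the saving is larger because a full rep_t * rep_t product has to be emulated.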

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85031/new/

https://reviews.llvm.org/D85031