[PATCH] D66050: Improve division estimation of floating points.

Tue Sep 3 00:10:08 PDT 2019

qiucf added a comment.

In D66050#1654762 <https://reviews.llvm.org/D66050#1654762>, @spatel wrote:

> In D66050#1654733 <https://reviews.llvm.org/D66050#1654733>, @lebedev.ri wrote:
>
> > I think these two points weren't addressed.
> >  I'd like to see at least some publicly-stated numbers on accuracy,
> >  just so we //all// know this is going in the right direction for all inputs.
>
>
> Changing my 'accepted' until this is answered.
>
> The test at:
>  https://github.com/ecnelises/fp-division-test/
>  ...seems to do a small random sampling.
>
> The original transform was tested on x86 using brute force for all possible floats (1.0f/x) and is attached here:
>  https://bugs.llvm.org/show_bug.cgi?id=21385
>
> I'm not sure how to prove this, but by distributing the multiplication into the last step of the estimate, I think we are always trading better accuracy around the numerator value with potentially overflowing to infinity for extremely different numerator/denominator. That's a good trade-off IMO and within the loosely-defined behavior enabled by 'arcp' in LLVM and '-mrecip' with Clang.

Thanks for the test case in PR21385 <https://bugs.llvm.org/show_bug.cgi?id=21385>. I'll write tests covering a wider range of numbers. From my point of view, we need two kinds of tests:

- A compiler-independent program showing that _distributing the multiplication into the last step of estimation_ really is more accurate. It should be just like the case you showed.
- A program with functions optimized at different levels (e.g. `-Ofast` and `-O3`), comparing their results with real divisions. This can be derived from my previous `fp-division-test`. I think this is suitable for test suites.
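
A minimal sketch of what the first, compiler-independent program could compare, written in Python only for brevity. It is one possible reading of "distributing the multiplication into the last step", not the code generated by this patch. `f32`, `crude_recip`, `div_multiply_last` and `div_distributed` are hypothetical names, the hardware estimate is replaced by a truncated true reciprocal, and no FMA is modelled.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def crude_recip(b, keep_bits=12):
    """Hypothetical stand-in for a hardware reciprocal estimate:
    the single-precision reciprocal with its low mantissa bits cleared."""
    bits = struct.unpack("<I", struct.pack("<f", 1.0 / b))[0]
    bits &= ~((1 << (23 - keep_bits)) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def div_multiply_last(a, b):
    """Refine the reciprocal first, then multiply by the numerator."""
    est = crude_recip(b)
    err = f32(1.0 - f32(b * est))      # residual of the reciprocal
    est = f32(est + f32(est * err))    # one Newton-Raphson step
    return f32(a * est)

def div_distributed(a, b):
    """Fold the numerator into the last refinement step."""
    est = crude_recip(b)
    q = f32(a * est)                   # first quotient guess
    r = f32(a - f32(b * q))            # residual of the quotient
    return f32(q + f32(est * r))       # correct the quotient directly
```

Both functions take normal, positive single-precision inputs. Each intermediate result is rounded with `f32`, so the arithmetic behaves like separate single-precision multiplies and adds.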

The test results should include:

- Accuracy (< 2ulp?) rate compared with real divisions.
- Accuracy rate compared with current implementation.
- Accuracy rate compared with other implementations, such as GCC.
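
The accuracy rates above could be computed along these lines. This is only a sketch: `ordered_bits`, `ulp_distance` and `accuracy_rate` are hypothetical helpers, the `< 2` threshold is the open question from the list, and NaN and infinity are not handled.

```python
import struct

def ordered_bits(x):
    """Map a single-precision value to an integer whose ordering
    matches the ordering of the floats (-0.0 and +0.0 both map to 0)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return -(bits & 0x7FFFFFFF) if bits & 0x80000000 else bits

def ulp_distance(x, y):
    """Number of representable singles between x and y."""
    return abs(ordered_bits(x) - ordered_bits(y))

def accuracy_rate(pairs, max_ulp=2):
    """Fraction of (estimated, reference) pairs closer than max_ulp."""
    pairs = list(pairs)
    within = sum(1 for est, ref in pairs if ulp_distance(est, ref) < max_ulp)
    return within / len(pairs)
```

The same `accuracy_rate` can be fed with results from the real division, the current implementation, or another compiler as the reference column.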

One problem here: iterating from `0x00800000` to `0x7E800000` is acceptable for testing reciprocals, but not for testing divisions, where the number of cases grows as `n^2`. I'm not sure whether increasing the iteration step from 1 to 10, 100 or larger to reduce the running time is okay.
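
To make the size of the problem concrete, here is a small sketch of strided iteration over that bit-pattern range. The names `strided_floats` and `pair_count` are hypothetical, and treating the range as half-open is an assumption.

```python
import struct

LO, HI = 0x00800000, 0x7E800000  # bit-pattern range from the comment above

def float_from_bits(bits):
    """Reinterpret a 32-bit integer as an IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def strided_floats(step):
    """Yield every step-th single in [LO, HI), ordered by bit pattern."""
    for bits in range(LO, HI, step):
        yield float_from_bits(bits)

def pair_count(step):
    """Number of (numerator, denominator) pairs for a given step."""
    n = len(range(LO, HI, step))
    return n * n
```

With a step of 1 this is roughly 4.5e18 pairs, and with a step of 100 still roughly 4.5e14. A fixed stride always visits the same low mantissa bits, which is part of why I'm unsure it is a fair sample.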

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66050/new/

https://reviews.llvm.org/D66050