[PATCH] D66050: Improve division estimation of floating points.
Qiu Chaofan via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 3 00:10:08 PDT 2019
qiucf added a comment.
In D66050#1654762 <https://reviews.llvm.org/D66050#1654762>, @spatel wrote:
> In D66050#1654733 <https://reviews.llvm.org/D66050#1654733>, @lebedev.ri wrote:
>
> > I think these two points weren't addressed.
> > I'd like to see at least some publicly-stated numbers on accuracy,
> > just so we //all// know this is going in the right direction for all inputs.
>
>
> Changing my 'accepted' until this is answered.
>
> The test at:
> https://github.com/ecnelises/fp-division-test/
> ...seems to do a small random sampling.
>
> The original transform was tested on x86 using brute force for all possible floats (1.0f/x) and is attached here:
> https://bugs.llvm.org/show_bug.cgi?id=21385
>
> I'm not sure how to prove this, but by distributing the multiplication into the last step of the estimate, I think we are always trading better accuracy around the numerator value with potentially overflowing to infinity for extremely different numerator/denominator. That's a good trade-off IMO and within the loosely-defined behavior enabled by 'arcp' in LLVM and '-mrecip' with Clang.
Thanks for the test case in PR21385 <https://bugs.llvm.org/show_bug.cgi?id=21385>. I'll write tests on a wider range of numbers. From my point of view, we need two kinds of tests:
- A compiler-independent program showing that _distributing the multiplication into the last step of the estimate_ really is more accurate. It should be just like the case you showed.
- A program with functions optimized at different levels (e.g. `-Ofast` and `-O3`) that compares their results with real divisions. This can be derived from my earlier `fp-division-test`. I think this is suitable for test suites.
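As a rough illustration of the first kind of test, the sketch below emulates single precision in Python and compares the two orderings of the refinement. It is not the code from the patch: `crude_recip` is a hypothetical stand-in for a hardware reciprocal estimate (the correctly rounded `1/d` with its low mantissa bits cleared), all helper names are invented for this example, and no fused multiply-add is modeled.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def crude_recip(d, dropped_bits=12):
    """Hypothetical reciprocal estimate: 1/d rounded to single,
    then truncated by clearing its low mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", 1.0 / d))[0]
    bits &= ~((1 << dropped_bits) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def refine_then_multiply(n, d):
    """One Newton-Raphson step on the reciprocal, then multiply by n."""
    r = crude_recip(d)
    r = f32(r * f32(2.0 - f32(d * r)))  # r' = r * (2 - d*r)
    return f32(n * r)


def multiply_in_last_step(n, d):
    """Fold n into the final step: q = q0 + r * (n - d*q0), with q0 = n*r."""
    r = crude_recip(d)
    q0 = f32(n * r)
    residual = f32(n - f32(d * q0))
    return f32(q0 + f32(r * residual))


if __name__ == "__main__":
    n, d = f32(1.0), f32(3.0)
    print(f32(n / d), refine_then_multiply(n, d), multiply_in_last_step(n, d))
```

Running both functions over many inputs and comparing each result against `f32(n / d)` would show which ordering lands closer. Inputs should already be representable as singles, for example by passing them through `f32` first.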
The test results should include:
- Accuracy (< 2ulp?) rate compared with real divisions.
- Accuracy rate compared with current implementation.
- Accuracy rate compared with other implementations, such as GCC.
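One possible way to compute such an accuracy rate is sketched below. The function names are hypothetical, and the `< 2` ULP threshold is only the tentative figure from the list above, so it is left as a parameter. The ULP distance is taken between single-precision bit patterns mapped onto a monotonic integer line, and only finite values are considered.

```python
import struct


def f32_bits(x):
    """Bit pattern of x after rounding it to IEEE-754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def ulp_distance(a, b):
    """Number of single-precision steps between a and b (finite inputs)."""
    def ordered(x):
        bits = f32_bits(x)
        # Map sign-magnitude encoding onto a monotonic integer line,
        # so that +0.0 and -0.0 both land on 0.
        return -(bits & 0x7FFFFFFF) if bits & 0x80000000 else bits
    return abs(ordered(a) - ordered(b))


def accuracy_rate(pairs, approx_div, ulp_limit=2):
    """Fraction of (n, d) pairs whose approximate quotient is
    fewer than ulp_limit ULPs away from the real division n / d."""
    hits = 0
    for n, d in pairs:
        reference = n / d  # double result; rounded to single inside f32_bits
        if ulp_distance(approx_div(n, d), reference) < ulp_limit:
            hits += 1
    return hits / len(pairs)
```

The same `accuracy_rate` call can be repeated with the current implementation or with results captured from another compiler as `approx_div`, which gives the three comparisons listed above from one harness.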
One problem: iterating over every bit pattern from `0x00800000` to `0x7E800000` is acceptable for testing reciprocals, but not for testing divisions, where the number of numerator/denominator pairs grows as `n^2`. I'm not sure whether increasing the iteration step from 1 to 10, 100 or more to reduce the running time is okay.
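To make the cost of a larger step concrete, here is a small sketch of strided iteration over the bit patterns. The helper names are hypothetical and the range is treated as half-open, which is an assumption. One thing to keep in mind: because `0x00800000` is even, an even step such as 10 or 100 never changes the lowest mantissa bit, so an odd step may cover the low bits better.

```python
import struct

# Bit-pattern range quoted above; treated here as [LO, HI).
LO, HI = 0x00800000, 0x7E800000


def strided_floats(step, lo=LO, hi=HI):
    """Yield the single-precision value of every step-th bit pattern."""
    for bits in range(lo, hi, step):
        yield struct.unpack("<f", struct.pack("<I", bits))[0]


def sample_count(step, lo=LO, hi=HI):
    """Number of values strided_floats(step) will produce."""
    return len(range(lo, hi, step))


if __name__ == "__main__":
    for step in (1, 10, 100, 1009):
        n = sample_count(step)
        print(f"step={step}: {n} values, {n * n} numerator/denominator pairs")
```

With step 1 the pair count is roughly 4.5e18; using the same step `s` for both numerator and denominator divides that by about `s^2`.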
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D66050/new/
https://reviews.llvm.org/D66050