[PATCH] D21127: Remove redundant FMUL in Newton-Raphson SQRT code
Nikolai Bozhenov via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 9 00:16:13 PDT 2016
n.bozhenov added a comment.
Thanks for the great questions, Sanjay!
I slightly modified the example from PR21385 and compared these two
sequences to calculate square roots (the current one and the patched one):
    // current
    est1 = (-0.5f * est0) * (-3.0f + est0 * est0 * f) * f;

    // patched
    float ae = est0 * f;
    est2 = (-0.5f * ae) * (-3.0f + est0 * ae);
And I obtained the following results:
                                   est1          est2
Total tests:                 2130706432    2130706432
Inexact results:              926539007     834159368
Estimate missed by 1 ULP:     862814916     796017331
Estimate missed by 2 ULP:      62179595      37665787
Estimate missed by 3 ULP:       1537746        476250
Estimate missed by 4 ULP:          6750             0
Estimate missed by >4 ULP:            0             0
As you can see, with the patch the square roots are noticeably more
accurate: there are about 10% fewer inexact results, and the worst-case
error drops from 4 ULP to 3 ULP. I don't have a good explanation for
this, though.
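
For anyone who wants to reproduce this kind of comparison, here is a minimal
sketch of the pieces involved. It is not the harness used for the numbers
above. The names sqrt_current, sqrt_patched, ulp_distance, ulp_histogram and
record are hypothetical, and the sketch assumes that the caller supplies the
initial reciprocal square root estimate est0 and a reference square root to
compare against, since the message does not say how either was obtained.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Current sequence: refine the estimate, then multiply by f at the end. */
static float sqrt_current(float f, float est0) {
    return (-0.5f * est0) * (-3.0f + est0 * est0 * f) * f;
}

/* Patched sequence: est0 * f is computed once and reused. */
static float sqrt_patched(float f, float est0) {
    float ae = est0 * f;
    return (-0.5f * ae) * (-3.0f + est0 * ae);
}

/* Distance in ULPs between two positive, finite floats.
   Assumption: IEEE-754 binary32, where the bit patterns of positive
   floats are ordered the same way as their values. */
static uint32_t ulp_distance(float a, float b) {
    uint32_t ia, ib;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);
    return ia > ib ? ia - ib : ib - ia;
}

/* Buckets mirroring the table: [0] exact, [1]..[4] off by 1..4 ULP,
   [5] off by more than 4 ULP. */
typedef struct {
    uint64_t bucket[6];
} ulp_histogram;

/* Classify one estimate against the reference value. */
static void record(ulp_histogram *h, float estimate, float reference) {
    uint32_t d = ulp_distance(estimate, reference);
    h->bucket[d > 4 ? 5 : d]++;
}
```

A driver would loop over every positive finite float, compute est0 for it,
and call record once per sequence with a separate histogram for each. As a
quick sanity check, f = 4.0f with est0 = 0.5f gives exactly 2.0f from both
sequences. Note that a compiler which contracts multiply-add pairs into FMA
instructions may change the rounding of either sequence, so results can
depend on the build flags.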
I performed testing on a number of Intel microarchitectures (including
Atom) and got exactly the same results for all of them.
As for improved hardware SQRT efficiency in modern Intel CPUs, I'm
working on a patch that addresses this particular issue, and I will
share it very soon.
http://reviews.llvm.org/D21127