[PATCH] D21127: Remove redundant FMUL in Newton-Raphson SQRT code
Sanjay Patel via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 9 07:41:21 PDT 2016
spatel added a comment.
Thanks, Nikolai!
I reproduced your results for both FMA and non-FMA on my local Haswell machine.
Here is the output from AMD Jaguar (no FMA):
                                est1       est2
Inexact results            828912065  694175396
Estimate missed by 1 ULP:  788646331  681356143
Estimate missed by 2 ULP:   39579384   12754737
Estimate missed by 3 ULP:     680779      64516
Estimate missed by 4 ULP:       5571          0
Estimate missed by >= 5 ULP with one N-R step = 0
So again, eliminating the extra multiply benefits accuracy in general. I attached my hacked tester program for this experiment to PR21385 in case anyone else wants to try it or adapt it to non-x86. I'm assuming the accuracy improvement holds for any architecture, because the refactoring eliminates some intermediate error.
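For anyone who wants to poke at the idea without the tester from PR21385, here is a rough, portable sketch of the comparison. It is not the attached program and not the code from the patch. Everything in it is an assumption made for illustration: the names `sqrt_est1`, `sqrt_est2`, `rsqrt_estimate`, and `ulp_histogram` are hypothetical, the hardware estimate instruction is replaced by an accurate reciprocal square root truncated to about 12 mantissa bits, float32 arithmetic is emulated by rounding after every operation (no FMA), and the N-R step is written in one common two-constant form that may differ from what the patch generates. Counts from this sketch will not match the numbers above, since those came from real hardware estimates over a much larger input set.

```python
import math
import struct
from collections import Counter

def f32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def bits(x):
    """Bit pattern of a positive float32; adjacent floats differ by 1."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def from_bits(b):
    """Inverse of bits(): build a float32 value from its bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def rsqrt_estimate(a, keep_bits=12):
    """Stand-in for a hardware estimate (assumption): 1/sqrt(a) truncated
    to roughly keep_bits bits of mantissa."""
    e = f32(1.0 / math.sqrt(a))
    mask = ~((1 << (23 - keep_bits)) - 1) & 0xFFFFFFFF
    return from_bits(bits(e) & mask)

def sqrt_est1(a):
    """Refine the rsqrt estimate with one N-R step, then multiply by a.
    The trailing multiply is the extra FMUL."""
    e = rsqrt_estimate(a)
    ae = f32(a * e)
    aee = f32(ae * e)
    e1 = f32(f32(e * -0.5) * f32(aee + -3.0))
    return f32(a * e1)

def sqrt_est2(a):
    """Fold a into the N-R step by reusing (a * e); no trailing multiply."""
    e = rsqrt_estimate(a)
    ae = f32(a * e)
    aee = f32(ae * e)
    return f32(f32(ae * -0.5) * f32(aee + -3.0))

def ulp_error(approx, a):
    """Distance in float32 ULPs between approx and the correctly rounded sqrt(a)."""
    exact = f32(math.sqrt(a))
    return abs(bits(approx) - bits(exact))

def sample_inputs(stride=4099):
    """Every stride-th float32 in [1.0, 4.0), which covers both exponent parities."""
    return [from_bits(b) for b in range(bits(1.0), bits(4.0), stride)]

def ulp_histogram(sqrt_fn, inputs):
    """Map ULP error -> number of inputs with that error."""
    return Counter(ulp_error(sqrt_fn(a), a) for a in inputs)

if __name__ == "__main__":
    inputs = sample_inputs()
    for name, fn in (("est1", sqrt_est1), ("est2", sqrt_est2)):
        hist = ulp_histogram(fn, inputs)
        print(name, "inexact:", len(inputs) - hist[0], dict(sorted(hist.items())))
```

The only difference between the two functions is where `a` enters: `sqrt_est1` rounds the refined reciprocal estimate and then rounds again after multiplying by `a`, while `sqrt_est2` reuses the already computed `a * e` and skips that last rounding step.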
http://reviews.llvm.org/D21127