[PATCH] D21127: Remove redundant FMUL in Newton-Raphson SQRT code

Thu Jun 9 07:41:21 PDT 2016

spatel added a comment.

Thanks, Nikolai!
I reproduced your results for both FMA and non-FMA on my local Haswell machine.
Here is the output from AMD Jaguar (no FMA):

                                           est1        est2
  Inexact results                     828912065   694175396
   Estimate missed by 1 ULP:          788646331   681356143
   Estimate missed by 2 ULP:           39579384    12754737
   Estimate missed by 3 ULP:             680779       64516
   Estimate missed by 4 ULP:               5571           0
   Estimate missed by >= 5 ULP with one N-R step = 0

So again, eliminating the extra multiply improves accuracy overall. I attached my hacked tester program for this experiment to PR21385 in case anyone else wants to try it or adapt it to non-x86 targets. I'm assuming the accuracy improvement holds on any architecture, because the refactoring eliminates an intermediate rounding step.

http://reviews.llvm.org/D21127