[PATCH] Improve sqrt estimate algorithm

Sanjay Patel spatel at rotateright.com
Wed Oct 8 13:53:23 PDT 2014


Hi hfinkel, wschmidt, willschm,

This patch changes the fast-math implementation for calculating sqrt(x) from:
y = 1 / (1 / sqrt(x))
to:
y = x * (1 / sqrt(x))

This has two benefits: the code is smaller and faster, and it uses one fewer estimate instruction, so there is one less place to lose precision.

The only target that will be affected (until http://reviews.llvm.org/D5658 is approved) is PPC. The difference in codegen for PPC is 2 fewer flops for a single-precision scalar or vector sqrtf and 4 fewer flops for a double-precision sqrt. We also eliminate a constant load and the extra register it uses.

Here's the existing PPC codegen for a single-precision scalar sqrtf() using a reciprocal square root estimate and a reciprocal estimate:
   .L.goo3:
   # BB#0:
      addis 3, 2, .LCPI10_2@toc@ha
      lfs 0, .LCPI10_2@toc@l(3)
      fcmpu 0, 1, 0
      beq 0, .LBB10_2
   # BB#1:
      frsqrtes 0, 1
      addis 3, 2, .LCPI10_0@toc@ha
      lfs 2, .LCPI10_0@toc@l(3)
      addis 3, 2, .LCPI10_1@toc@ha
      lfs 13, .LCPI10_1@toc@l(3)
      fnmsubs 12, 1, 2, 1
      fmuls 3, 0, 0
      fmadds 1, 12, 3, 2
      fmuls 0, 0, 1
      fres 1, 0             <--- reciprocal estimate
      fnmsubs 0, 0, 1, 13   <--- refinement
      fmadds 0, 1, 0, 1     <--- refinement
   .LBB10_2:
      fmr 1, 0
      blr
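
For readers less familiar with PPC assembly, the three annotated instructions above (fres, fnmsubs, fmadds) can be mirrored in C as one Newton-Raphson step on a reciprocal estimate. This is only a sketch: recip_estimate() is a hypothetical stand-in for fres with an artificial 0.1% error, sqrt_via_reciprocal() is an illustrative name, and the constant loaded from .LCPI10_1 is assumed to be 1.0, which is what the refinement formula requires.

```c
#include <assert.h>

/* Hypothetical stand-in for the fres reciprocal estimate.
   The 1.001 factor injects an artificial error to mimic limited precision. */
static float recip_estimate(float a) {
    return (1.0f / a) * 1.001f;
}

/* Given e ~ 1/sqrt(x), compute sqrt(x) as 1/e using an estimate
   plus one Newton-Raphson step: r' = r * (2 - e * r). */
float sqrt_via_reciprocal(float e) {
    assert(e > 0.0f);
    float r = recip_estimate(e);  /* fres:    r ~ 1/e                    */
    float t = 1.0f - e * r;       /* fnmsubs: -(e*r - 1.0), 1.0 assumed  */
    return r * t + r;             /* fmadds:  r*t + r = r * (2 - e*r)    */
}
```

Calling sqrt_via_reciprocal(0.5f), where 0.5 is 1/sqrt(4), returns a value within about 1e-5 of 2.0. These are the three operations, plus the extra constant load, that a single multiply by x makes unnecessary.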

After the patch, we calculate the rsqrt and multiply by the original operand:
   .L.goo3:
   # BB#0:
      addis 3, 2, .LCPI10_1@toc@ha
      lfs 0, .LCPI10_1@toc@l(3)
      fcmpu 0, 1, 0
      beq 0, .LBB10_2
   # BB#1:
      frsqrtes 0, 1
      addis 3, 2, .LCPI10_0@toc@ha
      lfs 2, .LCPI10_0@toc@l(3)   <--- only need 1 constant for NR sqrt refinement
      fnmsubs 3, 1, 2, 1
      fmuls 4, 0, 0
      fmadds 2, 3, 4, 2
      fmuls 0, 0, 2
      fmuls 0, 1, 0               <--- reciprocal calc replaced by multiply
   .LBB10_2:
      fmr 1, 0
      blr

http://reviews.llvm.org/D5682

Files:
  lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  test/CodeGen/PowerPC/recipest.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D5682.14600.patch
Type: text/x-patch
Size: 2753 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141008/56814e3a/attachment.bin>

