[PATCH] Improve sqrt estimate algorithm
Sanjay Patel
spatel at rotateright.com
Wed Oct 8 13:53:23 PDT 2014
Hi hfinkel, wschmidt, willschm,
This patch changes the fast-math implementation for calculating sqrt(x) from:
y = 1 / (1 / sqrt(x))
to:
y = x * (1 / sqrt(x))
This has two benefits: the code is smaller and faster, and there is one fewer estimate instruction that may lose precision.
The only target that will be affected (until http://reviews.llvm.org/D5658 is approved) is PPC. The difference in codegen for PPC is 2 fewer flops for a single-precision sqrtf or vector sqrtf and 4 fewer flops for a double-precision sqrt. We also eliminate a constant load and the extra register it uses.
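To make the comparison concrete, here is a rough C sketch of the two formulations for the single-precision case. It is not the DAGCombiner code: rsqrt_estimate() and recip_estimate() are hypothetical stand-ins for the hardware estimate instructions (the first uses a well-known bit-level approximation, the second a deliberately degraded divide), and their error levels are assumptions chosen only so that the refinement steps are visible.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware rsqrt estimate such as frsqrtes.
   Uses a well-known bit-level approximation; real hardware precision differs. */
static float rsqrt_estimate(float x) {
    uint32_t bits;
    float e;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&e, &bits, sizeof e);
    return e;
}

/* Hypothetical stand-in for a hardware reciprocal estimate such as fres.
   The 1% error is an arbitrary assumption. */
static float recip_estimate(float a) {
    return 0.99f / a;
}

/* One Newton-Raphson step for e ~= 1/sqrt(x): e' = e * (1.5 - 0.5 * x * e * e). */
static float refine_rsqrt(float x, float e) {
    return e * (1.5f - 0.5f * x * e * e);
}

/* One Newton-Raphson step for r ~= 1/a: r' = r + r * (1 - a * r). */
static float refine_recip(float a, float r) {
    return r + r * (1.0f - a * r);
}

/* Old form: y = 1 / (1 / sqrt(x)). Needs a second estimate plus its refinement. */
float sqrt_old(float x) {
    assert(x >= 0.0f);
    if (x == 0.0f)
        return 0.0f;
    float e = refine_rsqrt(x, rsqrt_estimate(x));
    return refine_recip(e, recip_estimate(e));
}

/* New form: y = x * (1 / sqrt(x)). One multiply after the rsqrt refinement. */
float sqrt_new(float x) {
    assert(x >= 0.0f);
    if (x == 0.0f)
        return 0.0f;
    float e = refine_rsqrt(x, rsqrt_estimate(x));
    return x * e;
}
```

In this sketch the old form ends with an estimate and two refinement operations, while the new form ends with a single multiply, which is where the 2-flop saving for the single-precision case comes from.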
Here's the existing PPC codegen for a single-precision scalar sqrtf() using a reciprocal square root estimate and a reciprocal estimate:
.L.goo3:
# BB#0:
addis 3, 2, .LCPI10_2@toc@ha
lfs 0, .LCPI10_2@toc@l(3)
fcmpu 0, 1, 0
beq 0, .LBB10_2
# BB#1:
frsqrtes 0, 1
addis 3, 2, .LCPI10_0@toc@ha
lfs 2, .LCPI10_0@toc@l(3)
addis 3, 2, .LCPI10_1@toc@ha
lfs 13, .LCPI10_1@toc@l(3)
fnmsubs 12, 1, 2, 1
fmuls 3, 0, 0
fmadds 1, 12, 3, 2
fmuls 0, 0, 1
fres 1, 0 <--- reciprocal estimate
fnmsubs 0, 0, 1, 13 <--- refinement
fmadds 0, 1, 0, 1 <--- refinement
.LBB10_2:
fmr 1, 0
blr
After the patch, we calculate the rsqrt and multiply by the original operand:
.L.goo3:
# BB#0:
addis 3, 2, .LCPI10_1@toc@ha
lfs 0, .LCPI10_1@toc@l(3)
fcmpu 0, 1, 0
beq 0, .LBB10_2
# BB#1:
frsqrtes 0, 1
addis 3, 2, .LCPI10_0@toc@ha
lfs 2, .LCPI10_0@toc@l(3) <--- only need 1 constant for NR sqrt refinement
fnmsubs 3, 1, 2, 1
fmuls 4, 0, 0
fmadds 2, 3, 4, 2
fmuls 0, 0, 2
fmuls 0, 1, 0 <--- reciprocal calc replaced by multiply
.LBB10_2:
fmr 1, 0
blr
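For readers who don't read PPC assembly, the post-patch sequence can be transliterated into C roughly as follows. This is a sketch under assumptions: the constant pool contents are not shown above, so the values 0.0 for .LCPI10_1 and 1.5 for .LCPI10_0 are inferred from the Newton-Raphson form; the function name sqrtf_post_patch and the est parameter (standing in for the frsqrtes result) are made up for illustration; and the fused multiply-add instructions are written as separate multiplies and adds, so rounding may differ slightly from the hardware.

```c
#include <assert.h>

/* est stands in for the result of "frsqrtes 0, 1" (hypothetical parameter). */
float sqrtf_post_patch(float x, float est) {
    const float zero = 0.0f;   /* assumed value of .LCPI10_1 */
    const float c    = 1.5f;   /* assumed value of .LCPI10_0 */

    if (x == zero)             /* fcmpu 0, 1, 0 ; beq 0, .LBB10_2 */
        return zero;           /* fmr 1, 0 ; blr */

    assert(x > 0.0f && est > 0.0f);

    float f3 = -(x * c - x);   /* fnmsubs 3, 1, 2, 1 : equals -0.5 * x */
    float f4 = est * est;      /* fmuls 4, 0, 0 */
    float f2 = f3 * f4 + c;    /* fmadds 2, 3, 4, 2 : 1.5 - 0.5 * x * est^2 */
    float f0 = est * f2;       /* fmuls 0, 0, 2 : refined 1/sqrt(x) */
    return x * f0;             /* fmuls 0, 1, 0 : x * (1/sqrt(x)) */
}
```

With an exact estimate the refinement step is a no-op: sqrtf_post_patch(4.0f, 0.5f) computes f2 = 1.0 and returns 2.0.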
http://reviews.llvm.org/D5682
Files:
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
test/CodeGen/PowerPC/recipest.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D5682.14600.patch
Type: text/x-patch
Size: 2753 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141008/56814e3a/attachment.bin>