[PATCH] D78606: [DAGCombine] Adding a new Newton-Raphson implementation to leverage the FMA
Qing Shan Zhang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 21 19:29:22 PDT 2020
steven.zhang created this revision.
steven.zhang added reviewers: spatel, RKSimon, hfinkel, renenkel, evandro, jsji, nemanjai, PowerPC.
Herald added subscribers: kerbowa, wuzish, kbarton, hiraditya, nhaehnle, jvesely, arsenm, jholewinski.
Herald added a project: LLVM.
steven.zhang marked an inline comment as done.
steven.zhang added inline comments.
================
Comment at: llvm/include/llvm/CodeGen/TargetLowering.h:4118
+ /// This enum inndicates the different methods we use to do the Newton
+ /// iterations for sqrt/rsqrt
----------------
I see a typo here and will fix it later.
This patch adds a new Newton-Raphson implementation that leverages FMA, which saves one instruction for 2 iterations. It also improves precision thanks to the use of FMA/FNMSUB.
These are the measurements on PowerPC (courtesy of @renenkel):
The new algorithm is accurate to about 0.7 ulps for arguments >= 2^(-1022). The old algorithm is worse, at about 1.7 ulps for arguments >= 2^(-1022).
The new algorithm gives a 1.13x speedup on Power9.
FYI, the new implementation:
sqrt(n) -> n * rsqrt(n)
Newton iteration formula: X{i+1} = X{i} - F(X{i})/F'(X{i})
F(x) = 1/x^2 - n # Find the x that makes this function zero
-->
X{i+1} = X{i} * (1.5 - 0.5*n*X{i}^2)
-->
X{i+1} = X{i} + X{i} * (0.5 - 0.5*X{i}*n*X{i})
sqrt(n) = n*X{i+1} = n*X{i} + n*X{i} * (0.5 - 0.5*X{i}*n*X{i})
-->
sqrt(n) = n*X{i} + 0.5*X{i}*(n - (n*X{i})^2)
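The step from F(x) to the update rule is left implicit above. Spelled out as a worked derivation, using only the symbols already defined (F, n, X{i}), it could read:

```latex
\begin{align*}
F(x) &= \frac{1}{x^2} - n, \qquad F'(x) = -\frac{2}{x^3} \\
X_{i+1} &= X_i - \frac{F(X_i)}{F'(X_i)}
         = X_i + \frac{X_i^3}{2}\left(\frac{1}{X_i^2} - n\right)
         = X_i\left(1.5 - 0.5\,n\,X_i^2\right) \\
n\,X_{i+1} &= n X_i + n X_i\left(0.5 - 0.5\,n\,X_i^2\right)
            = n X_i + 0.5\,X_i\left(n - (n X_i)^2\right)
\end{align*}
```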
So we just need to iterate n*X{i} and 0.5*X{i} according to the formula X{i+1} = X{i} + X{i} * (0.5 - 0.5*X{i}*n*X{i}). First:
H{0} = 0.5*y0 # 0.5*X{0}, where y0 is the initial estimate
S{0} = n * y0 # n * X{0}
D{0} = 0.5 - H{0}*S{0} # 0.5 - 0.5*X{0}*n*X{0}
Then, we have:
H{i+1} = 0.5 * X{i+1} = 0.5*X{i} + 0.5*X{i} * (0.5 - 0.5*X{i}*n*X{i})
-->
H{i+1} = H{i} + H{i} * D{i}
S{i+1} = n * X{i+1} = n*X{i} + n * X{i} * (0.5 - 0.5*X{i}*n*X{i})
-->
S{i+1} = S{i} + S{i} * D{i}
So we can iterate on H{i} and S{i} to reach better precision. After that,
sqrt(n) = n*X{i} + 0.5*X{i}*(n - (n*X{i})^2)
-->
sqrt(n) = S{i} + H{i} * (n - S{i}^2)
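The H/S/D recurrences can be sketched in C++ with a variable iteration count. This is only an illustration, not the code in the patch: the function name and the `iterations` parameter are hypothetical, `y0` stands for the reciprocal square root estimate, and `std::fma` stands in for the target's fused instructions. It assumes D{i} = 0.5 - H{i}*S{i} is recomputed on every iteration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the patch): refine sqrt(n) starting from an
// estimate y0 ~= 1/sqrt(n), using the H/S/D recurrences.
double sqrt_from_rsqrt_estimate(double n, double y0, int iterations) {
  assert(n > 0.0 && y0 > 0.0 && iterations >= 0);
  double H = 0.5 * y0; // H{0} = 0.5 * X{0}
  double S = n * y0;   // S{0} = n * X{0}
  for (int i = 0; i < iterations; ++i) {
    double D = std::fma(-S, H, 0.5); // D{i}   = 0.5 - H{i} * S{i}
    H = std::fma(H, D, H);           // H{i+1} = H{i} + H{i} * D{i}
    S = std::fma(S, D, S);           // S{i+1} = S{i} + S{i} * D{i}
  }
  double E = std::fma(-S, S, n);     // n - S{i}^2
  return std::fma(E, H, S);          // S{i} + H{i} * (n - S{i}^2)
}
```

Each pass through the loop costs three fused operations, and the final correction adds two more regardless of the iteration count.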
Thus, we need these 7 instructions and a single constant, 0.5:
H = 0.5*y0 # FMUL
S = n * y0 # FMUL
D = 0.5 - S * H # FNMSUB
H = H * D + H # FMA
S = S * D + S # FMA
E = n - S * S # FNMSUB
res = E * H + S # FMA
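For readers who want to try the sequence outside of LLVM, the seven operations map one-to-one onto C++ with `std::fma`. This is a minimal sketch, not the DAGCombiner code: the function name is hypothetical, and `y0` is assumed to be supplied by the caller in place of the hardware estimate instruction.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical name; y0 is the caller-supplied estimate of 1/sqrt(n).
double sqrt_fma_sequence(double n, double y0) {
  assert(n > 0.0 && y0 > 0.0);
  double H = 0.5 * y0;             // FMUL
  double S = n * y0;               // FMUL
  double D = std::fma(-S, H, 0.5); // FNMSUB: 0.5 - S * H
  H = std::fma(H, D, H);           // FMA:    H * D + H
  S = std::fma(S, D, S);           // FMA:    S * D + S
  double E = std::fma(-S, S, n);   // FNMSUB: n - S * S
  return std::fma(E, H, S);        // FMA:    E * H + S
}
```

With an exact estimate, for example `sqrt_fma_sequence(4.0, 0.5)`, both D and E are zero and the sequence returns S unchanged.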
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D78606
Files:
llvm/include/llvm/CodeGen/TargetLowering.h
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
llvm/lib/Target/AArch64/AArch64ISelLowering.h
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
llvm/lib/Target/NVPTX/NVPTXISelLowering.h
llvm/lib/Target/PowerPC/PPC.td
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
llvm/lib/Target/PowerPC/PPCISelLowering.h
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/lib/Target/X86/X86ISelLowering.h
llvm/test/CodeGen/PowerPC/fma-mutate.ll
llvm/test/CodeGen/PowerPC/fmf-propagation.ll
llvm/test/CodeGen/PowerPC/qpx-recipest.ll
llvm/test/CodeGen/PowerPC/recipest.ll
llvm/test/CodeGen/PowerPC/vsx-fma-mutate-trivial-copy.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D78606.259147.patch
Type: text/x-patch
Size: 38521 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200422/4c1b4c47/attachment-0001.bin>