[PATCH] D21379: [X86] Heuristic to selectively build Newton-Raphson SQRT estimation
Nikolai Bozhenov via llvm-commits
llvm-commits at lists.llvm.org
Wed Jun 15 07:38:39 PDT 2016
n.bozhenov created this revision.
n.bozhenov added reviewers: bogner, hfinkel, andreadb, spatel, nadav.
n.bozhenov added subscribers: zansari, DavidKreitzer, zinovy.nis, llvm-commits.
Herald added a reviewer: tstellarAMD.
Herald added a subscriber: arsenm.

On modern Intel processors, hardware SQRT is in many cases faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between the hardware SQRT instruction and Newton-Raphson software
estimation.

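For readers unfamiliar with the software path, here is a minimal sketch of one
Newton-Raphson refinement step applied to a reciprocal square root estimate.
It is plain C++ rather than the code the patch generates. The function names
are hypothetical, and roughRsqrt() is only a stand-in for the hardware RSQRT
estimate, with an artificial relative error injected for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware RSQRT estimate: the exact value of
// 1/sqrt(x) with an artificial relative error of about 1e-3 injected.
float roughRsqrt(float x) {
    assert(x > 0.0f);
    return (1.0f / std::sqrt(x)) * (1.0f + 1.0e-3f);
}

// One Newton-Raphson step for y ~ 1/sqrt(x):
//   y' = y * (1.5 - 0.5 * x * y * y)
// Each step roughly squares the relative error of the estimate.
float refineRsqrt(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}

// sqrt(x) recovered from the refined reciprocal estimate: sqrt(x) = x * (1/sqrt(x)).
float sqrtViaNewtonRaphson(float x) {
    float y = refineRsqrt(x, roughRsqrt(x));
    return x * y;
}
```

The refinement costs several dependent multiplies on top of the estimate, which
is why its latency can exceed that of a single hardware SQRT.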
The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput.

In effect, the patch disables scalar NR on big cores and disables NR completely
on Skylake. First, scalar SQRT has shorter latency than the NR code on big cores.
Second, vector SQRT has been greatly improved in Skylake and has better
throughput than NR.
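
The decision described above could be sketched roughly as follows. This is not
the patch's actual code: CoreKind and useNewtonRaphsonSqrt are hypothetical
names, and keeping NR enabled on cores other than big cores is an assumption
inferred from the description rather than something it states.

```cpp
// Hypothetical classification of the target CPU, for illustration only.
enum class CoreKind { SmallCore, BigCore, Skylake };

// Returns true when this sketch would pick RSQRT + Newton-Raphson
// over the hardware SQRT instruction.
bool useNewtonRaphsonSqrt(CoreKind core, bool isVector) {
    // Skylake: vector SQRT throughput beats NR, and scalar SQRT latency
    // already beats NR as on other big cores, so NR is never used.
    if (core == CoreKind::Skylake)
        return false;
    // Other big cores: scalars favor latency (hardware SQRT wins),
    // vectors favor throughput (NR is kept).
    if (core == CoreKind::BigCore)
        return isVector;
    // Assumption: remaining cores keep the NR estimate.
    return true;
}
```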