[PATCH] D28508: [NVPTX] Implement NVPTXTargetLowering::getSqrtEstimate.
Justin Lebar via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 31 15:43:17 PST 2017
jlebar added a comment.
In https://reviews.llvm.org/D28508#662282, @escha wrote:
> That really surprises me that it's faster! I would expect SFU functions like RCP/RSQRT to dwarf the cost of a multiply, especially for double.
Me too. :)
> Also, do be careful that rcp(rsqrt(x)) and x * rsqrt(x) have different precisions under some implementations (because fmul is 0.5 ULP, while rcp/rsqrt may be as low as 2.5 ULP each).
Yeah, I'm banking on the "you asked for it" aspect of fast-math. In particular, the only approximate f64 rcp instruction is flush-to-zero, so we use it even when ftz is otherwise disabled.
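For readers following along, here is a minimal host-side sketch of the two formulations being compared. It is not the NVPTX lowering from this patch: rsqrt_ref and rcp_ref are hypothetical stand-ins built from ordinary IEEE float operations, so they are more accurate than the real rsqrt.approx / rcp.approx instructions and say nothing about GPU timing. The sketch only shows that the two forms agree on ordinary inputs and can differ at the edges, for example at x = 0.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for the PTX approximations (assumption: plain
// IEEE float arithmetic; the hardware instructions are less accurate).
static float rsqrt_ref(float x) { return 1.0f / std::sqrt(x); }
static float rcp_ref(float x) { return 1.0f / x; }

// sqrt(x) computed as x * rsqrt(x): one approximation plus one multiply.
static float sqrt_via_mul(float x) { return x * rsqrt_ref(x); }

// sqrt(x) computed as rcp(rsqrt(x)): two approximations back to back.
static float sqrt_via_rcp(float x) { return rcp_ref(rsqrt_ref(x)); }

// Spot checks on inputs where every intermediate value is exact.
static void check_examples() {
  // 1/sqrt(4) = 0.5 exactly, so both forms return exactly 2.
  assert(sqrt_via_mul(4.0f) == 2.0f);
  assert(sqrt_via_rcp(4.0f) == 2.0f);

  // At zero, rsqrt is +inf: 0 * inf is NaN, while 1 / inf is 0.
  assert(std::isnan(sqrt_via_mul(0.0f)));
  assert(sqrt_via_rcp(0.0f) == 0.0f);
}
```

Calling check_examples() from a test driver exercises both forms. With the real approximate instructions the results on ordinary inputs would also differ in their low bits, which is the precision concern raised in the quoted comment.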
The performance difference is the same with and without ftz on the mul:
  precise sqrt                   - 73us
  x*rsqrt.approx(x)              - 64us
  recip.approx(rsqrt.approx(x))  - 48us
  rsqrt.approx(x)                - 48us
Maybe it's an unfair microbenchmark, because I do nothing other than the sqrt and a store. https://gist.github.com/0ac6f0b0f994339838f5452f96e77cff