[PATCH] D28508: [NVPTX] Lower to sqrt.approx and rsqrt.approx under more circumstances.

Tue Jan 10 05:30:51 PST 2017

hfinkel added a comment.

Can you comment on how this relates to other targets? On x86, AArch64, PPC, and for the AMD GPUs, we have implemented the callback functions getSqrtEstimate and getRecipEstimate to handle generating estimates. The callbacks also specify how many refinement iterations are used to provide answers of approximately the correct precision.

This is important because we give users control over the number of refinement steps, which operations use approximations, and so on (see Clang's test/Driver/mrecip.c; we implement a generalized version of GCC's -mrecip option). Obviously there are some issues here with separating host vs. accelerator options, but we should try to reuse this infrastructure for NVPTX to the extent possible.
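For readers unfamiliar with the estimate-plus-refinement scheme, here is a minimal standalone sketch of the idea, not LLVM code. It assumes a reciprocal square root estimate refined by Newton-Raphson steps; roughRsqrt, refineRsqrt and relError are hypothetical names, and the 1% error is injected artificially to stand in for a hardware estimate instruction.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware estimate instruction: returns
// 1/sqrt(x) with a deliberately injected 1% relative error.
float roughRsqrt(float x) {
  assert(x > 0.0f);
  return (1.0f / std::sqrt(x)) * 1.01f;
}

// Applies `steps` Newton-Raphson iterations for f(e) = 1/e^2 - x:
//   e' = e * (1.5 - 0.5 * x * e * e)
// Each step roughly squares the relative error of the estimate.
float refineRsqrt(float x, float est, int steps) {
  for (int i = 0; i < steps; ++i)
    est = est * (1.5f - 0.5f * x * est * est);
  return est;
}

// Relative error of `est` as an approximation of 1/sqrt(x),
// measured in double precision.
double relError(float x, float est) {
  return std::fabs(static_cast<double>(est) * std::sqrt(static_cast<double>(x)) - 1.0);
}
```

The step count is the knob in question: zero steps returns the raw estimate, and each extra step trades a few multiplies for more accurate bits.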

P.S. You might also look at implementing combineRepeatedFPDivisors in NVPTX.
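As a rough illustration of the rewrite that hook is meant to enable (a standalone sketch, not the DAG combine itself): when several divisions share a divisor, one reciprocal plus multiplies can replace them. Vec3, scaleByDivision, scaleByReciprocal and maxAbsDiff are hypothetical names. The two forms may differ in the last bit, so I would expect the transform to apply only when reciprocal/fast-math style relaxations are permitted.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
  float x, y, z;
};

// Straightforward form: three divisions by the same divisor.
Vec3 scaleByDivision(Vec3 v, float d) {
  assert(d != 0.0f);
  return {v.x / d, v.y / d, v.z / d};
}

// Rewritten form: one division to get the reciprocal, then three multiplies.
// Results can differ from the form above by a rounding error.
Vec3 scaleByReciprocal(Vec3 v, float d) {
  assert(d != 0.0f);
  float r = 1.0f / d;
  return {v.x * r, v.y * r, v.z * r};
}

// Largest componentwise absolute difference between two vectors.
float maxAbsDiff(Vec3 a, Vec3 b) {
  float dx = std::fabs(a.x - b.x);
  float dy = std::fabs(a.y - b.y);
  float dz = std::fabs(a.z - b.z);
  return std::fmax(dx, std::fmax(dy, dz));
}
```

Whether this pays off depends on the relative cost of division and multiplication on the target, which is presumably why the decision is left to a per-target hook.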

================
Comment at: llvm/lib/Target/NVPTX/NVPTXInstrInfo.td:941
+// TODO: Should we turn this on when only one of the *APPROX flags is enabled?
+// Our value is already approximate...
 //
----------------
I don't understand this comment. I thought that f32 sqrt on NVIDIA GPUs was only approximate for early generations (sm_1x) and was correct for later ones (sm_2x+): https://devtalk.nvidia.com/default/topic/475101/sqrt-precision/

https://reviews.llvm.org/D28508