[PATCH] D28508: [NVPTX] Lower to sqrt.approx and rsqrt.approx under more circumstances.

Thu Jan 12 19:04:50 PST 2017

mehdi_amini added a comment.

In https://reviews.llvm.org/D28508#644206, @jlebar wrote:

> > What does the SASS look like for the non-approx version? (I believe llvm.sqrt without fast-math flags should do the same as the non-approx version.)
>
> https://gist.github.com/b3fa71a72a02785cc47be606556d6d4a

Wow...

>>> An unfortunate effect of the fact that we're using ptxas is that we may not be able to match the performance of {r}sqrt.approx with our own implementation in ptx.
>> 
>> Can you clarify what you mean?
> 
> I meant that I wasn't sure whether we could generate code that matches the performance and accuracy of PTX sqrt.approx without using that instruction (e.g. by having LLVM emit a Newton's method hunk). In particular, now that you've parsed the asm, we can see that we're calling a special HW instruction for the rsqrt, and I have no way to cause this instruction to be emitted except by writing the PTX sqrt.approx.

I bet you can get the rsqrt with the PTX rsqrt.approx.f32, but that's not really the point here.

> Unless the suggestion is to take the approx sqrt generated by PTX sqrt.approx and then refine it using Newton's method?  That's an interesting idea but out of scope for this patch, I think.  I'd rather wait to do that until someone wants it.

Technically, I don't think it is correct for your patch to lower llvm.sqrt (with the FMF) to PTX sqrt.approx, because "The maximum absolute error for sqrt.f32 is TBD."
llvm.sqrt should get roughly the same result as the last link you gave.
I don't know how much it matters for users of the PTX backend, though.
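
For reference, the Newton's method refinement mentioned in the quoted text could look roughly like the sketch below. It is only an illustration, not what ptxas or this patch does: `approx_rsqrt` is a hypothetical stand-in for a hardware rsqrt.approx (it uses the well-known bit-level initial guess on the float32 encoding, which is my assumption and not something from this thread), and `refine_rsqrt` / `refined_sqrt` are made-up names. It assumes a positive, normal float32 input.

```python
import math
import struct


def approx_rsqrt(x: float) -> float:
    """Hypothetical stand-in for a low-precision hardware rsqrt estimate.

    Assumption: x is a positive, normal float32 value. This is the classic
    bit-level initial guess, not the actual PTX rsqrt.approx result.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = 0x5F3759DF - (bits >> 1)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def refine_rsqrt(x: float, y: float, steps: int = 1) -> float:
    """Apply Newton's method to f(y) = 1/y^2 - x, starting from estimate y.

    Each step computes y <- y * (1.5 - 0.5 * x * y^2), which roughly squares
    the relative error of the estimate.
    """
    for _ in range(steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def refined_sqrt(x: float, steps: int = 2) -> float:
    """Compute sqrt(x) as x * rsqrt(x), using a refined rsqrt estimate."""
    if x == 0.0:
        return 0.0  # avoid multiplying 0 by a meaningless rsqrt estimate
    return x * refine_rsqrt(x, approx_rsqrt(x), steps)
```

Calling `refined_sqrt(2.0, steps=3)` and comparing against `math.sqrt(2.0)` shows how quickly the error shrinks with each step. Whether a sequence like this could match the speed of the dedicated instruction is exactly the open question raised above.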

https://reviews.llvm.org/D28508