[clang] [llvm] [Clang][NVVM] Support `-f[no-]cuda-prec-sqrt` and propagate precision flag to `NVVMReflect` (PR #134244)

Tue Apr 8 12:20:40 PDT 2025

AlexMaclean wrote:

It seems like we already have perhaps too many mechanisms to control how sqrt gets lowered. There is the `__nv_sqrtf` libdevice function, which chooses between specific intrinsics (each mapping 1:1 to a PTX instruction) based on NVVMReflect. Then there are also `llvm.sqrt` and `nvvm.sqrt.f`, which are lowered and optimized based on command-line options and on function- and instruction-level flags, each in its own way.
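
To make the NVVMReflect-based path concrete, here is a rough host-side C++ model of the kind of selection `__nv_sqrtf` performs. It is only a sketch, not libdevice's actual source: the reflect keys `__CUDA_PREC_SQRT` and `__CUDA_FTZ`, the helper names `reflect` and `selectSqrtf`, and the returned PTX mnemonics are assumptions for illustration. The point is that the choice depends solely on module-wide settings, with nothing per function or per instruction.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the module-wide NVVMReflect settings.
using ReflectSettings = std::map<std::string, int>;

// Models a reflect query; keys that were never set read as 0 (assumption).
int reflect(const ReflectSettings &settings, const std::string &key) {
  auto it = settings.find(key);
  return it == settings.end() ? 0 : it->second;
}

// Returns the PTX instruction this model would pick for a float sqrt.
// The decision uses only global settings, so every call site in the
// module gets the same lowering.
std::string selectSqrtf(const ReflectSettings &settings) {
  const bool prec = reflect(settings, "__CUDA_PREC_SQRT") != 0;
  const bool ftz = reflect(settings, "__CUDA_FTZ") != 0;
  std::string op = prec ? "sqrt.rn" : "sqrt.approx";
  if (ftz)
    op += ".ftz";
  return op + ".f32";
}
```

In this model, `selectSqrtf({{"__CUDA_PREC_SQRT", 1}})` picks the correctly rounded form, while an empty settings map falls back to the approximate one.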

I think that for finer-grained responsiveness to instruction- and function-level options, it makes sense to use the existing intrinsics. While it is consistent with the existing design to treat NVVMReflect as operating globally across the entire module, I'm not sure it makes sense to introduce a new module flag and clang command-line option...

I personally agree with @Artem-B that `__nv_sqrtf`+NVVMReflect may not be the way to go. Using one of the intrinsics seems like a better approach, but I may be missing something.

https://github.com/llvm/llvm-project/pull/134244