[clang] [llvm] [Clang][NVVM] Support `-f[no-]cuda-prec-sqrt` and propagate precision flag to `NVVMReflect` (PR #134244)

Mon Apr 7 12:21:38 PDT 2025

Artem-B wrote:

@AlexMaclean, who authored #89417, and possibly other NVIDIA folks may have some thoughts on this.

In general, making it a per-function attribute makes sense at the LLVM level.

We will also need to reconcile it with the existing handling in NVPTXISelLowering.cpp: https://github.com/llvm/llvm-project/blob/10bef367a5643bc41d0172b02e080645c68f821a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp#L94-L96

However, propagating it to the NVVMReflect pass complicates things, because the libdevice we link with is linked once per module.

I think we may need to disentangle libdevice from the IR generated by clang.

Currently, in CUDA compilation, a call to `sqrtf()` maps to `__nv_sqrtf(__a)`, which is served by libdevice bitcode and which chooses the precise or approximate version of the LLVM intrinsic based on NVVMReflect.
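
To make the problem concrete, here is a rough host-side C++ model of that selection. It is only a sketch: every `model_*` and `g_*` name is hypothetical, the reflect key `__CUDA_PREC_SQRT` is my assumption about what libdevice queries, and both "lowerings" simply call `std::sqrt` and record which path was taken.

```cpp
#include <cmath>
#include <string>

// Hypothetical stand-in for the single value NVVMReflect would substitute
// for the whole module. There is one of these per module, not per function.
static int g_module_prec_sqrt = 1;

// Hypothetical stand-in for __nvvm_reflect(). In a real build the pass folds
// the call to a constant, so the branch in model_nv_sqrtf() disappears.
int model_reflect(const std::string &key) {
  return key == "__CUDA_PREC_SQRT" ? g_module_prec_sqrt : 0;
}

enum class SqrtKind { Precise, Approx };
static SqrtKind g_last_kind = SqrtKind::Precise;

// Stand-ins for the precise and approximate intrinsics. Both compute the
// same value here; they only record which one was selected.
float model_sqrt_precise(float x) {
  g_last_kind = SqrtKind::Precise;
  return std::sqrt(x);
}

float model_sqrt_approx(float x) {
  g_last_kind = SqrtKind::Approx;
  return std::sqrt(x);
}

// Model of __nv_sqrtf(): the choice depends only on the module-wide value,
// so every caller in the module gets the same variant.
float model_nv_sqrtf(float x) {
  return model_reflect("__CUDA_PREC_SQRT") ? model_sqrt_precise(x)
                                           : model_sqrt_approx(x);
}
```

Nothing in `model_nv_sqrtf()` knows which function called it, which is why a per-function attribute cannot influence this path.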

What we need to do is change `sqrtf()` to use clang builtins, so that we retain per-function control over how it is lowered.
Once that is in place, we can control `sqrtf()` precision via function and/or module attributes, independently of the choice made via NVVMReflect for `__nv_sqrtf()`.
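
As a rough illustration of the resolution order I have in mind, here is a small C++ model. It is not backend code: the type and function names are hypothetical, and "attribute" here just means a per-function setting that, when present, wins over the module-level default.

```cpp
// Hypothetical tri-state for a per-function precision attribute.
enum class PrecSqrtAttr { Unset, Precise, Approx };

enum class SqrtLowering { Precise, Approx };

// Hypothetical module-level default (e.g. derived from a module attribute).
struct ModuleInfo {
  bool prec_sqrt_default = true;
};

// Hypothetical per-function state.
struct FunctionInfo {
  PrecSqrtAttr prec_sqrt = PrecSqrtAttr::Unset;
};

// The function-level setting takes priority; the module default applies
// only when the function carries no setting of its own.
SqrtLowering choose_sqrt_lowering(const ModuleInfo &m, const FunctionInfo &f) {
  switch (f.prec_sqrt) {
  case PrecSqrtAttr::Precise:
    return SqrtLowering::Precise;
  case PrecSqrtAttr::Approx:
    return SqrtLowering::Approx;
  case PrecSqrtAttr::Unset:
    break;
  }
  return m.prec_sqrt_default ? SqrtLowering::Precise : SqrtLowering::Approx;
}
```

With this shape, two functions in the same module can lower `sqrtf()` differently, and neither decision touches what NVVMReflect picks for `__nv_sqrtf()`.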

https://github.com/llvm/llvm-project/pull/134244