[clang] [llvm] [Clang][NVVM] Support `-f[no-]cuda-prec-sqrt` and propagate precision flag to `NVVMReflect` (PR #134244)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 7 12:21:38 PDT 2025
Artem-B wrote:
@AlexMaclean, who authored #89417, and possibly other NVIDIA folks may have some thoughts on this.
In general, making it a per-function attribute makes sense at the LLVM level.
We will also need to reconcile it with the existing code at https://github.com/llvm/llvm-project/blob/10bef367a5643bc41d0172b02e080645c68f821a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp#L94-L96
However, propagating it to the NVVMReflect pass complicates things, because the libdevice we link with is linked once per module, so the reflect-based choice cannot vary per function.
I think we may need to disentangle libdevice from the IR generated by clang.
Currently, in CUDA compilation, a call to `sqrtf()` maps to `__nv_sqrtf(__a)`, which is served by libdevice bitcode and which chooses the precise or approximate version of the LLVM intrinsic based on NVVMReflect.
What we need to do is change `sqrtf()` to use clang builtins, so that we retain per-function control over how it is lowered.
Once we have that in place, we can control `sqrtf` precision via function and/or module attributes, independently of the choice we make via NVVMReflect for `__nv_sqrtf()`.
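To make the difference concrete, here is a rough host-side model of the two decision points. It is only a sketch, not LLVM or libdevice code: every name in it (`model_reflect`, `g_module_prec_sqrt`, `FunctionAttrs`, `lower_via_libdevice`, `lower_via_builtin`) is hypothetical, and the reflect key `__CUDA_PREC_SQRT` is my assumption about what libdevice queries.

```cpp
#include <cstring>

// All names below are hypothetical; this is a host-side model, not LLVM code.
enum class SqrtLowering { Precise, Approx };

// One value per module: libdevice is linked once, so NVVMReflect folds a
// single answer into every use of __nv_sqrtf() in that module.
int g_module_prec_sqrt = 1;

// Stand-in for __nvvm_reflect(); the key name is an assumption.
int model_reflect(const char *key) {
  return std::strcmp(key, "__CUDA_PREC_SQRT") == 0 ? g_module_prec_sqrt : 0;
}

// Current path: the choice is made inside libdevice, module-wide.
SqrtLowering lower_via_libdevice() {
  return model_reflect("__CUDA_PREC_SQRT") != 0 ? SqrtLowering::Precise
                                                : SqrtLowering::Approx;
}

// Proposed path: a function attribute, if present, overrides the module default.
struct FunctionAttrs {
  bool has_prec_sqrt; // is the attribute set on this function?
  bool prec_sqrt;     // its value, meaningful only when has_prec_sqrt is true
};

SqrtLowering lower_via_builtin(const FunctionAttrs &fn, bool module_default) {
  bool precise = fn.has_prec_sqrt ? fn.prec_sqrt : module_default;
  return precise ? SqrtLowering::Precise : SqrtLowering::Approx;
}
```

In this model, two functions in the same module can get different lowerings through `lower_via_builtin`, while `lower_via_libdevice` can only ever return one answer per module.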
https://github.com/llvm/llvm-project/pull/134244