[clang] [llvm] [clang][NVPTX] Add intrinsics and builtins for mixed-precision FP arithmetic (PR #168359)

Wed Nov 19 11:20:38 PST 2025

================
@@ -460,6 +478,52 @@ def __nvvm_add_rz_d : NVPTXBuiltin<"double(double, double)">;
 def __nvvm_add_rm_d : NVPTXBuiltin<"double(double, double)">;
 def __nvvm_add_rp_d : NVPTXBuiltin<"double(double, double)">;
 
+def __nvvm_add_mixed_f16_f32 : NVPTXBuiltinSMAndPTX<"float(__fp16, float)", SM_100, PTX86>;
+def __nvvm_add_mixed_rn_f16_f32 : NVPTXBuiltinSMAndPTX<"float(__fp16, float)", SM_100, PTX86>;
----------------
Artem-B wrote:

This set of builtins appears to be regular enough to consider using TableGen `foreach` loops to generate them.

Not sure if it's going to end up being an improvement, but if it would reduce the boilerplate, it may be worth giving it a try.
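
For illustration, a minimal, untested sketch of what such a loop might look like. It covers only the two definitions visible in the hunk above; the loop variable `rnd` is a hypothetical name, and any further rounding modes or operand types would be extra list entries or nested loops. It also assumes that the builtin spelling is still derived correctly from a pasted record name, which would need checking against the generated output.

```tablegen
// Sketch only: expands to __nvvm_add_mixed_f16_f32 and
// __nvvm_add_mixed_rn_f16_f32, the two builtins quoted in the hunk.
// `rnd` is a hypothetical loop variable; "" stands for the variant
// with no explicit rounding-mode suffix.
foreach rnd = ["", "_rn"] in {
  def "__nvvm_add_mixed" # rnd # "_f16_f32"
      : NVPTXBuiltinSMAndPTX<"float(__fp16, float)", SM_100, PTX86>;
}
```

With only two variants this saves nothing, so the loop pays off only if the remaining definitions in the patch follow the same naming pattern.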

https://github.com/llvm/llvm-project/pull/168359