[clang] [llvm] [NVPTX] Add ex2.approx bf16 support and cleanup intrinsic definition (PR #165446)

Alex MacLean via cfe-commits cfe-commits at lists.llvm.org
Tue Oct 28 11:37:14 PDT 2025


================
@@ -2550,6 +2554,11 @@ static Value *upgradeNVVMIntrinsicCall(StringRef Name, CallBase *CI,
     Intrinsic::ID IID = (Name == "fabs.ftz.f") ? Intrinsic::nvvm_fabs_ftz
                                                : Intrinsic::nvvm_fabs;
     Rep = Builder.CreateUnaryIntrinsic(IID, CI->getArgOperand(0));
+  } else if (Name.consume_front("ex2.approx.")) {
+    // nvvm.ex2.approx.{f,ftz.f,d,f16x2}
+    Intrinsic::ID IID = Name.starts_with("ftz") ? Intrinsic::nvvm_ex2_approx_ftz
+                                                : Intrinsic::nvvm_ex2_approx;
----------------
AlexMaclean wrote:

I think we're doing this in the backend because, if an `llvm.exp2` intrinsic gets there, this is the most similar PTX instruction. As far as I can tell, it's not necessarily a strictly correct lowering. If we encounter one of the `nvvm.ex2.approx` intrinsics, though, I think we're obligated to ensure that the value produced is exactly the same as if it were executed on hardware. If we were to convert to the generic intrinsic, more precise constant folding might occur, which would be incorrect.
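
To make the constant-folding concern concrete, here is a rough sketch. `approx_exp2` is a hypothetical stand-in (a crude quadratic fit), not the algorithm the hardware actually uses for `ex2.approx`, and `folding_changes_result` is a name made up for this illustration. It only shows that folding an approximate operation with a more precise `exp2` can change the resulting bits.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical approximation of 2^x. This is NOT the PTX ex2.approx
// algorithm, just a deliberately low-precision substitute.
float approx_exp2(float x) {
  float i = std::floor(x);  // integer part
  float f = x - i;          // fractional part in [0, 1)
  // Quadratic fit of 2^f on [0, 1); exact at f = 0 and f = 1.
  float p = 1.0f + f * (0.6565f + f * 0.3435f);
  return std::ldexp(p, static_cast<int>(i));  // scale by 2^i
}

// Reinterpret a float as its raw 32-bit pattern.
std::uint32_t bits(float v) {
  std::uint32_t b;
  std::memcpy(&b, &v, sizeof b);
  return b;
}

// True if folding with the precise std::exp2 would yield different bits
// than evaluating the approximate operation.
bool folding_changes_result(float x) {
  return bits(approx_exp2(x)) != bits(std::exp2(x));
}
```

With this stand-in, `folding_changes_result(0.5f)` is true: the two values are close but not bit-identical, which is the kind of mismatch that upgrading to the generic intrinsic could introduce.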

https://github.com/llvm/llvm-project/pull/165446

