[llvm] Add constant-folding for unary NVVM intrinsics (PR #141233)

Thu Jul 10 04:49:31 PDT 2025

================
@@ -2548,6 +2653,170 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
         return ConstantFoldFP(atan, APF, Ty);
       case Intrinsic::sqrt:
         return ConstantFoldFP(sqrt, APF, Ty);
+
+      // NVVM Intrinsics:
+      case Intrinsic::nvvm_ceil_ftz_f:
+      case Intrinsic::nvvm_ceil_f:
+      case Intrinsic::nvvm_ceil_d:
+        return ConstantFoldFP(
+            ceil, APF, Ty,
+            nvvm::GetNVVMDenromMode(
+                nvvm::UnaryMathIntrinsicShouldFTZ(IntrinsicID)));
+
+      case Intrinsic::nvvm_cos_approx_ftz_f:
+      case Intrinsic::nvvm_cos_approx_f:
+        return ConstantFoldFP(
+            cos, APF, Ty,
+            nvvm::GetNVVMDenromMode(
+                nvvm::UnaryMathIntrinsicShouldFTZ(IntrinsicID)));
+
+      case Intrinsic::nvvm_ex2_approx_ftz_f:
+      case Intrinsic::nvvm_ex2_approx_d:
+      case Intrinsic::nvvm_ex2_approx_f:
+        return ConstantFoldFP(
+            exp2, APF, Ty,
+            nvvm::GetNVVMDenromMode(
+                (nvvm::UnaryMathIntrinsicShouldFTZ(IntrinsicID))));
----------------
LewisCrawford wrote:

I believe it is ok to do so. I added the flag `-disable-fp-call-folding` earlier in https://github.com/llvm/llvm-project/pull/140270 specifically so that users who need bit-exact evaluation of these functions can turn this folding off. StrictFP also provides another way to prevent these calls from being folded if necessary.

I believe that in the vast majority of cases, users won't require bit-exact evaluation of `approx` intrinsics, in the same way that users should not rely on specific fma fusion behaviour if -fast-math is enabled. Users who require exact results should use the non-approx versions of these intrinsics.

The LLVM versions of these intrinsics, like llvm.sin, llvm.exp2 etc., are already folded using math library calls in this way. They can cause similar precision mismatches, e.g. in cross-compilation scenarios where the host the code was compiled on has a different architecture from the target the code executes on (or even a different libm implementation).
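
As a rough illustration of folding with a host math call plus the FTZ handling in the hunk above, here is a minimal standalone sketch. It is not the code in this PR: `flushSubnormalToZero` and `foldCeilOnHost` are hypothetical names, and the sketch assumes the FTZ variants flush subnormal inputs and results to a zero of the same sign.

```cpp
#include <cmath>

// Hypothetical helper (not from the PR): flush a subnormal float to a zero
// of the same sign, which is the assumed behaviour of the .ftz variants.
static float flushSubnormalToZero(float X) {
  return std::fpclassify(X) == FP_SUBNORMAL ? std::copysign(0.0f, X) : X;
}

// Hypothetical stand-in for folding nvvm.ceil.f / nvvm.ceil.ftz.f at compile
// time: evaluate with the host's ceil, flushing before and after when the
// intrinsic is an FTZ variant.
static float foldCeilOnHost(float X, bool ShouldFTZ) {
  if (ShouldFTZ)
    X = flushSubnormalToZero(X);
  float R = std::ceil(X);
  return ShouldFTZ ? flushSubnormalToZero(R) : R;
}
```

Under these assumptions, `foldCeilOnHost(1e-40f, false)` gives `1.0f` while `foldCeilOnHost(1e-40f, true)` gives `0.0f`, because `1e-40f` is subnormal in single precision and is flushed before the host `ceil` is applied.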

https://github.com/llvm/llvm-project/pull/141233