[llvm] Add constant-folding for unary NVVM intrinsics (PR #141233)

Thu Jul 10 11:20:13 PDT 2025

================
@@ -2548,6 +2653,170 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
         return ConstantFoldFP(atan, APF, Ty);
       case Intrinsic::sqrt:
         return ConstantFoldFP(sqrt, APF, Ty);
+
+      // NVVM Intrinsics:
+      case Intrinsic::nvvm_ceil_ftz_f:
+      case Intrinsic::nvvm_ceil_f:
+      case Intrinsic::nvvm_ceil_d:
+        return ConstantFoldFP(
+            ceil, APF, Ty,
+            nvvm::GetNVVMDenromMode(
+                nvvm::UnaryMathIntrinsicShouldFTZ(IntrinsicID)));
+
+      case Intrinsic::nvvm_cos_approx_ftz_f:
+      case Intrinsic::nvvm_cos_approx_f:
+        return ConstantFoldFP(
+            cos, APF, Ty,
+            nvvm::GetNVVMDenromMode(
+                nvvm::UnaryMathIntrinsicShouldFTZ(IntrinsicID)));
+
+      case Intrinsic::nvvm_ex2_approx_ftz_f:
+      case Intrinsic::nvvm_ex2_approx_d:
+      case Intrinsic::nvvm_ex2_approx_f:
+        return ConstantFoldFP(
+            exp2, APF, Ty,
+            nvvm::GetNVVMDenromMode(
+                (nvvm::UnaryMathIntrinsicShouldFTZ(IntrinsicID))));
----------------
Artem-B wrote:

> I believe in the vast majority of cases, users won't be requiring bit-exact evaluation of approx intrinsics,

The common failure pattern is that a user may have a test checking for "golden" values, and that test will pass or fail depending on the optimization level. They may not even be aware that an imprecise intrinsic was used somewhere under the hood. Granted, those tests usually come with precision/tolerance knobs and can usually be fixed by relaxing the expected range of results, but I do expect to see some amount of unexpected trouble on the user's end.

Probably not a showstopper.

> The LLVM versions of these intrinsics like llvm.sin, llvm.exp2 etc. are already folded using math library calls in this way, and can also cause similar precision mismatches

I think the difference here is in the degree of the mismatches. Both the LLVM intrinsics and the math library are expected to give you very close results. Maybe not identical, but close (@jhuber6 are there any formal guarantees on that?). However, many GPU intrinsics have higher ULP errors and are not IEEE-compliant.
https://docs.nvidia.com/cuda/cuda-c-programming-guide/#mathematical-functions-appendix

Is there a specific limit on how much of a difference is OK? I assume that the "too much of a difference" boundary does exist and that we're not crossing it.
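
To make the "how much of a difference" question concrete, here is a minimal sketch of measuring the gap between a host-folded value and a device-computed value in ULPs. The helper names (`orderedBits`, `ulpDistance`, `withinUlps`) are hypothetical and not part of LLVM or this PR, and the acceptable bound `maxUlps` is an assumption that would have to come from the documented error of each intrinsic.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: map a float's bit pattern onto a signed integer line
// on which adjacent representable floats differ by exactly 1.
static int64_t orderedBits(float X) {
  uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  // Floats are sign-magnitude; mirror the negative half so ordering is
  // monotonic and +0.0 and -0.0 both map to 0.
  if (Bits & 0x80000000u)
    return -static_cast<int64_t>(Bits & 0x7FFFFFFFu);
  return static_cast<int64_t>(Bits);
}

// Hypothetical helper: distance between two floats in units in the last
// place. NaN inputs are reported as maximally distant.
static int64_t ulpDistance(float A, float B) {
  if (std::isnan(A) || std::isnan(B))
    return std::numeric_limits<int64_t>::max();
  int64_t D = orderedBits(A) - orderedBits(B);
  return D < 0 ? -D : D;
}

// Hypothetical check a golden-value test could use instead of bit equality.
// MaxUlps is an assumed tolerance, not a documented guarantee.
static bool withinUlps(float Folded, float Device, int64_t MaxUlps) {
  return ulpDistance(Folded, Device) <= MaxUlps;
}
```

A test that compares with something like `withinUlps(expected, actual, N)` rather than exact equality would be less sensitive to whether the intrinsic was folded at compile time or evaluated on the device.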

https://github.com/llvm/llvm-project/pull/141233