[llvm] [ConstantFolding] Add flag to disable call folding (PR #140270)

Lewis Crawford via llvm-commits llvm-commits at lists.llvm.org
Fri May 30 03:22:59 PDT 2025


LewisCrawford wrote:

I think to cover all conceivable use-cases we'd need flags for:

1: Disable all folding
2: Disable all FP folding
3: Disable FP call folding
4: Disable FP call folding implemented via a call to the host library
5: Disable folding only for specific intrinsics/libcalls marked as potentially inexact across implementations
6: Disable folding only for specific intrinsics/libcalls named via a command-line arg

For the specific use-case people have been asking me about, (3) seems the best balance, but I'd be happy for any of the others to be added as well to cover more general or narrower use-cases.
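As a rough sketch of what (3) might look like (the flag name and placement here are my own placeholders, not necessarily what this PR actually adds):

```c++
// Standalone sketch of a cl::opt like the one proposed here, plus the kind
// of guard ConstantFolding.cpp would need. The flag name is hypothetical.
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

static cl::opt<bool> DisableFPCallFolding(
    "disable-fp-call-folding", cl::Hidden, cl::init(false),
    cl::desc("Disable constant folding of floating-point intrinsics and "
             "libcalls"));

int main(int argc, char **argv) {
  cl::ParseCommandLineOptions(argc, argv);
  // Inside ConstantFoldCall, the guard would look roughly like:
  //   if (DisableFPCallFolding && Ty->isFPOrFPVectorTy())
  //     return nullptr; // leave the call for the target to evaluate
  outs() << "FP call folding disabled: " << DisableFPCallFolding << "\n";
  return 0;
}
```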

I agree, it does seem a bit silly to disable exact implementations. However, one example that has been brought up to me is that of fabs. The NVVM version of fabs is not necessarily bit-exact, as it may canonicalize NaNs. The PTX spec states:
> For abs.f32, NaN input yields unspecified NaN.
> Future implementations may comply with the IEEE 754 standard by preserving payload and modifying only the sign bit.

So it is technically legal for us to fold by changing only the sign bit, since the NaN output is unspecified. However, this might produce a different result from the hardware if the hardware chooses to use a canonical NaN value instead (and different architectures may technically produce different NaN values).

Also, we could choose to fold this either via a libcall to fabs on the host, or with APFloat's clearSign function. In 2019, LLVM's target-independent abs intrinsic was switched (along with several others) from the libcall implementation to the APFloat one in https://reviews.llvm.org/D67459 . We want this flag to be slightly more general than just not folding host libcalls (4), because the implementation can change (and did change in that review), and some cases with bit-exact APFloat implementations can still produce results that differ from NVPTX hardware where the spec allows flexibility (e.g. around NaN canonicalization).
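For concreteness, here is a minimal standalone sketch (my own illustration, not code from this PR) of the APFloat-style fold, showing that it clears the sign bit while preserving the NaN payload that PTX hardware would be free to canonicalize away:

```c++
#include "llvm/ADT/APFloat.h"
#include "llvm/Support/Format.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  // A negative quiet NaN with a non-trivial payload as the input to fabs.
  APFloat V = APFloat::getNaN(APFloat::IEEEsingle(), /*Negative=*/true,
                              /*payload=*/0x1234);
  // The APFloat-based fold just clears the sign bit, preserving the payload
  // (prints 0x7fc01234).
  V.clearSign();
  outs() << format_hex(V.bitcastToAPInt().getZExtValue(), 10) << "\n";
  // PTX abs.f32 may return an *unspecified* NaN here, so hardware that
  // canonicalizes NaNs (e.g. to 0x7fffffff) would disagree with this fold.
  return 0;
}
```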

rsqrt is another example where the PTX spec allows flexibility:
> The maximum relative error for rsqrt.f32 over the entire positive finite floating-point range is 2^-22.9.

So it is technically possible for the host-side folding to be more precise than the device-side implementation without violating the spec (and e.g. x86 may differ from aarch64 on the host side, and sm60 may differ from sm100 on the device side if they happen to implement this slightly differently).
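To make that bound concrete, a small standalone sketch (again my own illustration) of the gap the spec allows between the host-side fold and a conforming device implementation:

```c++
#include <cmath>
#include <cstdio>

int main() {
  float X = 3.0f;
  // What host-side constant folding would plausibly produce: a correctly
  // rounded sqrt followed by a correctly rounded divide.
  float HostFold = 1.0f / std::sqrt(X);
  // PTX only guarantees rsqrt.f32 to a relative error of 2^-22.9, so any
  // device result within this band around the exact value is spec-compliant.
  double MaxRelErr = std::exp2(-22.9);
  double Exact = 1.0 / std::sqrt(static_cast<double>(X));
  std::printf("host fold      = %.9g\n", HostFold);
  std::printf("permitted band = [%.9g, %.9g]\n",
              Exact * (1.0 - MaxRelErr), Exact * (1.0 + MaxRelErr));
  return 0;
}
```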

So (3) seems the best balance: it still allows simple FP math to be folded (regular adds/muls, etc.), while consistently disabling the folding of calls, without requiring end-users to know implementation details such as whether libcalls are used in the folding (which may change between versions), or whether specific intrinsics get auto-upgraded or transformed into other intrinsics later on.

https://github.com/llvm/llvm-project/pull/140270

