[llvm] [ConstantFolding] Add flag to disable call folding (PR #140270)

Fri May 23 08:26:22 PDT 2025

LewisCrawford wrote:

The context for this flag is that I want to add constant-folding support for all of the NVVM math intrinsics in https://github.com/llvm/llvm-project/pull/141233

Of the intrinsics supported there, several might produce slightly different results when constant-folded with the host-side compiler's math library than when executed by the device-side version of the code:
- nvvm.cos.approx.*
- nvvm.ex2.approx.*
- nvvm.lg2.approx.*
- nvvm.rcp.*
- nvvm.rsqrt.approx.*
- nvvm.sin.approx.*
- nvvm.sqrt.f
- nvvm.sqrt.rn.*
- nvvm.sqrt.approx.*
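
To make the concern concrete, here is a minimal sketch of how two valid implementations of the same function can disagree in the low bits. Everything in it is an assumption for illustration: `hostRsqrt` stands in for a host-side fold, and `approxRsqrt` uses the classic bit-trick estimate with one Newton step as a stand-in for an approximate device instruction. It is not the algorithm NVIDIA hardware uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as an unsigned integer.
static std::uint32_t bitsOf(float F) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// Reinterpret an unsigned integer as the float with those bits.
static float floatFromBits(std::uint32_t Bits) {
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}

// Distance in ULPs between two finite floats of the same sign.
static std::uint32_t ulpDistance(float A, float B) {
  assert(std::isfinite(A) && std::isfinite(B));
  assert(std::signbit(A) == std::signbit(B));
  std::uint32_t X = bitsOf(A), Y = bitsOf(B);
  return X > Y ? X - Y : Y - X;
}

// Stand-in for a host-side fold: IEEE sqrt followed by an IEEE divide.
static float hostRsqrt(float X) { return 1.0f / std::sqrt(X); }

// Stand-in for an approximate device instruction. ASSUMPTION: bit-trick
// initial estimate plus one Newton step, chosen only for illustration.
static float approxRsqrt(float X) {
  float Y = floatFromBits(0x5f3759dfu - (bitsOf(X) >> 1));
  return Y * (1.5f - 0.5f * X * Y * Y);
}
```

For example, `ulpDistance(hostRsqrt(4.0f), approxRsqrt(4.0f))` is nonzero: the first call returns exactly 0.5f, while the estimate lands near 0.499. A fold that used the first in place of the second would change the program's output.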

There have also been other discussions about folding FP call instructions (e.g. here: https://discourse.llvm.org/t/fp-constant-folding-of-floating-point-operations/73138 ).

> It does give us a bit more control and will get some computations done on the GPU instead of LLVM-computed result, but it's neither here nor there and leaves us quite far from "let GPU do all FP calculations".

The aim here is not "let GPU do all FP calculations", but just to handle the narrower cases of math-library function calls and intrinsics, where precision differences are more likely to be visible. A more general "disable all FP folding" or even "disable all constant-folding" flag might also have some value, but I think this narrower flag is all that would be needed to cover the potential problems users expecting bit-accurate results could face from the folding in https://github.com/llvm/llvm-project/pull/141233 (or are already facing from folding similar LLVM sin/cos intrinsics or libcalls).

> If LLVM happens to inline some trigonometric function with constant argument, it may be able to fold it all completely bypassing this option.

If LLVM is able to inline the function, then it already has the exact implementation available, so the problem of using a different implementation of a function like sin to fold an intrinsic like llvm.sin or nvvm.sin.approx would not occur. This patch is more for cases where functions are either intrinsics or folded by function name as a libcall, rather than having a fully specified implementation available to inline and fold that way.

> Things get further complicated by the fact that NVPTX often passes reduced precision FP types as opaque integers, and we presumably want to avoid folding those, too.

Currently, we don't have constant-folding support for those operations. The f16 versions of the above intrinsics, like `nvvm.ex2.approx.f16x2`, are implemented with real floating-point types rather than ints, so they should be covered by this patch. The intrinsics involving smaller FP types, like `nvvm_e5m2x2_to_f16x2_rn`, `nvvm_e2m3x2_to_f16x2_rn_relu` etc., all look like conversion intrinsics rather than more complex math-library-like intrinsics. These should have well-defined conversion semantics, so a host-side constant-folding implementation should be bitwise identical to the device side if we ever add one, and it would not cause the sort of problems this `disable-fp-call-folding` flag is intended to solve. Can you give any examples of an intrinsic using an int to represent a small FP type that would potentially cause precision issues between host and device-side execution if it were folded?

> That said, as a debug flag for disabling folding of some functions in general it may be useful. Working around function folding in tests is somewhat common.

This is a good point, which I hadn't considered as a use-case before. However, if we narrow this flag to only cover specific functions, it seems like it will become less useful for this, as users will need to carefully check which functions are/are not covered by it.

> If we do want to apply the no-folding to a subset of functions, we may need to find a more precise way to determine that set.

What do you view as the benefit of making the subset narrower? I agree that the current implementation is broader than it needs to be, and that something like including ex2 but excluding fabs would stop the flag from blocking folding that would be precise. However, I do not expect this flag to be used by most people. When it is used, the fact that it applies to all functions with FP inputs or outputs makes its scope easy to understand from the flag name/description without checking the LLVM source code for a precise list of functions. I expect it to be useful to test with vs without the flag to spot cases where a host vs device mismatch occurs. Users can then use another method to avoid constant-folding (e.g. passing a value as a kernel parameter or via a load from memory) if they determine a specific point where this matters in their code and the performance is too slow with the flag enabled.
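
As a rough sketch of that scope rule, assuming a toy type model instead of real LLVM types: the names `TypeKind`, `CallSignature` and `foldingBlocked` below are hypothetical and are not the code in this patch. They only model "any FP input or output blocks the fold when the flag is set".

```cpp
#include <string>
#include <vector>

// Toy stand-in for LLVM types (hypothetical, for illustration only).
enum class TypeKind { Integer, Pointer, Float, FloatVector };

// Toy description of a call: callee name, result type, parameter types.
struct CallSignature {
  std::string Name;
  TypeKind Result;
  std::vector<TypeKind> Params;
};

// True for scalar FP types and vectors of FP types.
static bool isFPOrFPVector(TypeKind K) {
  return K == TypeKind::Float || K == TypeKind::FloatVector;
}

// With the flag set, block folding of any call that takes or returns FP.
// With the flag unset, nothing is blocked.
static bool foldingBlocked(const CallSignature &Sig, bool DisableFPCallFolding) {
  if (!DisableFPCallFolding)
    return false;
  if (isFPOrFPVector(Sig.Result))
    return true;
  for (TypeKind P : Sig.Params)
    if (isFPOrFPVector(P))
      return true;
  return false;
}
```

Under this model a call such as `llvm.fabs.f32` is blocked even though folding it would be exact, which is the over-breadth discussed above, while an integer-only call such as `llvm.ctpop.i32` is unaffected.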

> For the functions we don't want to fold during compilation we may need to have a way to mark them explicitly, perhaps via an attribute on the function itself, or on the caller function.

I don't think the caller function would work, as you'd need to block inlining for those functions to preserve the attribute, which would potentially have even more of a perf impact than just not folding a few instructions (and would add complexity). Adding an attribute to the function, e.g. specifying something like MayFoldInexactly in the intrinsic definitions for functions like nvvm.ex2.approx.* (or even the inverse - adding FoldsExactly to fabs, fmax etc) might be a decent way to implement this if there is real value to narrowing this to a small subset of functions. I'm not 100% sure how this would work for LibCalls, but the NVPTX backend does not use LibCalls, so that is not strictly necessary for the use-cases I need this flag for.

However, I think this approach could become error-prone, as it would be easy to miss adding this attribute in a case where it would be needed. It also makes the semantics of a flag like this harder to understand for users without reading the implementation for which functions it includes. There may also be cases where the functions are almost exact, but NaN payloads or FTZ semantics might be slightly different depending on the host vs device, or library-version used.
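
A small sketch of why the direction of the annotation matters, using a toy name table instead of real intrinsic definitions. The attribute names come from the suggestion above and do not exist in LLVM; `Policy`, `FoldAnnotations`, `exampleAnnotations` and `mayFoldWithFlag` are likewise hypothetical.

```cpp
#include <set>
#include <string>

// Which annotation the hypothetical scheme relies on.
enum class Policy { MarkInexact, MarkExact };

// Toy annotation tables keyed by intrinsic name.
struct FoldAnnotations {
  std::set<std::string> MayFoldInexactly;
  std::set<std::string> FoldsExactly;
};

// Example tables in which nvvm.lg2.approx.f was forgotten in both lists.
static FoldAnnotations exampleAnnotations() {
  FoldAnnotations A;
  A.MayFoldInexactly.insert("nvvm.ex2.approx.f");
  A.FoldsExactly.insert("llvm.fabs.f32");
  return A;
}

// Whether a call may still be folded while the no-folding flag is set.
static bool mayFoldWithFlag(const FoldAnnotations &A, Policy P,
                            const std::string &Name) {
  if (P == Policy::MarkInexact)
    return A.MayFoldInexactly.count(Name) == 0; // unmarked means foldable
  return A.FoldsExactly.count(Name) != 0;       // unmarked means blocked
}
```

With `Policy::MarkInexact`, the forgotten `nvvm.lg2.approx.f` is silently folded despite the flag. With `Policy::MarkExact`, a forgotten entry only costs a missed optimization.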

> Another possibility is to allow specifying a list of functions or patterns to match, and apply the flag only to the matching functions. This way it will be up to the user to specify which function calls they want to preserve.

This seems very flexible and powerful for the user. However, I don't think there are enough people who would need this functionality to make it worth implementing all the additional complexity required to parse and check this list. Cases like intrinsics that are auto-upgraded (e.g. `nvvm_fabs_f` gets upgraded into `nvvm_fabs`) or transformed (e.g. NVPTXTargetTransformInfo.cpp currently turns `nvvm_fmax_f` into `llvm.maxnum`), or optimized in InstCombine somehow, would complicate this sort of mechanism too. Users would need to know every variant that the intrinsic they wrote might get turned into in order for this to work reliably.
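
A minimal sketch of that pitfall, assuming a hypothetical matcher that supports exact names and a trailing `*` wildcard. Neither `matchesPattern` nor `matchesAny` exists in LLVM, and the dotted spellings of the intrinsic names are assumed for illustration.

```cpp
#include <string>
#include <vector>

// Match Name against Pattern: either an exact match, or a prefix match
// when Pattern ends in '*'.
static bool matchesPattern(const std::string &Pattern, const std::string &Name) {
  if (!Pattern.empty() && Pattern.back() == '*') {
    const std::string::size_type PrefixLen = Pattern.size() - 1;
    return Name.compare(0, PrefixLen, Pattern, 0, PrefixLen) == 0;
  }
  return Pattern == Name;
}

// True if any user-supplied pattern matches the callee name.
static bool matchesAny(const std::vector<std::string> &Patterns,
                       const std::string &Name) {
  for (const std::string &P : Patterns)
    if (matchesPattern(P, Name))
      return true;
  return false;
}
```

A user who lists `nvvm.fmax.*` matches the call as written, but once the call has been rewritten to `llvm.maxnum.f32` the same list no longer matches, so the fold the user meant to prevent can still happen.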

Currently, I still think the simple approach in this patch is best. It makes things easy for us as maintainers, as we do not need to evaluate individual libcalls/intrinsics for whether they need to be included in or excluded from this flag, and easy for users, as they do not need to check exactly which calls it covers. It's still a fairly blunt instrument, so I don't think it will be useful for users who would need this in production for performance-critical code, but I think it is broad enough to be useful as a debugging tool that can help find precision issues, which can then be worked around in other ways. There may be use-cases for more general flags to disable all folding or all FP folding, or for more specific flags that control specific function folding rules, but I think the current implementation is a decent middle ground between those two extremes, and is simple enough to be useful without adding a maintenance burden.

https://github.com/llvm/llvm-project/pull/140270