[PATCH] D137811: InstCombine: Perform basic isnan combines on llvm.is.fpclass

Thu Jan 26 16:03:46 PST 2023

jcranmer-intel added a comment.

In D137811#4081771 <https://reviews.llvm.org/D137811#4081771>, @sepavloff wrote:

> In D137811#4068836 <https://reviews.llvm.org/D137811#4068836>, @kpn wrote:
>
>> If exceptions were turned off, the call was made, and exceptions were turned back on then there would be no correctness issue. Inlining the called function wouldn't change that. The toggling of the exception enablement would be invisible to LLVM. Thus we can't categorically say that a strictfp function calling a !strictfp function is malformed.
>
> On most targets turning off FP exceptions means reading content of FP control register, changing value of mask bits and putting the modified register value back. It is expensive operation and cannot be made invisible to LLVM. Some targets (like RISCV) do not have possibility to mask FP exceptions at all, so at IR level there is no way to turn exception off. Default FP environment supposes that the exceptions are ignored, not disabled. In general case they are raised always.

Non-`strictfp` functions are assumed to be in the default FP environment, which implies there's some form of undefined behavior if you call a non-`strictfp` function with a non-default FP environment. The precise, formal semantics that effect this rule are of course underdefined.

This is how we define the default FP environment in the LangRef today:

> The default LLVM floating-point environment assumes that floating-point instructions do not have side effects. Results assume the round-to-nearest rounding mode. No floating-point exception state is maintained in this environment. Therefore, there is no attempt to create or preserve invalid operation (SNaN) or division-by-zero exceptions.

From this definition, it seems clear to me that we are legally allowed to insert instructions that would cause FP exceptions in non-`strictfp` functions.

To rephrase the rules (as I understand them) in somewhat more precise terms, if you call a non-`strictfp` function with a non-default FP environment, the values of any FP operation are unspecified, floating-point sticky bits have unspecified values, and if the dynamic FP environment is set to generate hardware traps, the act of *calling* the function is UB (that is, we are permitted to introduce FP operations that may trap in code-paths where none existed). In this understanding, converting a non-`strictfp` function into a `strictfp` function with bare FP instructions replaced with constrained intrinsics and `round.tonearest` and `fpexcept.ignore` metadata is a valid optimization, but one that narrows the possible semantics (since non-`strictfp` may introduce FP operations that `strictfp` may not).

In D137811#4083368 <https://reviews.llvm.org/D137811#4083368>, @kpn wrote:

> Be careful of how IEEE 754 uses the same terminology that a Unix person uses, but the words have different meanings. I'm going to use the term "754 trap" to mean a trap in the IEEE-754 document's use of the term. I'm going to say "Unix trap" when a trap involves transferring of control to the OS.
>
> An FP instruction can "754 trap" but the result may just be changing the FP status bits in the environment to record that something happened. And if we are not using the constrained intrinsics, or we are using them with exceptions "ignore" and rounding "roundtoeven", then we are assumed to not be accessing the FP status bits.
>
> A CPU is allowed to always "Unix trap" and transfer control to the OS. I think you are saying that RISCV does this. But the OS is allowed to fix things up so that the application doesn't observe the CPU's trap. Indeed, in the default FP environment the OS is _required_ to hide the CPU's "Unix trap" from the application. From the application's point of view this is the same as a CPU not doing a "Unix trap" at all. In this case we can treat it the same as a CPU that doesn't trap in the default FP environment.

I went ahead and looked at the RISC-V specification to see what it does on FP exceptions. The RISC-V instructions only model 754 traps as sticky bits in the `fcsr`, and there's an explicit note that it provides no way to convert a 754 trap to a Unix trap (to use your terminology).

FWIW, C itself requires that implementations provide FP exception handling as sticky-bits (this is the IEEE 754 default exception handling), with the existence of a Unix trapping mode being an allowable extension (which it declines to specify).

https://reviews.llvm.org/D137811