[PATCH] D137811: InstCombine: Perform basic isnan combines on llvm.is.fpclass

Thu Jan 26 09:54:37 PST 2023

kpn added a comment.

In D137811#4081771 <https://reviews.llvm.org/D137811#4081771>, @sepavloff wrote:

> In D137811#4068836 <https://reviews.llvm.org/D137811#4068836>, @kpn wrote:
>
>> In D137811#4066479 <https://reviews.llvm.org/D137811#4066479>, @arsenm wrote:
>>
>>> In D137811#4065720 <https://reviews.llvm.org/D137811#4065720>, @arsenm wrote:
>>>
>>>>> But they have different behavior if `x` is signaling NaN. Although the replacement is proposed for strictfp functions only, inlining may require conversion to a form that uses constrained intrinsics. The resulting code would have different behavior than original.
>>>>
>>>> A transformation that’s valid for !strictfp functions cannot be invalid because of the potential for being inlined into a strictfp function. If this isn’t possible, there is a representational issue the inliner would need to account for.
>>>
>>> Thinking about this more, I think any situation where this could be an issue would be malformed to begin with. You had a piece of code that was expecting exceptions to be enabled calling into code where exceptions were disabled without turning exceptions off
>>
>> If exceptions were turned off, the call was made, and exceptions were turned back on then there would be no correctness issue. Inlining the called function wouldn't change that. The toggling of the exception enablement would be invisible to LLVM. Thus we can't categorically say that a strictfp function calling a !strictfp function is malformed.
>
> On most targets turning off FP exceptions means reading content of FP control register, changing value of mask bits and putting the modified register value back. It is expensive operation and cannot be made invisible to LLVM. Some targets (like RISCV) do not have possibility to mask FP exceptions at all, so at IR level there is no way to turn exception off. Default FP environment supposes that the exceptions are ignored, not disabled. In general case they are raised always.

Ok, true, it takes executing code to change the floating point environment. Yes, there would be inline assembly or a function call to change the FP environment. LLVM would see that code because there would be IR for it. All true.

But LLVM wouldn't know what the inline assembly was doing. It wouldn't recognize the function call and thus wouldn't know what it was doing. LLVM would not know the floating point environment had changed. Rephrased, the change in the floating point environment would, to LLVM, be invisible. That's the "invisible" I was referring to earlier.

>> The constrained intrinsics that specify the default rounding and exceptions disabled are supposed to behave the same as the normal instructions. They're supposed to have the exact same meaning and behavior. So a conversion of isnan(x) to (x == x) for the constrained intrinsics in the default environment is just as correct as the instructions that do the same thing but aren't constrained intrinsics. Thus inlining of a !strictfp function into a strictfp function with the conversion of FP instructions into constrained intrinsics can't introduce any new correctness issues.
>
> No matter what exception behavior is specified in constrained intrinsics call, processor instruction will raise FP exceptions. If bits in FP environment specifies trap on `Invalid` exception, the check `x != x` may result in crash, if `x` is signaling NaN. The check `isnan(x)` in this case is safe. So replacement `is_fpclass`->`fcmp` changes semantics and cannot be used here.
>
> Particular cores may implement `is_fpclass` as CMP instruction, it is made by default lowering, but making such replacement in IR is incorrect.

Be careful of how IEEE 754 uses the same terminology that a Unix person uses, but the words have different meanings. I'm going to use the term "754 trap" to mean a trap in the IEEE-754 document's use of the term. I'm going to say "Unix trap" when a trap involves transferring of control to the OS.

An FP instruction can "754 trap" but the result may just be changing the FP status bits in the environment to record that something happened. And if we are not using the constrained intrinsics, or we are using them with exceptions "ignore" and rounding "roundtoeven", then we are assumed to not be accessing the FP status bits.

A CPU is allowed to always "Unix trap" and transfer control to the OS. I think you are saying that RISCV does this. But the OS is allowed to fix things up so that the application doesn't observe the CPU's trap. Indeed, in the default FP environment the OS is _required_ to hide the CPU's "Unix trap" from the application. From the application's point of view this is the same as a CPU not doing a "Unix trap" at all. In this case we can treat it the same as a CPU that doesn't trap in the default FP environment.

It's true that a blind replacement of is_fpclass with fcmp would be incorrect _if_ the floating point environment is not the default environment. But if we are in the default FP environment we can assume that any CPU trap will be hidden from us by the OS and therefore is not a part of our discussion of IR correctness.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137811/new/

https://reviews.llvm.org/D137811