[cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?

Mon Sep 13 06:49:47 PDT 2021

Let's weigh the alternatives.

We are discussing two approaches for handling `isnan` and similar functions
in -ffinite-math-only mode:
1. "Old" behavior: "with -ffinite-math-only you are telling that there are
no NaNs", so `isnan` may be optimized to `false`.
2. "New" behavior: with -ffinite-math-only you are telling the compiler that
the operands of arithmetic operations are not NaNs, but NaN may otherwise
be used. As `isnan` is not an arithmetic operation, it should be preserved.

Advantages of the "old" behavior are:
- "it’s intuitively clear".
- It is close to GCC's current behavior.

Advantages of the "new" behavior are:
- `isnan` is still available to the user, which allows, for instance,
validating working data or selecting between a fast and a slow path.
- NaN is available and may be used, for instance, as a sentinel.
- Consistency between compiler and library implementations: both would
behave the same way.
- In most real cases the "old" behavior can be easily obtained by
redefinition of `isnan`.
- It is free from issues like "what should
numeric_limits<float>::has_quiet_NaN return?".
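
To illustrate the sentinel and validation points, here is a minimal sketch.
The names `kMissing` and `sum_present` are hypothetical and only serve the
example. The NaN never becomes an operand of an arithmetic operation, which
is the usage the "new" behavior is meant to keep working.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical sentinel that marks a missing sample (illustrative name).
constexpr float kMissing = std::numeric_limits<float>::quiet_NaN();

// Sums the samples that are present and skips the sentinels.
// Under the "new" behavior the std::isnan call survives
// -ffinite-math-only. Under the "old" behavior it may be folded to
// `false`, and the sentinels would then reach the addition.
float sum_present(const std::vector<float>& samples) {
    float total = 0.0f;
    for (float s : samples) {
        if (std::isnan(s)) {
            continue;  // missing entry: never used as an arithmetic operand
        }
        total += s;
    }
    return total;
}
```

For example, `sum_present({1.0f, kMissing, 2.0f})` is expected to add only
the two present values when the check is not optimized away.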

It is unlikely that the "old" behavior gives a noticeable performance gain.
If it actually does, `isnan` can still be redefined to return `false`.
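
One possible way to get the "old" behavior back under the "new" one is a
project-level wrapper. This is only a sketch: `my_isnan` is a hypothetical
name, and it relies on the `__FINITE_MATH_ONLY__` macro that GCC and Clang
predefine (1 under -ffinite-math-only, 0 otherwise).

```cpp
#include <cmath>

// Hypothetical wrapper (illustrative name). When the translation unit is
// built with -ffinite-math-only, the check compiles to a constant `false`;
// otherwise it forwards to the real std::isnan.
inline bool my_isnan(float x) {
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
    (void)x;       // the caller promised there are no NaNs
    return false;
#else
    return std::isnan(x);
#endif
}
```

Code that wants the check removed calls `my_isnan`; code that needs a real
check keeps calling `isnan` directly.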

The intuitive clarity of the "old" way is questionable for users: it is not
clear why functions like `isnan` silently disappear, or what the
specializations of `numeric_limits` members should return.

There are cases where checking for NaN is needed even in -ffinite-math-only
mode. To do so, users have to resort to workarounds such as integer
arithmetic on the bits of float values, which reduces the clarity of the
code and makes it unportable and slower.
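
A sketch of such a workaround is shown here. The name `isnan_bits` is
hypothetical, and the code assumes `float` is IEEE 754 binary32, which is
exactly the portability cost mentioned above.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical bit-level NaN test (illustrative name). It performs no
// floating-point comparison, only integer operations on the bit pattern.
inline bool isnan_bits(float x) {
    static_assert(std::numeric_limits<float>::is_iec559,
                  "assumes IEEE 754 floating point");
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to read the bits
    // Drop the sign bit. A NaN has all exponent bits set and a nonzero
    // mantissa, so its magnitude bits exceed those of infinity (0x7f800000).
    return (bits & 0x7fffffffu) > 0x7f800000u;
}
```

Compared with a plain `isnan(x)`, the intent is much harder to see, and the
constants are tied to one floating-point format.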

Are there any other advantages/disadvantages of these approaches?

Thanks,
--Serge

On Mon, Sep 13, 2021 at 7:00 PM James Y Knight <jyknight at google.com> wrote:

> On Mon, Sep 13, 2021, 2:02 AM Serge Pavlov via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> The working construct is `reinterpret_cast<uint32_t&>(x)`. It however
>> possesses the same drawback, it requires `x` be in memory.
>>
>
> We're getting rather far afield of the thread topic here, but .. that is
> UB, don't do that.
>
> Instead, always memcpy, e.g.
> uint32_t y;
> memcpy(&y, &flo, sizeof(uint32_t));
>
> Or use a wrapper like std::bit_cast or absl::bit_cast (
> https://github.com/abseil/abseil-cpp/blob/cfbf5bf948a2656bda7ddab59d3bcb29595c144c/absl/base/casts.h#L106
> ).
>
> This has effectively no runtime overhead, the compiler is extremely good
> at deleting calls to memcpy when it has a constant smallish size. And
> remember that *every* local variable started out in memory. Only through
> optimizations does the memory location and the loads/stores for every
> access get eliminated.
>
>>