[cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?

Mon Sep 20 10:45:43 PDT 2021

On Mon, Sep 20, 2021 at 10:13 AM Arthur O'Dwyer
<arthur.j.odwyer at gmail.com> wrote:
>
> On Mon, Sep 20, 2021 at 12:40 PM Chris Tetreault via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>
>> You’re confusing implementation details (you have a Godbolt link that shows that MSVC just happens to not remove the isnan call) with documented behavior (I provided a link to the MSVC docs that shows that no promises are made with respect to NaN). The fact is that no compiler guarantees that isnan(x) will not be optimized out with fast-math enabled. (Maybe ICC does; I don’t know, I haven’t checked. I bet their docs say something similar to MSVC, clang, and GCC though.) There is no inconsistency: all the compilers document that they are free to optimize as if there were no NaNs, and they then do whatever is best for their implementation. If you think this is inconsistent, then let me tell you about that time I dereferenced a null pointer and it didn’t segfault.
>
>
> +1.
>
>>
>> Now, many people have suggested in this thread that a pragma be added. I personally fully support this proposal. I think it’s a very clean solution, and any non-trivial portable codebase probably already has a library of preprocessor macros that abstract this sort of thing. Do you have a concrete reason why a pragma is unsuitable?
>
>
> I think that there are two questions in this thread.
> - How should fast-math mode actually behave? [Maybe we're settled on the "NANs are SNANs and signaling operations produce unspecified values" model. Gee I hope so.]
> - Should switching into/out-of fast-math mode be controlled only by a TU-level command line option, or should there also be a pragma for it?
> (Btw, multiply these questions by the number of different modes we support; I've consciously been trying to phrase everything in terms of NANs, but Serge likes to talk about -ffinite-math-only, where not just NANs but also INF and -INF are verboten. And then there's the -fno-signed-zeros option, which does not forbid -0.0, but does permit it to be treated as a-zero-value-of-unspecified-sign. I think -ffast-math probably also forbids subnormals... but maybe it just treats them as either-their-actual-value-or-zero-of-the-appropriate-sign.)
>
> Anyway, should there be a pragma in addition to the TU-level command-line option?
>
> There must be a command-line option, anyway — I mean, it already exists (-ffast-math, etc). Pragmas are basically about taking some command-line decision and allowing the decision to be made more granularly. Look at `#pragma GCC diagnostic ignored "-Wfoo"`, for example; it's expressed in terms of the command-line option. So if Clang were to support something like
>     #pragma GCC optimize("ffast-math")  // cf. #pragma GCC optimize("O2")
> that would still be expressed in terms of the command-line option, and hopefully both the option and the pragma would end up setting the same internal bits.
>
> However, pragmas are hard to get right. Consider:
>
>     #include <cmath>  // for HUGE_VAL
>     bool unoptimized(double x) { return (x + 1) > x; }
>     #pragma GCC optimize("ffast-math")
>     bool optimized(double x) { return unoptimized(x+1); }
>     #pragma GCC optimize("fno-fast-math")
>     int main() {
>         return optimized(HUGE_VAL);
>     }
>
> The compiler would have to think about what it means to inline `unoptimized` into `optimized`.  The arithmetic in `optimized` produces INF, but then it's passed to `unoptimized`, which is not marked as fast-math, so I guess the compiler can't optimize `(x+1) > x` into `true` in that context?  It's at least confusing and subtle for the compiler vendor to get right; and possibly philosophically confusing as well.
> Alternatively, you could forbid inlining between functions with different optimization levels... but that's clearly a terrible idea, right?

That does not seem like a terrible idea: we already limit inlining when
function attributes mismatch in this way.
But here you don't even need inlining if the pragma is used only for a
sequence of statements inside a function. Fortunately we handle
fast-math with individual instruction flags, so you could imagine:

#pragma GCC optimize("ffast-math")
x = a + b;
#pragma GCC optimize("fno-fast-math")
if (isnan(x)) {
   ...
}

which would tag the fadd with the fast flag but not the isnan.

In practice you'd write it this way, though:

x = a + b; // default specified on the command line
#pragma GCC push_options
#pragma GCC optimize("fno-fast-math")
if (isnan(x)) {
   ...
}
#pragma GCC pop_options

>
> And of course some programmer is going to try something dumb like
>
>     #pragma GCC optimize("ffast-math")
>     #define REAL_ISNAN(x) std::isnan(x)
>     #pragma GCC optimize("fno-fast-math")
>
> which "of course" won't work, but who's going to explain it to them?
>
> Not to mention, if the pragma is active at the top of the TU where some template or implicitly defaulted special member is defined, but then it's not active at the point where the template is instantiated or the special member is implicitly defined... what the heck happens in that case? and who's going to write the StackOverflow answer about it?
>
> Basically, the translation unit is the natural unit of... hmm... translation. There's very little return-on-investment involved in trying to circumvent that.
>
> –Arthur