[cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?

Mon Sep 20 10:13:19 PDT 2021

On Mon, Sep 20, 2021 at 12:40 PM Chris Tetreault via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> You’re confusing implementation details (you have a Godbolt link that
> shows that MSVC just happens to not remove the isnan call) with documented
> behavior (I provided a link to the MSVC docs that shows that no promises
> are made with respect to NaN). The fact is that no compiler (Maybe ICC
> does, I don’t know, I haven’t checked. I bet their docs say something
> similar to MSVC, clang, and GCC though.) guarantees that isnan(x) will not
> be optimized out with fast-math enabled. There is no inconsistency: all the
> compilers document that they are free to optimize as if there were no NaNs,
> and they then do whatever is best for their implementation. If you think
> this is inconsistent, then let me tell you about that time I dereferenced a
> null pointer and it didn’t segfault.
>

+1.

> Now, many people have suggested in this thread that a pragma be added. I
> personally fully support this proposal. I think it’s a very clean solution,
> and any non-trivial portable codebase probably already has a library of
> preprocessor macros that abstract this sort of thing. Do you have a
> concrete reason why a pragma is unsuitable?
>

I think that there are two questions in this thread.
- How should fast-math mode actually behave? [Maybe we're settled on the
"NANs are SNANs and signaling operations produce unspecified values" model.
Gee I hope so.]
- Should switching into/out-of fast-math mode be controlled only by
a TU-level command line option, or should there also be a pragma for it?
(Btw, multiply these questions by the number of different modes we support;
I've consciously been trying to phrase everything in terms of NANs, but
Serge likes to talk about -ffinite-math-only, where not just NANs but also
INF and -INF are verboten. And then there's the -fno-signed-zeros option
<https://gcc.gnu.org/wiki/FloatingPointMath>, which *does not forbid* -0.0,
but does permit it to be treated as a-zero-value-of-unspecified-sign. I
think -ffast-math probably also forbids subnormals... but maybe it just
treats them as either-their-actual-value-or-zero-of-the-appropriate-sign.)

Anyway, should there be a pragma in addition to the TU-level command line
option?

There must be a command-line option, anyway — I mean, it already exists
(-ffast-math, etc). Pragmas are basically *about* taking some command-line
decision and allowing the decision to be made more granularly. Look at
`#pragma GCC diagnostic ignored "-Wfoo"`, for example; it's expressed in
terms of the command-line option. So if Clang were to support something like
    #pragma GCC optimize("ffast-math")  // cf. #pragma GCC optimize("O2")
that would still be expressed in terms of the command-line option, and
hopefully both the option and the pragma would end up setting the same
internal bits.
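
For comparison, here is a minimal sketch of the existing diagnostic pragma doing
exactly that: narrowing a command-line decision (`-Wunused-variable`) to a
single function. The function name `answer` is hypothetical and exists only to
give the pragma something to act on; the `push`/`pop` pair restores whatever
the command line originally asked for.

```cpp
// Suppress one warning for one function, then restore the previous state.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-variable"
int answer() {
    int unused = 0;  // would normally be reported under -Wunused-variable
    return 42;
}
#pragma GCC diagnostic pop  // later code is diagnosed per the command line again
```

Both GCC and Clang accept this spelling. A diagnostic pragma only changes
which warnings are printed, never the generated code, which is a large part
of why it is so much easier to get right than an optimization pragma would be.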

However, pragmas are hard to get right. Consider:

    #include <cmath>  // for HUGE_VAL
    bool unoptimized(double x) { return (x + 1) > x; }
    #pragma GCC optimize("ffast-math")
    bool optimized(double x) { return unoptimized(x+1); }
    #pragma GCC optimize("fno-fast-math")
    int main() {
        return optimized(HUGE_VAL);
    }

The compiler would have to think about what it means to inline
`unoptimized` into `optimized`.  The arithmetic in `optimized` produces
INF, but then it's passed to `unoptimized`, which is not marked as
fast-math, so I guess the compiler can't optimize `(x+1) > x` into `true`
in that context?  It's *at least* confusing and subtle for the compiler
vendor to get right; and possibly philosophically confusing as well.
Alternatively, you could forbid inlining between functions with different
optimization levels... but that's *clearly* a terrible idea, right?
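
To make the stakes concrete, here is a small sketch of the same comparison
under a hypothetical name, `plus_one_is_greater`. It assumes the file is
compiled *without* fast-math and on a platform where HUGE_VAL is +INF (true of
IEEE 754 targets).

```cpp
#include <cmath>  // HUGE_VAL

// Same expression as in `unoptimized` above; the name is made up for this sketch.
bool plus_one_is_greater(double x) { return (x + 1) > x; }

// Under strict IEEE semantics the answer depends on the input:
//   plus_one_is_greater(1.0)      is true
//   plus_one_is_greater(HUGE_VAL) is false, because INF + 1 == INF
bool answers_differ() {
    return plus_one_is_greater(1.0) != plus_one_is_greater(HUGE_VAL);
}
```

A compiler that is allowed to assume "no INF, no NAN" may fold the comparison
to `true`, so the INF case is exactly where the two modes can disagree, and
exactly the case an inliner crossing the pragma boundary would have to keep
straight.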

And of course some programmer is going to try something dumb like

    #pragma GCC optimize("ffast-math")
    #define REAL_ISNAN(x) std::isnan(x)
    #pragma GCC optimize("fno-fast-math")

which "of course" won't work, but who's going to explain it to them?
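
The reason it cannot work is that `#define` only records tokens; the
`std::isnan(x)` call is compiled wherever `REAL_ISNAN` is *used*, under
whatever mode is in effect there. The sketch below shows the same effect by
analogy, with an ordinary macro (`MODE`, a hypothetical stand-in for "the mode
the pragma selected") instead of a real optimization pragma.

```cpp
// MODE is a made-up stand-in for the pragma state; it is 1 at the definition.
#define MODE 1
#define CURRENT_MODE() MODE  // the body is stored as tokens, not evaluated here
#undef MODE
#define MODE 2               // the "mode" changes before the macro is used

int mode_seen_by_caller() {
    return CURRENT_MODE();   // expands here, so it sees MODE == 2
}
```

The state at the point of definition is simply not part of the macro.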

Not to mention, if the pragma is active at the top of the TU where some
template or implicitly defaulted special member is defined, but then it's
not active at the point where the template is instantiated or the special
member is implicitly defined... what the heck happens in *that* case? And
who's going to write the StackOverflow answer about it?

Basically, the translation unit is the *natural* unit of... hmm...
translation. There's very little return-on-investment involved in trying to
circumvent that.

–Arthur