[llvm-dev] [cfe-dev] Handling of FP denormal values

Mon Sep 16 17:58:42 PDT 2019

On Mon, Sep 16, 2019 at 7:58 PM Kaylor, Andrew via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hi all,
>
>
>
> While reviewing a recent clang documentation change, I became aware of an
> issue with the way that clang is handling FP denormals. There is currently
> some support for variations in the way denormals are handled, but it isn't
> consistent across architectures and generally feels kind of half-baked. I'd
> like to discuss possible solutions to this problem.
>
>
>
> First, there is a clang command line option:
>
>
>
>     -fdenormal-fp-math=<arg>
>
>
>
>     Select which denormal numbers the code is permitted to require.
>
>
>
>     Valid values are: ieee, preserve-sign, and positive-zero, which
>
>     correspond to IEEE 754 denormal numbers, the sign of a flushed-to-zero
>
>     number is preserved in the sign of 0, denormals are flushed to positive
>
>     zero, respectively.
>
>
>
> A quick survey of the code leads me to believe this has no effect for
> targets other than ARM. For X86 targets we may want different options. I'll
> say more about that below. The wording of the documentation is sufficiently
> ambiguous that I’m not entirely certain whether it is intended to control
> the target hardware or just the optimizer.
>
>
>
> In addition, when either -Ofast or -ffast-math is used, we attempt to link
> 'crtfastmath.o' if it can be found. For X86 targets, this object file adds
> a static constructor that sets the DAZ and FTZ bits of the MXCSR register.
> I expect that it has analogous behavior for other architectures when it is
> available. This object file is typically available on Linux systems,
> possibly also with things like MinGW. If it isn't found, the denormal
> control flags will be left in their default state.
>
>
>
> There is also a CUDA-specific option, -f[no-]cuda-flush-denormals-to-zero.
> I don't know how this is implemented, but the documentation says it is
> specific to CUDA device mode.
>
>
>
> Finally, there is an OpenCL-specific option, -cl-denorms-are-zero. Again,
> I don't know how it is implemented.
>
>
>
> So.... I'd like to talk about how we can corral all of this into some
> interface that is consistent (or at least consistently sensible) across
> architectures.
>
>
>
> The problems I see are:
>
>
>
> 1. -fdenormal-fp-math needs to handle all scenarios needed by all
> architectures (or needs to be limited to a common subset).
>
> 2. -fdenormal-fp-math needs to be reconciled with -ffast-math and its
> variants.
>
> 3. -fdenormal-fp-math needs to be consistent about whether or not it
> imposes hardware changes when applicable.
>
> I can only really speak to X86, so I'll say a few words about that to
> start the discussion.
>
>
>
> The current choices for -fdenormal-fp-math are: ieee, preserve-sign, and
> positive-zero. With X86, you get ieee behavior if neither DAZ nor FTZ is
> set. If FTZ is set you get 'preserve sign' behavior -- i.e. denormal
> results are flushed to zero and the sign of the result is kept. There is no
> way to get 'positive zero' behavior with X86. At the hardware level, modern
> X86 processors have separate controls for ftz (results are flushed to zero)
> and daz (inputs are flushed to zero before calculations), but I doubt that
> they are used independently often enough to distinguish them at the command
> line option level.
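
To make the three option values concrete, here is a small sketch of what each
one means for a single denormal value. It is a software model only, not how
clang or the hardware implements it; the names denormal_mode and
apply_denormal_mode are made up for illustration.

```c
#include <math.h>   /* fpclassify, signbit, FP_SUBNORMAL */

/* Hypothetical names mirroring the three -fdenormal-fp-math values. */
enum denormal_mode {
    DENORMAL_IEEE,
    DENORMAL_PRESERVE_SIGN,
    DENORMAL_POSITIVE_ZERO
};

/* Model of what happens to one value under each mode. */
static float apply_denormal_mode(float x, enum denormal_mode mode)
{
    /* Normal numbers, zeros, infinities and NaNs are never touched,
       and 'ieee' keeps denormals as they are. */
    if (fpclassify(x) != FP_SUBNORMAL || mode == DENORMAL_IEEE)
        return x;

    /* 'preserve-sign': flush to a zero carrying the sign of x. */
    if (mode == DENORMAL_PRESERVE_SIGN)
        return signbit(x) ? -0.0f : 0.0f;

    /* 'positive-zero': flush to +0 regardless of sign. */
    return 0.0f;
}
```

With a negative denormal input, 'preserve-sign' yields -0 and 'positive-zero'
yields +0, which is the case X86 hardware cannot produce.
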
>
>
>
> Also, any X87 instructions that happen to be generated (such as if the
> code contains 'long double' data on Linux) will ignore the ftz and daz
> settings. There are some early Pentium 4 processors that don't support
> 'daz' but I hope we can safely ignore that fact.
>
>
>
> Linking in crtfastmath.o when -Ofast or -ffast-math are used is consistent
> with GCC's behavior. However, it implicitly ignores -fdenormal-fp-math,
> which GCC doesn't have. In most cases if a user sets a fast math option
> they probably also want DAZ and FTZ, but there might be some reason why an
> advanced user would want to treat them separately. This can be done with
> intrinsics, of course, but if we have an option to control it, we should
> respect that option. Also, it is possible to construct fast math behavior
> cafeteria-style (i.e. setting some fast math flags and not others) so we
> should probably have a way to add ftz behaviors a la carte.
>
>
>
> FWIW, ICC sets the FTZ and DAZ flags from a function call that is inserted
> into main depending on the options used to compile the file containing main.
>
>
>
> Trying to go back to the general case, I'd like to solicit information
> about whether other targets have/need different denormal options than are
> described above. Further, I'd suggest that for any architecture that
> supports FTZ behavior, a well-documented default be set automatically when
> fast math is enabled via -Ofast, -ffast-math, or
> -funsafe-math-optimizations, unless that option is turned off by a
> subsequent -fno-fast-math/-fno-unsafe-math-optimizations option or
> overridden by a subsequent -fdenormal-fp-math option. Likewise, if
> -fdenormal-fp-math is used, some code should be emitted to set the
> relevant hardware controls.
>
>
>
> I don't have a strong opinion on whether it is better to emit a static
> constructor or to inject a call into main. The latter seems more
> predictable. I’d like to avoid a dependency on crtfastmath.o either way.
>

I would like to see it called from .init_array (or equivalent) with the
highest init_priority. That way, dynamic initializers get the benefit too.
If we're requesting DAZ+FTZ on the command line, there's no need for a slow
start-up.
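
A minimal sketch of what I mean, assuming the GCC/Clang constructor attribute
and, on X86, the SSE intrinsics from <xmmintrin.h>. The names
set_denormal_mode, ftz_daz_enabled, denormal_mode_set and HAVE_MXCSR are
invented for this example; this is not what crtfastmath.o or clang actually
emits.

```c
#if defined(__SSE__)
#include <xmmintrin.h>      /* _mm_getcsr, _mm_setcsr */
#define HAVE_MXCSR 1
#else
#define HAVE_MXCSR 0        /* other targets: nothing to set in this sketch */
#endif

#define MXCSR_DAZ 0x0040u   /* bit 6: denormal inputs are treated as zero */
#define MXCSR_FTZ 0x8000u   /* bit 15: denormal results are flushed to zero */

/* Hypothetical flag so callers can see that the constructor ran. */
static int denormal_mode_set = 0;

/* Priorities 0-100 are reserved for the implementation, so 101 is the
   earliest slot user code can ask for. */
__attribute__((constructor(101)))
static void set_denormal_mode(void)
{
#if HAVE_MXCSR
    /* Read-modify-write: only the DAZ and FTZ bits change. */
    _mm_setcsr(_mm_getcsr() | MXCSR_DAZ | MXCSR_FTZ);
#endif
    denormal_mode_set = 1;
}

/* Returns 1 if both bits are currently set in MXCSR, 0 otherwise
   (always 0 on targets without MXCSR). */
static int ftz_daz_enabled(void)
{
#if HAVE_MXCSR
    unsigned int want = MXCSR_DAZ | MXCSR_FTZ;
    return (_mm_getcsr() & want) == want;
#else
    return 0;
#endif
}
```

Dynamic initializers that run at default priority come after this one, so they
already see DAZ+FTZ. Note that this only affects the thread that runs the
initializers and whatever inherits its control register state.
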

Digressing a bit, but I don't like how some implementations of
crtfastmath.o clear all the flags while setting the DAZ+FTZ flags (e.g.
AArch64). Seems unnecessary and makes its position on the link line
significant.
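
To illustrate the difference, here is a sketch in plain bit arithmetic. I am
using the X86 MXCSR layout only because it is the one discussed in this
thread (the AArch64 control register has a different layout), and the
function names csr_clobber and csr_preserve are made up.

```c
#include <stdint.h>

#define MXCSR_DAZ 0x0040u   /* bit 6 */
#define MXCSR_FTZ 0x8000u   /* bit 15 */

/* Overwrites the whole register image: rounding mode, exception masks
   and anything else set earlier is lost. */
static uint32_t csr_clobber(uint32_t old)
{
    (void)old;
    return MXCSR_DAZ | MXCSR_FTZ;
}

/* Sets only the two bits of interest and keeps everything else. */
static uint32_t csr_preserve(uint32_t old)
{
    return old | MXCSR_DAZ | MXCSR_FTZ;
}
```

Starting from 0x7F80 (the power-on default 0x1F80 with rounding set to
round-toward-zero), the first version yields 0x8040 and drops the rounding
mode and exception masks, while the second yields 0xFFC0. With the second
form, the order in which such initializers run no longer matters.
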

>
>
> Do we need an ftz fast-math flag?
>
>
>
> Are there any other facets to this problem that I've overlooked?
>
>
>
> Thanks,
>
> Andy
>
>
>