[llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

Wed Nov 16 08:02:50 PST 2016

I know Hal and Mehdi are already aware, but for the wider audience and back
reference, we recently had some discussions about the broken state of
fast-math in:
https://reviews.llvm.org/D24816 ([Target] move reciprocal estimate settings
from TargetOptions to TargetLowering)
https://reviews.llvm.org/D24815 ([clang] make reciprocal estimate codegen a
function attribute)

Starting by fixing the IR flags is likely the best first step because if we
fix the backend first, we may expose more bugs-on-top-of-bugs as described
here:
https://llvm.org/bugs/show_bug.cgi?id=27372#c5

But we may need a bigger change than just removing the umbrella effect of
'fast':
When I started looking at how we could move the reciprocal-estimate setting
(this is different than 'arcp') to instructions, I noted that the existing
FMF uses Value's SubclassOptionalData.

I'm guessing this is for maximum performance, but SubclassOptionalData is
exactly 7 bits, and we've already burned 5 of them. So if we need more
refinement than 'fast' / 'aggr' to express what's possible via the
front-end flags, we either need to expand SubclassOptionalData, add a
custom field for FMF, or move FMF to metadata or attributes. I'm assuming
the first 2 options are much larger undertakings than the third.

For reciprocal-estimates, using the existing FMF/SubclassOptionalData isn't
a viable solution because there's no way to express all of the front-end
info (number of refinement steps, etc) in 2 bits. So what about metadata?
We already have an instruction-level metadata type called "MD_fpmath".
AFAICT, it's currently only used to indicate a reduced accuracy operation
for AMDGPU. So I was hoping to just add an option to that metadata type for
reciprocal-estimates.

I realize that metadata is optional (ie, it can be dropped without
affecting correctness). I think that's fine for reciprocal-estimates, but
I'm not sure. Are -ffast-math and friends optional optimization flags? Or
do we consider those non-optional modes of execution? If they are not
optional, then should we have instruction-level attributes for FMF? This
possibility came up when we enabled FMF on calls ( https://reviews.llvm.org/
D14707 ).

Apologies for raising too many issues at once, but consider that another
potential subclass of relaxed-FP-math was recently discussed here:
https://reviews.llvm.org/D26602

Is a "distributive" transform different than an "associative" transform?

On Wed, Nov 16, 2016 at 12:59 AM, Hal Finkel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
> ------------------------------
>
> *From: *"Mehdi Amini via llvm-dev" <llvm-dev at lists.llvm.org>
> *To: *"Warren Ristow" <warren.ristow at sony.com>
> *Cc: *llvm-dev at lists.llvm.org
> *Sent: *Tuesday, November 15, 2016 11:10:48 PM
> *Subject: *Re: [llvm-dev] RFC: Consider changing the semantics of 'fast'
> flag implying all fast-math-flags
>
>
> Hi,
>
> On Nov 15, 2016, at 5:15 PM, Ristow, Warren via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi all,
>
> This is about https://reviews.llvm.org/D26708
>
> Currently when the command-line switch '-ffast-math' is specified, the
> IR-level fast-math-flag 'fast' gets attached to appropriate FP math
> instructions.  That flag acts as an "umbrella" to implicitly turn on all
> the
> other fast-math-flags ('nnan', 'ninf', 'nsz' and 'arcp'):
>
> http://llvm.org/docs/LangRef.html#fast-math-flags
>
> This approach has the shortcoming that when there is a desire to disable
> one
> of these fast-math-flags, if the 'fast' flag remains, it implicitly
> re-enables the one being disabled.  For example, compiling this test-case:
>
>     extern void use(float x, float y);
>     void test(float a, float b, float c) {
>       float q1 = a / c;
>       float q2 = b / c;
>       use(q1, q2);
>     }
>
> at '-O2 -ffast-math' does a reciprocal-transformation, so only one division
> is done (as desired with fast-math).  Compiling it with:
>
>   -O2 -ffast-math -fno-reciprocal-math
>
> should disable the reciprocal transformations (the flag 'arcp'), but leave
> all the other fast-math transformations enabled.  The current
> implementation
> doesn't do that, since the 'fast' IR-level flag still gets set.
>
> Motivation of this discussion: https://llvm.org/b
> ugs/show_bug.cgi?id=27372#c2
>
> As an aside, when '-ffast-math' is specified on the command-line, the
> following six switches are all passed to cc1:
>
>     -menable-no-infs
>     -menable-no-nans
>     -fno-signed-zeros
>     -freciprocal-math
>     -fno-trapping-math
>     -ffp-contract=fast
>
> and '-ffast-math' itself is also passed cc1 (the act of passing
> '-ffast-math'
> to cc1 results in the macro '__FAST_MATH__' being defined).  When (for
> example) '-fno-reciprocal-math' is passed in addition to '-ffast-math',
> then
> '-freciprocal-math' is no longer passed to cc1 (and the other five listed
> above still are passed, along with '-ffast-math' still being passed).  It
> seems like the intention was that these individual switches were to enable
> the individual floating-point transformations (and so the lack of any of
> those switches would suppress the relevant transformations), but the
> '-ffast-math' "umbrella" is over-riding the attempted suppression.
>
>
> Sure, this looks like a bug, disable an individual fast-math flag on the
> command line should be possible and override a prior -ffast-math (usually
> the last one on the command line “wins”/override).
>
> The Cfe-dev mailing list would be more appropriate to discuss the behavior
> of clang command line flags though.
>
>
>
> The change proposed at https://reviews.llvm.org/D26708 deals with this
> issue
>
>
> This patch seems to modify on LLVM, it does not deal at all with the issue
> you describe above.
> I don’t see why the issue with the clang command line flags need to be
> dealt with at the LLVM level.
>
>
> just for the reciprocal-transformation case, but it changes the semantics
> of
> the 'fast' IR-level flag so that it no longer implies all the others.
>
>
> The starting point for any change is: http://llvm.org/docs/LangRef.h
> tml#fast-math-flags
> You would need to write a new definition for what “fast” would mean.
>
> However I don’t need anything need to be changed here to address the
> use-case you want to fix.
>
> I suspect that we want to start by getting rid of 'fast' on the IR level
> and replacing it with individual flags for the various optimization classes
> - Do we have only allowing reassociation and libm optimizations? Then we
> can readjust the Clang flags in a straightforward way.
>
>  -Hal
>
>
>
> With
> that proposed approach, rather than an "umbrella" flag such as 'fast' being
> checked in the back-end (along with an individual flag like 'arcp'), just
> checking the individual flag ('arcp') would be done.
>
>
> There is already no need to check the “fast” *and* arcp flag, if a
> transformation is about reciprocal, then you only need to check arcp (fast
> implies arcp, checking for fast would be redundant).
>
> Be careful also that the fast-math flags are mainly an IR level
> definition, the backend only inherited these per instruction flag very
> recently. It has been entirely converted to use these, and it still uses
> a global flag in some places.
> The line you’re touching in your patch for instance is about this legacy:
>
>   if (!UnsafeMath && !Flags->hasAllowReciprocal())
>
> The first flag is the global “fast-math” mode on the backend, which is not
> as fine grain as the per-instruction model.
> The second flag is the “per instruction” flag, which is the model we aim
> at.
>
> We should get rid of the “global” UnsafeMath in the backend, but that does
> not require any change to the IR or the individual fast-math flags.
>
>
> Any fast-math-related
> transformation that doesn't have an individual flag (e.g., re-association
> currently doesn't), should eventually have an individual flag defined for
> it, and then that individual flag should be checked.
>
> What do people think?
>
>
> I think these are valuable problems to solve, but you should tackle them
> piece by piece:
>
> 1) the clang part of overriding the individual FMF and emitting the right
> IR is the first thing to fix.
> 2) the backend is still using the global UnsafeFPMath and it should be
> killed.
>
> Hope this makes sense.
>
> —
> Mehdi
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161116/12f7b288/attachment-0001.html>