[llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

Thu Nov 17 13:09:05 PST 2016

----- Original Message -----
> From: "Nicolai Hähnle via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Andrew Kaylor" <andrew.kaylor at intel.com>, "Warren Ristow" <warren.ristow at sony.com>, "mehdi amini"
> <mehdi.amini at apple.com>
> Cc: llvm-dev at lists.llvm.org
> Sent: Thursday, November 17, 2016 2:35:47 PM
> Subject: Re: [llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags
> 
> On 17.11.2016 19:54, Kaylor, Andrew wrote:
> >>All that said, I think we (the company I work for, Sony) will have
> >>to implement support
> >>for these switches.  It comes down to GCC has these switches (e.g.,
> >>-fno-reciprocal-math
> >>and -fno-associative-math), and they do suppress the
> >>transformations for our customers.
> >>They switch to Clang/LLVM, they use the same switches, and it
> >>doesn't "work".  So as a
> >>practical matter, I think we will support them.  Whether the LLVM
> >>community in general
> >>feels that that's required, is another question.  Until your
> >>recent comments here, and
> >>Nicolai's comments above, I would have thought the answer was
> >>clearly yes.  But maybe
> >>that's not the case.
> >
> > I think this is a very good point.  You (Sony) are not the only
> > ones who
> > are concerned with GCC-command line compatibility.  It definitely
> > should
> > hold some weight.  Given that this is something we could do with
> > just a
> > little more effort, I’m not sure mere simplicity is enough reason
> > not to
> > do it.
> 
> Right. I'm not fundamentally opposed to having these flags, as long
> as
> we can agree that the *only* reason for having them is slightly
> better
> GCC compatibility. The "slightly better" is important, too, because
> promising real compatibility with any kind of fast math-type setting
> would simply be a lie.
> 
> So (to answer Mehdi's question in a different part of the thread),
> I'd
> consider keeping arcp around a wart, but an acceptable one. I'm fine
> with: IR 'fast' becomes IR 'reassociation' (or similar; algebraically
> correct transforms that may change rounding), and reciprocal math
> becomes "this thing that should logically be enabled by
> 'reassociation',
> but instead requires 'arcp' for GCC-'compatibility' reasons".
> 
> And to be clear, 'reassociation' should _not_ by itself allow
> transforms
> like X * (Y + 1) --> X * Y + X which can change the NaN-ness of the
> result when Infs are among the arguments. That's what 'reassociation'
> +
> 'ninf' is for.
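
As a small illustration of the NaN-ness point above, here is a sketch in C that evaluates both forms as written. The function names are made up for this example, and it assumes IEEE 754 doubles and a compiler that is not itself applying fast-math rewrites.

```c
#include <math.h>

/* X * (Y + 1), evaluated exactly as written. */
double factored(double x, double y) {
    return x * (y + 1.0);
}

/* X * Y + X, the distributed form a reassociating optimizer might produce. */
double distributed(double x, double y) {
    return x * y + x;
}
```

With x = INFINITY and y = -2.0, factored() computes Inf * -1 and returns -Inf, while distributed() computes -Inf + Inf and returns NaN. So the rewrite is only safe when infinities are ruled out.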
> 
> 
> > Also, on a slight tangent...
> >
> >
> >
> >>> I'd be really curious to know if there is anybody who really
> >>> needs arcp
> >>> without fp-contract=fast or vice versa, or who needs both of
> >>> these but
> >>> not the X*log2(0.5*Y) transform you mentioned, and so on.[1]
> >
> > I just wanted to mention that fp-contract relates to things like
> > FMA and
> > shouldn’t be confused with fast-math.
> 
> It's conceptually the same type of thing though, isn't it? At least
> fp-contract=fast, which means "use FMA even when it changes floating
> point results (due to different rounding)".

Yes and no. For one thing, *all* FP contraction modes (i.e. FMA formation), including the one standardized by C, can change results due to different intermediate rounding properties. What makes it different from other "fast math" settings is that:

 1. The result from individual contracted operations is always more accurate than the original operations, not less.
 2. The FMA operation, which is the only combination that FP contraction enables, is a specific combination that is defined by the IEEE standard.

For these reasons, we differentiate it from the others, and users also consider it qualitatively different from other FP optimization flags.

 -Hal

> This is kind of like the
> 'fast' flag, which means "do all sorts of transformations even when
> they
> change floating point results (due to different rounding)". I don't
> know
> whether clang -ffast-math enables fp-contract=fast, but I'd say that
> in
> a clean, from-scratch design, fp-contract=fast shouldn't be a
> separate flag.
> 
> Cheers,
> Nicolai
> 
> 
> >
> >
> >
> > -Andy
> >
> >
> >
> >
> >
> > *From:*Ristow, Warren [mailto:warren.ristow at sony.com]
> > *Sent:* Thursday, November 17, 2016 12:51 AM
> > *To:* mehdi.amini at apple.com
> > *Cc:* Kaylor, Andrew <andrew.kaylor at intel.com>;
> > llvm-dev at lists.llvm.org;
> > Nicolai Hähnle <nhaehnle at gmail.com>
> > *Subject:* RE: [llvm-dev] RFC: Consider changing the semantics of
> > 'fast'
> > flag implying all fast-math-flags
> >
> >
> >
> > Those are all good points.  Your reassociation point in the context
> > of
> > inlining is particularly interesting.
> >
> >
> >
> > FWIW, we also have a case where a customer wants
> > '-fno-associative-math'
> > to suppress reassociation under '-ffast-math'.  It would take me a
> > while
> > to find the specifics of the issue, but it was (if my memory is
> > right)
> > more of a real use-case.  (That is to say, the code that was
> > "failing"
> > due to reassociation didn't have an obvious fix like the reciprocal
> > situation, here, other than to turn off fast-math.)  In fact, the
> > request to suppress reassociation was the motivation for creating
> > PR27372 in the first place (which eventually fed into this thread).
> >  I
> > have to say that on the reassociation point, my concern is that to
> > really suppress that, we will have to suppress so much, that there
> > will
> > hardly be any point in using -ffast-math.
> >
> >
> >
> > I'd say your comments here are very similar to what Nicolai said in
> > another subthread of this discussion:
> >
> >
> >
> >>> I'd be really curious to know if there is anybody who really
> >>> needs arcp
> >
> >>> without fp-contract=fast or vice versa, or who needs both of
> >>> these but
> >
> >>> not the X*log2(0.5*Y) transform you mentioned, and so on.[1]
> >
> >>> ...
> >
> >>> [1] One case I _can_ think of (and which may have been a reason
> >>> for the
> >
> >>> proliferation of flags in the first place) is somebody who
> >>> enables fast
> >
> >>> math, but then doesn't want their results to change when they
> >>> update the
> >
> >>> compiler and get a new set of optimizations. But IMO that's a use
> >>> case
> >
> >>> that should be explicitly rejected.
> >
> >
> >
> > I think those are all really good points, and an argument can be
> > made
> > that when -ffast-math gives you results you don't want, then you
> > just
> > have to turn it off.  Essentially, the user can't "have his cake
> > and eat
> > it too".
> >
> >
> >
> > All that said, I think we (the company I work for, Sony) will have
> > to
> > implement support for these switches.  It comes down to GCC has
> > these
> > switches (e.g., -fno-reciprocal-math and -fno-associative-math),
> > and
> > they do suppress the transformations for our customers.  They
> > switch to
> > Clang/LLVM, they use the same switches, and it doesn't "work".  So
> > as a
> > practical matter, I think we will support them.  Whether the LLVM
> > community in general feels that that's required, is another
> > question.
> > Until your recent comments here, and Nicolai's comments above,
> > I
> > would have thought the answer was clearly yes.  But maybe that's
> > not the
> > case.
> >
> >
> >
> > In summary, irrespective of any (subjective?) assessment of how
> > legitimate a particular use-case is, do we want switches like:
> >
> >
> >
> >     -ffast-math -fno-reciprocal-math
> >
> >      -ffast-math -fno-associative-math
> >
> >
> >
> > to work?
> >
> >
> >
> > For me, the answer is yes, because I have multiple customers that
> > tell
> > me they really want to leave -ffast-math on, but they want to be
> > able to
> > disable these sub-categories.  I've been approaching this under the
> > assumption that the answer is yes for the Clang/LLVM community in
> > general.
> >
> >
> >
> > Thanks,
> >
> > -Warren
> >
> >
> >
> > *From:*mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>
> > [mailto:mehdi.amini at apple.com]
> > *Sent:* Wednesday, November 16, 2016 10:46 PM
> > *To:* Ristow, Warren <warren.ristow at sony.com
> > <mailto:warren.ristow at sony.com>>
> > *Cc:* Kaylor, Andrew <andrew.kaylor at intel.com
> > <mailto:andrew.kaylor at intel.com>>; llvm-dev at lists.llvm.org
> > <mailto:llvm-dev at lists.llvm.org>; Nicolai Hähnle
> > <nhaehnle at gmail.com
> > <mailto:nhaehnle at gmail.com>>
> > *Subject:* Re: [llvm-dev] RFC: Consider changing the semantics of
> > 'fast'
> > flag implying all fast-math-flags
> >
> >
> >
> >
> >
> >     On Nov 16, 2016, at 10:04 PM, Ristow, Warren
> >     <warren.ristow at sony.com
> >     <mailto:warren.ristow at sony.com>> wrote:
> >
> >
> >
> >     > Can you elaborate what kind of runtime failure is the
> >     > reciprocal
> >     transformation triggering?
> >
> >
> >
> >     Yes.  It was along the lines of:
> >
> >
> >
> >         {
> >
> >           float x = a / c;
> >
> >           float y = b / c;
> >
> >
> >
> >           if (y == 1.0f) {
> >
> >             // do some processing for when 'b' and 'c' are equal
> >
> >           } else {
> >
> >             // do other processing
> >
> >           }
> >
> >
> >
> >           use(x, y);
> >
> >         }
> >
> >
> >
> >     Of course they understood they could easily change this code
> >     once
> >     they understood the issue.
> >
> >
> >
> >     But because it "failed" for non-edge-case values of 'c',
> >     they
> >     were worried.  As an example of the non-edge-case aspect, when
> >     'c'
> >     is 41.0f (and so of course 'b' is 41.0f), intuitively they felt
> >     that
> >     this “would work precisely”, even with fast-math.  Once they
> >     understood more, they agreed this was reasonable with
> >     fast-math, but
> >     they had the underlying concern that if they encountered one
> >     case
> >     where 'num' and 'den' were equal (and non-edge-case), yet 'num
> >     /
> >     den' wasn't precisely 1.0f, then even if they fixed this
> >     situation
> >     where they encountered it, it might be lurking elsewhere in
> >     their
> >     code, and so they wanted to disable that transformation.
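
For readers who want to reproduce the 41.0f case, here is a small C sketch of what the reciprocal transformation does to 'b / c'. The function names are hypothetical; the sketch assumes IEEE 754 single precision, round-to-nearest, and no excess intermediate precision. The volatile temporary stands in for the rounded reciprocal that would be shared between the two divisions.

```c
/* b / c as written: one correctly rounded division. */
float direct_quotient(float b, float c) {
    return b / c;
}

/* b * (1 / c): the reciprocal is rounded first, then the product is rounded. */
float reciprocal_quotient(float b, float c) {
    volatile float recip = 1.0f / c;  /* keep the rounded reciprocal as a float */
    return b * recip;
}
```

Under those assumptions, direct_quotient(41.0f, 41.0f) is exactly 1.0f, but reciprocal_quotient(41.0f, 41.0f) is 1 - 2^-24, the float just below 1.0f, so the 'y == 1.0f' test in the snippet above fails.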
> >
> >
> >
> > Thanks for elaborating.
> >
> >
> >
> > I’d be reluctant to call this situation a real use-case though.
> >
> > Does the distinction on reciprocal really make sense here? This
> > user
> > can have the same “surprising” result anywhere in their code-base with
> > reassociation as well:
> >
> >
> >
> > void foo (float a, float b) {
> >
> >   float x = a - b;
> >
> >   if (x == 0)
> >
> >      … // only if a == b
> >
> > }
> >
> >
> >
> > That would sound totally reasonable, unless foo is inlined and
> > reassociation would lead to a non-zero value for x even when a and
> > b
> > passed in to foo "if it wasn’t inlined" would be identical!
> >
> >
> >
> > (Reminds me somehow of a client that was bitten by nnan: their
> > assumption was that as long as they didn’t introduce NaN in the
> > program
> > everything was fine. However, with fast-math some transformations
> > were
> > introducing NaN where there was none before, and it propagated to other
> > computations that were transformed under the assumption that no NaN
> > would
> > show up. It also turns out that making the code safe against NaN
> > and
> > efficient at the same time is hard, especially when the code itself
> > is
> > compiled with fast-math.)
> >
> >
> >
> > —
> >
> > Mehdi
> >
> >
> >     *From:* mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>
> >     [mailto:mehdi.amini at apple.com]
> >     *Sent:* Wednesday, November 16, 2016 7:11 PM
> >     *To:* Ristow, Warren <warren.ristow at sony.com
> >     <mailto:warren.ristow at sony.com>>
> >     *Cc:* Kaylor, Andrew <andrew.kaylor at intel.com
> >     <mailto:andrew.kaylor at intel.com>>; llvm-dev at lists.llvm.org
> >     <mailto:llvm-dev at lists.llvm.org>; Nicolai Hähnle
> >     <nhaehnle at gmail.com
> >     <mailto:nhaehnle at gmail.com>>
> >     *Subject:* Re: [llvm-dev] RFC: Consider changing the semantics
> >     of
> >     'fast' flag implying all fast-math-flags
> >
> >
> >
> >
> >
> >         On Nov 16, 2016, at 6:22 PM, Ristow, Warren
> >         <warren.ristow at sony.com <mailto:warren.ristow at sony.com>>
> >         wrote:
> >
> >
> >
> >         > ... except that Warren’s proposal that started this
> >         > discussion seems to imply that he
> >
> >         > has a use case that requires reciprocals to be turned off
> >         > separately.
> >
> >
> >
> >         Just to close this loose end, yes I have a use case.
> >
> >
> >
> >         Specifically, we have a customer that turns on
> >         '‑ffast‑math',
> >         but was getting a runtime failure due to the
> >         reciprocal-transformation being done.
> >
> >
> >
> >     Can you elaborate what kind of runtime failure is the
> >     reciprocal
> >     transformation triggering?
> >
> >
> >
> >     —
> >
> >     Mehdi
> >
> >
> >
> >
> >
> >         They don't want to turn off fast‑math because they like the
> >         performance improvement, and can live with the imprecision
> >         in
> >         most cases.  So they wanted to suppress just the
> >         reciprocal-transformation.  I intended to tell them the
> >         solution
> >         was simple: use '‑ffast‑math ‑fno‑reciprocal‑math'.  But on
> >         trying it myself, I ran into the issue here.
> >
> >
> >
> >         Thanks,
> >
> >         -Warren
> >
> >
> >
> >         *From:* Kaylor, Andrew [mailto:andrew.kaylor at intel.com]
> >         *Sent:* Wednesday, November 16, 2016 4:13 PM
> >         *To:* Mehdi Amini <mehdi.amini at apple.com
> >         <mailto:mehdi.amini at apple.com>>; Ristow, Warren
> >         <warren.ristow at sony.com
> >         <mailto:warren.ristow at sony.com>>; llvm-dev at lists.llvm.org
> >         <mailto:llvm-dev at lists.llvm.org>; Nicolai Hähnle
> >         <nhaehnle at gmail.com <mailto:nhaehnle at gmail.com>>
> >         *Subject:* RE: [llvm-dev] RFC: Consider changing the
> >         semantics
> >         of 'fast' flag implying all fast-math-flags
> >
> >
> >
> >         I don’t really like the idea of updating checks of
> >         UnsafeAlgebra() to depend on all of the other flags.  It
> >         seems
> >         like it would be preferable to look at each optimization
> >         and
> >         figure out which flags it actually requires.  I suspect
> >         that in
> >         many cases the “new” flag (i.e. allowing reassociation,
> >         etc.)
> >         will be what is actually needed anyway.
> >
> >
> >
> >         I would be inclined to agree with Nicolai’s suggestion of
> >         combining all the flags related to value safety, except
> >         that
> >         Warren’s proposal that started this discussion seems to
> >         imply
> >         that he has a use case that requires reciprocals to be
> >         turned
> >         off separately.
> >
> >
> >
> >         -Andy
> >
> >
> >
> >         *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org]
> >         *On
> >         Behalf Of *Mehdi Amini via llvm-dev
> >         *Sent:* Wednesday, November 16, 2016 8:55 AM
> >         *To:* Ristow, Warren <warren.ristow at sony.com
> >         <mailto:warren.ristow at sony.com>>
> >         *Cc:* llvm-dev at lists.llvm.org
> >         <mailto:llvm-dev at lists.llvm.org>
> >         *Subject:* Re: [llvm-dev] RFC: Consider changing the
> >         semantics
> >         of 'fast' flag implying all fast-math-flags
> >
> >
> >
> >
> >
> >             On Nov 15, 2016, at 11:59 PM, Ristow, Warren
> >             <warren.ristow at sony.com
> >             <mailto:warren.ristow at sony.com>> wrote:
> >
> >
> >
> >             Hi,
> >
> >
> >
> >             Thanks for the quick feedback.  I see your points, but
> >             I
> >             have a few questions/comments.  I'll start at the end
> >             of the
> >             previous post:
> >
> >
> >
> >             > ...
> >
> >             > I think these are valuable problems to solve, but you
> >             > should tackle them piece by piece:
> >
> >             >
> >
> >             > 1) the clang part of overriding the individual FMF
> >             > and emitting the right IR is the first thing to fix.
> >
> >             > 2) the backend is still using the global UnsafeFPMath
> >             > and it should be killed.
> >
> >
> >
> >             I addressed this point (2) for the reciprocal aspect in
> >             the
> >             patch, but of course that wasn't useful without doing
> >             something about (1).
> >
> >
> >
> >             Regarding (1), over
> >             at https://reviews.llvm.org/D26708#596610, David made
> >             the
> >             same point that it should be done in Clang.  I can
> >             understand that, but I wonder whether having the
> >             concept of
> >             the 'fast' flag in the IR that implies all the other
> >             FMF
> >             makes sense?  I'm not seeing a good reason for it, but
> >             since
> >             this is very new to me, I can easily imagine I'm
> >             missing the
> >             big picture.
> >
> >
> >
> >             For example, in the LLVM IR
> >             (http://llvm.org/docs/LangRef.html#fast-math-flags) the
> >             fast-math flags 'nnan', 'ninf', 'nsz', 'arcp' and
> >             'fast' are
> >             defined.  Except for 'fast', each of these has a fairly
> >             specific definition of what they mean.  For example,
> >             for 'arcp':
> >
> >
> >
> >                 arcp => "Allow optimizations to use the reciprocal of an
> >                          argument rather than perform division."
> >
> >
> >
> >             'fast' is unusual, in that it describes a fairly
> >             generic set
> >             of aggressive floating-point optimizations:
> >
> >
> >
> >                 fast => "Allow algebraically equivalent transformations
> >                          that may dramatically change results in floating
> >                          point (e.g. reassociate). This flag implies all
> >                          the others."
> >
> >
> >
> >             Very loosely, 'fast' means "all the aggressive
> >             FP-transformations that are not controlled by one of
> >             the
> >             other 4, plus it implies all the other 4".  If for
> >             terminology, we call those additional aggressive
> >             optimizations 'aggr', then we have:
> >
> >
> >
> >                 'fast' == 'aggr' + 'nnan' + 'ninf' + 'nsz' + 'arcp'
> >
> >
> >
> >             So as I see it, if we want to disable only one of the
> >             other
> >             ones (like 'arcp', in my case), there isn't any way to
> >             express that with these IR flags defined this way.  In
> >             short, we cannot turn on all the flags besides 'arcp'.
> >              To
> >             do that, what we want is that somehow for the Clang
> >             switches:
> >
> >
> >
> >               '-ffast-math -fno-reciprocal-math'
> >
> >
> >
> >             to ultimately result in LLVM IR that has the following
> >             flags
> >             on in appropriate FP ops:
> >
> >
> >
> >               'aggr' + 'nnan' + 'ninf' + 'nsz'
> >
> >
> >
> >         Makes sense, I missed that we can’t *subtract* from fast at
> >         the
> >         IR level.
> >
> >
> >
> >         I wouldn’t be opposed to having something along the lines of
> >         “aggr”, but there is a tradeoff: some transformations may be
> >         harder to guard with this model.
> >
> >
> >
> >         Maybe that could be a starting point: change the
> >         “UnsafeAlgebra” bit in the FMF to be the “aggr” you mention, and
> >         change every query of FastMathFlags::UnsafeAlgebra() to
> >         return true if all the bits are set in the Flags. This
> >         alone
> >         should be nothing more than a mechanical change, I believe.
> >
> >         The important part is then auditing all the users of
> >         UnsafeAlgebra() in the middle end and checking whether they can be
> >         “downgraded” to aggr safely: i.e. if they don’t need aggr
> >         *and*
> >         another flag.
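
The flag model being discussed can be sketched in plain C as a bitset. Everything here is hypothetical and for illustration only: the FMF_* names, the bit layout, and the two query functions are not LLVM's actual FastMathFlags API. 'aggr' is the placeholder name used earlier in this thread.

```c
#include <stdbool.h>

/* Hypothetical bit layout; not LLVM's real encoding. */
enum {
    FMF_NNAN = 1 << 0,
    FMF_NINF = 1 << 1,
    FMF_NSZ  = 1 << 2,
    FMF_ARCP = 1 << 3,
    FMF_AGGR = 1 << 4,  /* the "everything else" bit from this thread */
    FMF_ALL  = FMF_NNAN | FMF_NINF | FMF_NSZ | FMF_ARCP | FMF_AGGR
};

/* Stand-in for the old "fast" query: true only when every bit is set. */
bool is_fast(unsigned flags) {
    return (flags & FMF_ALL) == FMF_ALL;
}

/* Query for a transform that has been audited down to specific flags. */
bool allows(unsigned flags, unsigned required) {
    return (flags & required) == required;
}
```

In this sketch, '-ffast-math -fno-reciprocal-math' would correspond to FMF_ALL & ~FMF_ARCP. is_fast() is false for that value, so any transform still guarded by the all-bits query is lost, whereas allows(flags, FMF_AGGR) stays true. That difference is why each user of the old query has to be audited.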
> >
> >
> >
> >         —
> >
> >         Mehdi
> >
> >
> >             But I don't see a way to express 'aggr' in the IR.  We
> >             could
> >             do this, if we change the definition of the IR 'fast'
> >             flag
> >             to remove that sentence about implying all the others:
> >
> >
> >
> >                 fast => "Allow algebraically equivalent transformations
> >                          that may dramatically change results in floating
> >                          point (e.g. reassociate)."
> >
> >
> >
> >             (If we do something like that, we may want to change
> >             the
> >             name from 'fast' to something else (like 'aggr'), to
> >             avoid
> >             tying it too closely to the concept of the
> >             '-ffast-math'
> >             switch.)
> >
> >
> >
> >             As an aside, I don't know if the "reassociate" example
> >             is
> >             the only other transformation that's allowed by 'fast'
> >             (I
> >             presume it isn't), but I think reassociation would be
> >             better
> >             expressed by a separate flag, which could then be
> >             controlled
> >             independently via '-f[no]-associative-math' switch.
> >              Not
> >             having that flag exist separately in the FMF is the
> >             origin
> >             of PR27372.  But creating that flag and using it in the
> >             appropriate places would still run into these problems
> >             of
> >             'fast' implying all the others, which would make it
> >             impossible to disable reassociation while leaving all
> >             the
> >             other FMF transformations enabled.
> >
> >
> >
> >             To ask a concrete question using the current definition
> >             of
> >             'fast' (which includes enabling reassociation, as the
> >             LLVM
> >             IR documentation of FMF says), how can we express in
> >             the IR
> >             that reciprocal-transformations are not allowed, but
> >             reassociation is allowed?
> >
> >
> >
> >             So the bottom line is that I do see there are issues in
> >             Clang that are relevant.  But as long as 'fast' means
> >             "'aggr' plus all the other FMF transformations", I
> >             don't see
> >             how we can effectively disable a subset of those other
> >             FMF
> >             transformations (while leaving 'aggr' transformations,
> >             such
> >             as reassociation, enabled).  With that in mind, my
> >             patch
> >             took one step in having 'fast' no longer imply all the
> >             others.
> >
> >
> >
> >             Thanks,
> >
> >             -Warren
> >
> >
> >
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory