[llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

Hal Finkel via llvm-dev llvm-dev at lists.llvm.org
Fri Nov 18 14:34:57 PST 2016


----- Original Message -----

> From: "David L Kreitzer via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Warren Ristow" <warren.ristow at sony.com>, "mehdi amini"
> <mehdi.amini at apple.com>
> Cc: llvm-dev at lists.llvm.org
> Sent: Friday, November 18, 2016 3:39:33 PM
> Subject: Re: [llvm-dev] RFC: Consider changing the semantics of
> 'fast' flag implying all fast-math-flags

> I just read through this thread, and I did not see a good definition
> of what
> exactly "fast + no-arcp" would mean. Clearly (1) would be disallowed,
> but what
> about the others?

> (1) X / Y --> X * (1.0 / Y)
> (2) (X * Y) / Z --> (X / Z) * Y
> (3) (X / Z) * Y --> X * (Y / Z)
> (4) (X / Y) / Z --> X / (Y * Z)
> etc.

> It is easy to write a unit test for each of (1)-(4) showing that
> gcc6.2 will
> apply the transform under "-ffast-math" but not under
> "-ffast-math -fno-reciprocal-math". (It is also easy to write unit
> tests where
> gcc6.2 will perform these transforms in spite of
> -fno-reciprocal-math, but I
> assume those would be considered bugs.)
Interesting question. I think this is right. (1) must be disallowed (that *is* the exact transformation after which the flag is named). Disallowing (2-4) makes sense to me also in that configuration. The basic idea being that, with -fno-reciprocal-math, we never make an algebraic change to the numerator or denominator of a division operation. 

> I trust the intent is to update the language reference such that it
> is easy to
> reason about the correctness of these and other division-related
> transforms?
Yes, we should definitely make sure that the language reference is clear. 

-Hal 

> Thanks,
> - Dave

> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
> Ristow, Warren via llvm-dev
> Sent: Thursday, November 17, 2016 5:24 PM
> To: mehdi.amini at apple.com
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] RFC: Consider changing the semantics of
> 'fast' flag implying all fast-math-flags

> Thanks for all that. I think we’re more in agreement here than it may
> have appeared initially.

> > So let’s just fix it!

> Sounds good!

> I have some other things on my plate at the moment, so I doubt I’ll
> get to working on this until after Thanksgiving (I don’t want my
> lack of activity to be interpreted as a loss of interest on my part
> to get this done).

> Before work can be done to fix it, the details of precisely what
> changes we want to make in the fast-math-flags IR need to be
> decided. There has been some discussion in this thread on that point
> (‘aggr’, ‘reassoc’ + ‘libm’, something else?), but no clear spec.
> I’d be happy to propose something concrete, and I’d fully expect
> that it would evolve a bit after feedback. I’m also happy for others
> to propose specifics. In any case, I won’t work on taking this
> further until sometime after Thanksgiving.

> -Warren

> From: mehdi.amini at apple.com [ mailto:mehdi.amini at apple.com ]
> Sent: Thursday, November 17, 2016 2:03 PM
> To: Ristow, Warren < warren.ristow at sony.com >
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] RFC: Consider changing the semantics of
> 'fast' flag implying all fast-math-flags

> > On Nov 17, 2016, at 1:44 PM, Mehdi Amini via llvm-dev <
> > llvm-dev at lists.llvm.org > wrote:
> 

> > > On Nov 17, 2016, at 1:24 PM, Ristow, Warren <
> > > warren.ristow at sony.com
> > > > wrote:
> > 
> 

> > > On the plus side, I'm glad to see the conclusions of the last
> > > couple
> > > of posts.
> > 
> 

> > > From Mehdi:
> > 
> 

> > > > Hope this clarifies where I see the direction going, and even if
> > > > you
> > > > don’t agree with my
> > > 
> > 
> 
> > > > reasoning, the conclusion should be satisfactory on your side
> > > > :)
> > > 
> > 
> 
> > > I'd say that summarizes my thoughts on this well.
> > 
> 

> > > And from Nicolai:
> > 
> 

> > > > Right. I'm not fundamentally opposed to having these flags, ...
> > > 
> > 
> 
> > > I do agree with much of what you both say, but definitely not all
> > > of
> > > it. The philosophy of not providing what a customer requests and
> > > instead guiding them to a better alternative is something I agree
> > > with -- we don't just give them a pony. And I agree *strongly*
> > > that
> > > just because a program gets the answer a user wants with GCC
> > > (using
> > > fast-math) and we get an answer they view as "wrong", doesn't
> > > mean
> > > it's a bug of ours and that we need to change to get the same
> > > answer
> > > as GCC. That's not what our goal of GCC compatibility means to
> > > me.
> > 
> 

> > > But we do have a switch '-fno-reciprocal-math' that we accept,
> > > and
> > > even process/implement to some extent. But that implementation
> > > has
> > > a
> > > bug. Fixing that bug so that when a user says '-ffast-math
> > > -fno-reciprocal-math', we enable the fast-math transformations
> > > but
> > > explicitly disable the reciprocal transformations is, in my view,
> > > the right thing to do. Simply, that is a bug that we ought to fix
> > > --
> > > unless we agree to abandon support of '-fno-reciprocal-math',
> > > which
> > > I think isn't under consideration at this stage. And FTR, I'd
> > > oppose
> > > that, not surprisingly. :)
> > 
> 
> > > I'm not at all trying to justify the "pony" use-case from this
> > > customer, but if we provide '-fno-reciprocal-math', I think we
> > > ought
> > > to fix bugs we find in our implementation of it. Fixing that bug
> > > doesn't guarantee we'll then get the same answers as GCC does on
> > > every program when compiled with '-ffast-math
> > > -fno-reciprocal-math',
> > > but IMO that isn't required for us to describe our behavior as
> > > "GCC
> > > compatibility" in this respect.
> > 
> 

> > > Fast-math is "unusual", in that the user is explicitly opening
> > > the
> > > door to allowing us to do non-compliant transformations. As
> > > compared
> > > with GCC, our implementation can have a subset or a superset of
> > > these non-compliant transformations, and we can still call that
> > > "GCC
> > > compatibility". As an analogous "not unusual" feature, both we
> > > and
> > > GCC do type based alias analysis. It's a perfectly
> > > standard-compliant thing to do optimizations based on conclusions
> > > from the tbaa. We both support the switch
> > > '-f[no-]strict-aliasing'
> > > to control this (and we both enable it by default). Referring to
> > > this as "GCC compatibility" is perfectly legitimate, in my view.
> > > But
> > > if a user program has an aliasing bug in it, and our tbaa directs
> > > us
> > > to aggressively optimize it, whereas GCC's doesn't (and so the
> > > user
> > > gets the answer they wanted with GCC, but not with us), this does
> > > not mean we have a bug, or that saying we're GCC compatible in
> > > terms
> > > of '-f[no-]strict-aliasing' is a "lie". We can do a superset or
> > > subset of the optimizations that GCC does in terms of alias
> > > analysis, and we can quite reasonably describe ourselves as GCC
> > > compatible
> > > in terms of us providing this capability. A user insisting we
> > > have
> > > a
> > > bug in this tbaa situation is analogous to your "pony" request
> > > about
> > > "float test_div(float a, float b) { return a/b; }". And
> > > (unrelated
> > > to Clang/LLVM) I've had this sort of objection from users in tbaa
> > > situations in the past, where I've had to defend my point that
> > > just
> > > because GCC didn't optimize it as aggressively as the compiler I
> > > was
> > > providing, it wasn't a bug in our compiler. So I'm all for not
> > > giving everyone a pony.
> > 
> 

> > > But irrespective of how silly a test-case it may be to do:
> > 
> 

> > > {
> > >   float x = a / c;
> > >   float y = b / c;
> > >
> > >   if (y == 1.0f) {
> > >     // do some processing for when 'b' and 'c' are equal
> > >   } else {
> > >     // do other processing
> > >   }
> > >
> > >   use(x, y);
> > > }

> > > I cannot in good conscience tell the customer that it's OK for us
> > > to
> > > do:
> > 
> 

> > > float tmp = 1.0f / c;
> > > float x = a * tmp;
> > > float y = b * tmp;

> > > when they specified '-ffast-math -fno-reciprocal-math'. They can
> > > rightfully come back and say "what do you mean by
> > > '-fno-reciprocal-math'?" I have to call that a compiler-bug.
> > 
> 
> > I agree with all you wrote above :)
> 
> > But I’d add that a legitimate fix could be for the clang driver to
> > issue an error (or a warning) saying “-fno-reciprocal-math isn’t
> > compatible with -ffast-math, disabling -fxxxxxx” (with xxxxxx being
> > one or the other ;))
> 
> I don’t want to add confusion, I feel I’m doing a bad job here
> somehow: I’m not saying we *should* do this (rejecting in the
> driver). So let’s just fix it!

>
> Mehdi

> > > -----Original Message-----
> > 
> 
> > > From: Nicolai Hähnle [ mailto:nhaehnle at gmail.com ]
> > 
> 
> > > Sent: Thursday, November 17, 2016 12:36 PM
> > 
> 
> > > To: Kaylor, Andrew < andrew.kaylor at intel.com >; Ristow, Warren <
> > > warren.ristow at sony.com >; mehdi.amini at apple.com
> > 
> 
> > > Cc: llvm-dev at lists.llvm.org
> > 
> 
> > > Subject: Re: [llvm-dev] RFC: Consider changing the semantics of
> > > 'fast' flag implying all fast-math-flags
> > 
> 

> > > On 17.11.2016 19:54, Kaylor, Andrew wrote:
> > 
> 
> > > > > All that said, I think we (the company I work for, Sony) will
> > > > > have
> > > > > to
> > > > 
> > > 
> > 
> 
> > > > > implement support for these switches. It comes down to GCC
> > > > > has
> > > > > these
> > > > 
> > > 
> > 
> 
> > > > > switches (e.g., -fno-reciprocal-math and
> > > > > -fno-associative-math),
> > > > > and
> > > > > they do suppress the transformations for our customers.
> > > > 
> > > 
> > 
> 
> > > > > They switch to Clang/LLVM, they use the same switches, and it
> > > > > doesn't
> > > > 
> > > 
> > 
> 
> > > > > "work". So as a practical matter, I think we will support
> > > > > them.
> > > > 
> > > 
> > 
> 
> > > > > Whether the LLVM community in general feels that that's
> > > > > required,
> > > > > is
> > > > 
> > > 
> > 
> 
> > > > > another question. Until your recent comments here, and
> > > > > Nicolai's
> > > > 
> > > 
> > 
> 
> > > > > comments above, I would have thought the answer was clearly
> > > > > yes.
> > > > > But
> > > > > maybe that's not the case.
> > > > 
> > > 
> > 
> 
> > > > I think this is a very good point. You (Sony) are not the only
> > > > ones
> > > 
> > 
> 
> > > > who are concerned with GCC-command line compatibility. It
> > > > definitely
> > > 
> > 
> 
> > > > should hold some weight. Given that this is something we could
> > > > do
> > > 
> > 
> 
> > > > with just a little more effort, I’m not sure mere simplicity is
> > > > enough
> > > 
> > 
> 
> > > > reason not to do it.
> > > 
> > 
> 
> > > Right. I'm not fundamentally opposed to having these flags, as
> > > long
> > > as we can agree that the *only* reason for having them is
> > > slightly
> > > better GCC compatibility. The "slightly better" is important,
> > > too,
> > > because promising real compatibility with any kind of fast
> > > math-type
> > > setting would simply be a lie.
> > 
> 

> > > So (to answer Mehdi's question in a different part of the
> > > thread),
> > > I'd consider keeping arcp around a wart, but an acceptable one.
> > > I'm
> > > fine
> > 
> 
> > > with: IR 'fast' becomes IR 'reassociation' (or similar;
> > > algebraically
> > > correct transforms that may change rounding), and reciprocal math
> > > becomes "this thing that should logically be enabled by
> > > 'reassociation', but instead requires 'arcp' for
> > > GCC-'compatibility'
> > > reasons".
> > 
> 

> > > And to be clear, 'reassociation' should _not_ by itself allow
> > > transforms like X * (Y + 1) --> X * Y + X which can change the
> > > NaN-ness of the result when Infs are among the arguments. That's
> > > what 'reassociation' + 'ninf' is for.
> > 
> 

> > > > Also, on a slight tangent...
> > > 
> > 
> 

> > > > > > I'd be really curious to know if there is anybody who
> > > > > > really
> > > > > > needs
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > arcp without fp-contract=fast or vice versa, or who needs
> > > > > > both
> > > > > > of
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > these but not the X*log2(0.5*Y) transform you mentioned,
> > > > > > and
> > > > > > so
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > on.[1]
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > I just wanted to mention that fp-contract relates to things
> > > > like
> > > > FMA
> > > 
> > 
> 
> > > > and shouldn’t be confused with fast-math.
> > > 
> > 
> 
> > > It's conceptually the same type of thing though, isn't it? At
> > > least
> > > fp-contract=fast, which means "use FMA even when it changes
> > > floating
> > > point results (due to different rounding)". This is kind of like
> > > the
> > > 'fast' flag, which means "do all sorts of transformations even
> > > when
> > > they change floating point results (due to different rounding)".
> > > I
> > > don't know whether clang -ffast-math enables fp-contract=fast,
> > > but
> > > I'd say that in a clean, from-scratch design, fp-contract=fast
> > > shouldn't be a separate flag.
> > 
> 

> > > Cheers,
> > 
> 
> > > Nicolai
> > 
> 

> > > > -Andy
> > > 
> > 
> 

> > > > *From:*Ristow, Warren [ mailto:warren.ristow at sony.com ]
> > > 
> > 
> 
> > > > *Sent:* Thursday, November 17, 2016 12:51 AM
> > > 
> > 
> 
> > > > *To:* mehdi.amini at apple.com
> > > 
> > 
> 
> > > > *Cc:* Kaylor, Andrew < andrew.kaylor at intel.com >;
> > > 
> > 
> 
> > > > llvm-dev at lists.llvm.org ; Nicolai Hähnle < nhaehnle at gmail.com >
> > > 
> > 
> 
> > > > *Subject:* RE: [llvm-dev] RFC: Consider changing the semantics
> > > > of
> > > > 'fast'
> > > 
> > 
> 
> > > > flag implying all fast-math-flags
> > > 
> > 
> 

> > > > Those are all good points. Your reassociation point in the
> > > > context
> > > > of
> > > 
> > 
> 
> > > > inlining is particularly interesting.
> > > 
> > 
> 

> > > > FWIW, we also have a case where a customer wants
> > > > '-fno-associative-math'
> > > 
> > 
> 
> > > > to suppress reassociation under '-ffast-math'. It would take me
> > > > a
> > > 
> > 
> 
> > > > while to find the specifics of the issue, but it was (if my
> > > > memory
> > > > is
> > > 
> > 
> 
> > > > right) more of a real use-case. (That is to say, the code that
> > > > was
> > > > "failing"
> > > 
> > 
> 
> > > > due to reassociation didn't have an obvious fix like the
> > > > reciprocal
> > > 
> > 
> 
> > > > situation, here, other than to turn off fast-math.) In fact,
> > > > the
> > > 
> > 
> 
> > > > request to suppress reassociation was the motivation for
> > > > creating
> > > 
> > 
> 
> > > > PR27372 in the first place (which eventually fed into this
> > > > thread).
> > > > I
> > > 
> > 
> 
> > > > have to say that on the reassociation point, my concern is that
> > > > to
> > > 
> > 
> 
> > > > really suppress that, we will have to suppress so much, that
> > > > there
> > > 
> > 
> 
> > > > will hardly be any point in using -ffast-math.
> > > 
> > 
> 

> > > > I'd say your comments here are very similar to what Nicolai
> > > > said
> > > > in
> > > 
> > 
> 
> > > > another subthread of this discussion:
> > > 
> > 
> 

> > > > > > I'd be really curious to know if there is anybody who
> > > > > > really
> > > > > > needs
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > arcp
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > without fp-contract=fast or vice versa, or who needs both
> > > > > > of
> > > > > > these
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > but
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > not the X*log2(0.5*Y) transform you mentioned, and so
> > > > > > on.[1]
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > ...
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > [1] One case I _can_ think of (and which may have been a
> > > > > > reason
> > > > > > for
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > the
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > proliferation of flags in the first place) is somebody who
> > > > > > enables
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > fast
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > math, but then doesn't want their results to change when
> > > > > > they
> > > > > > update
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > the
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > compiler and get a new set of optimizations. But IMO that's
> > > > > > a
> > > > > > use
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > case
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > that should be explicitly rejected.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > I think those are all really good points, and an argument can
> > > > be
> > > > made
> > > 
> > 
> 
> > > > that when -ffast-math gives you results you don't want, then
> > > > you
> > > > just
> > > 
> > 
> 
> > > > have to turn it off. Essentially, the user can't "have his cake
> > > > and
> > > 
> > 
> 
> > > > eat it too".
> > > 
> > 
> 

> > > > All that said, I think we (the company I work for, Sony) will
> > > > have
> > > > to
> > > 
> > 
> 
> > > > implement support for these switches. It comes down to GCC has
> > > > these
> > > 
> > 
> 
> > > > switches (e.g., -fno-reciprocal-math and
> > > > -fno-associative-math),
> > > > and
> > > 
> > 
> 
> > > > they do suppress the transformations for our customers. They
> > > > switch
> > > 
> > 
> 
> > > > to Clang/LLVM, they use the same switches, and it doesn't
> > > > "work".
> > > > So
> > > 
> > 
> 
> > > > as a practical matter, I think we will support them. Whether
> > > > the
> > > > LLVM
> > > 
> > 
> 
> > > > community in general feels that that's required, is another
> > > > question.
> > > 
> > 
> 
> > > > Until your recent comments here, and Nicolai's comments
> > > > above,
> > > > I
> > > 
> > 
> 
> > > > would have thought the answer was clearly yes. But maybe that's
> > > > not
> > > 
> > 
> 
> > > > the case.
> > > 
> > 
> 

> > > > In summary, irrespective of any (subjective?) assessment of how
> > > 
> > 
> 
> > > > legitimate a particular use-case is, do we want switches like:
> > > 
> > 
> 

> > > > -ffast-math -fno-reciprocal-math
> > > 
> > 
> 

> > > > -ffast-math -fno-associative-math
> > > 
> > 
> 

> > > > to work?
> > > 
> > 
> 

> > > > For me, the answer is yes, because I have multiple customers
> > > > that
> > > > tell
> > > 
> > 
> 
> > > > me they really want to leave -ffast-math on, but they want to
> > > > be
> > > > able
> > > 
> > 
> 
> > > > to disable these sub-categories. I've been approaching this
> > > > under
> > > > the
> > > 
> > 
> 
> > > > assumption that the answer is yes for the Clang/LLVM community
> > > > in
> > > > general.
> > > 
> > 
> 

> > > > Thanks,
> > > 
> > 
> 

> > > > -Warren
> > > 
> > 
> 

> > > > *From:* mehdi.amini at apple.com < mailto:mehdi.amini at apple.com >
> > > 
> > 
> 
> > > > [ mailto:mehdi.amini at apple.com ]
> > > 
> > 
> 
> > > > *Sent:* Wednesday, November 16, 2016 10:46 PM
> > > 
> > 
> 
> > > > *To:* Ristow, Warren < warren.ristow at sony.com
> > > 
> > 
> 
> > > > < mailto:warren.ristow at sony.com >>
> > > 
> > 
> 
> > > > *Cc:* Kaylor, Andrew < andrew.kaylor at intel.com
> > > 
> > 
> 
> > > > < mailto:andrew.kaylor at intel.com >>; llvm-dev at lists.llvm.org
> > > 
> > 
> 
> > > > < mailto:llvm-dev at lists.llvm.org >; Nicolai Hähnle <
> > > > nhaehnle at gmail.com
> > > 
> > 
> 
> > > > < mailto:nhaehnle at gmail.com >>
> > > 
> > 
> 
> > > > *Subject:* Re: [llvm-dev] RFC: Consider changing the semantics
> > > > of
> > > > 'fast'
> > > 
> > 
> 
> > > > flag implying all fast-math-flags
> > > 
> > 
> 

> > > > On Nov 16, 2016, at 10:04 PM, Ristow, Warren <
> > > > warren.ristow at sony.com
> > > 
> > 
> 
> > > > < mailto:warren.ristow at sony.com >> wrote:
> > > 
> > 
> 

> > > > > Can you elaborate what kind of runtime failure is the
> > > > > reciprocal
> > > > 
> > > 
> > 
> 
> > > > transformation triggering?
> > > 
> > 
> 

> > > > Yes. It was along the lines of:
> > > 
> > 
> 

> > > > {
> > > >   float x = a / c;
> > > >   float y = b / c;
> > > >
> > > >   if (y == 1.0f) {
> > > >     // do some processing for when 'b' and 'c' are equal
> > > >   } else {
> > > >     // do other processing
> > > >   }
> > > >
> > > >   use(x, y);
> > > > }

> > > > Of course they understood they could easily change this code
> > > > once
> > > 
> > 
> 
> > > > they understood the issue.
> > > 
> > 
> 

> > > > But since it "failed" for non-edge-case values of 'c',
> > > > they
> > > 
> > 
> 
> > > > were worried. As an example of the non-edge-case aspect, when
> > > > 'c'
> > > 
> > 
> 
> > > > is 41.0f (and so of course 'b' is 41.0f), intuitively they felt
> > > > that
> > > 
> > 
> 
> > > > this “would work precisely”, even with fast-math. Once they
> > > 
> > 
> 
> > > > understood more, they agreed this was reasonable with
> > > > fast-math,
> > > > but
> > > 
> > 
> 
> > > > they had the underlying concern that if they encountered one
> > > > case
> > > 
> > 
> 
> > > > where 'num' and 'den' were equal (and non-edge-case), yet 'num
> > > > /
> > > 
> > 
> 
> > > > den' wasn't precisely 1.0f, then even if they fixed this
> > > > situation
> > > 
> > 
> 
> > > > where they encountered it, it might be lurking elsewhere in
> > > > their
> > > 
> > 
> 
> > > > code, and so they wanted to disable that transformation.
> > > 
> > 
> 

> > > > Thanks for elaborating.
> > > 
> > 
> 

> > > > I’d be reluctant to call this situation a real use-case though.
> > > 
> > 
> 

> > > > Does the distinction on reciprocal really make sense here?
> > > > This
> > > > user
> > > 
> > 
> 
> > > > can hit the same “surprising” result anywhere in their code-base with
> > > 
> > 
> 
> > > > reassociation as well:
> > > 
> > 
> 

> > > > void foo (float a, float b) {
> > > >   float x = a - b;
> > > >   if (x == 0)
> > > >     … // only if a == b
> > > > }

> > > > That would sound totally reasonable, unless foo is inlined and
> > > 
> > 
> 
> > > > reassociation would lead to a non-zero value for x even when a
> > > > and
> > > > b
> > > 
> > 
> 
> > > > passed in to foo "if it wasn’t inlined" would be identical!
> > > 
> > 
> 

> > > > (Reminds me somehow of a client that was bitten by nnan: their
> > > 
> > 
> 
> > > > assumption was that as long as they didn’t introduce NaN in the
> > > 
> > 
> 
> > > > program everything was fine. However with fast-math some
> > > 
> > 
> 
> > > > transformations were introducing NaN where there was none before
> > > > and
> > > 
> > 
> 
> > > > propagating to other computations that were transformed under
> > > > the
> > > 
> > 
> 
> > > > assumption that no NaN would show up. It also turns out that
> > > > making
> > > 
> > 
> 
> > > > the code safe against NaN and efficient at the same time is
> > > > hard,
> > > 
> > 
> 
> > > > especially when the code itself is compiled with fast-math).
> > > 
> > 
> 

> > > > —
> > > 
> > 
> 

> > > > Mehdi
> > > 
> > 
> 

> > > > *From:* mehdi.amini at apple.com < mailto:mehdi.amini at apple.com >
> > > 
> > 
> 
> > > > [ mailto:mehdi.amini at apple.com ]
> > > 
> > 
> 
> > > > *Sent:* Wednesday, November 16, 2016 7:11 PM
> > > 
> > 
> 
> > > > *To:* Ristow, Warren < warren.ristow at sony.com
> > > 
> > 
> 
> > > > < mailto:warren.ristow at sony.com >>
> > > 
> > 
> 
> > > > *Cc:* Kaylor, Andrew < andrew.kaylor at intel.com
> > > 
> > 
> 
> > > > < mailto:andrew.kaylor at intel.com >>; llvm-dev at lists.llvm.org
> > > 
> > 
> 
> > > > < mailto:llvm-dev at lists.llvm.org >; Nicolai Hähnle <
> > > > nhaehnle at gmail.com
> > > 
> > 
> 
> > > > < mailto:nhaehnle at gmail.com >>
> > > 
> > 
> 
> > > > *Subject:* Re: [llvm-dev] RFC: Consider changing the semantics
> > > > of
> > > 
> > 
> 
> > > > 'fast' flag implying all fast-math-flags
> > > 
> > 
> 

> > > > On Nov 16, 2016, at 6:22 PM, Ristow, Warren
> > > 
> > 
> 
> > > > < warren.ristow at sony.com < mailto:warren.ristow at sony.com >>
> > > > wrote:
> > > 
> > 
> 

> > > > > ... except that Warren’s proposal that started this
> > > > 
> > > 
> > 
> 
> > > > discussion seems to imply that he
> > > 
> > 
> 

> > > > > has a use case that requires reciprocals to be turned off
> > > > > separately.
> > > > 
> > > 
> > 
> 
> > > > Just to close this loose end, yes I have a use case.
> > > 
> > 
> 

> > > > Specifically, we have a customer that turns on '-ffast-math',
> > > 
> > 
> 
> > > > but was getting a runtime failure due to the
> > > 
> > 
> 
> > > > reciprocal-transformation being done.
> > > 
> > 
> 

> > > > Can you elaborate what kind of runtime failure is the
> > > > reciprocal
> > > 
> > 
> 
> > > > transformation triggering?
> > > 
> > 
> 

> > > > —
> > > 
> > 
> 

> > > > Mehdi
> > > 
> > 
> 

> > > > They don't want to turn off fast-math because they like the
> > > 
> > 
> 
> > > > performance improvement, and can live with the imprecision in
> > > 
> > 
> 
> > > > most cases. So they wanted to suppress just the
> > > 
> > 
> 
> > > > reciprocal-transformation. I intended to tell them the solution
> > > 
> > 
> 
> > > > was simple: use '-ffast-math -fno-reciprocal-math'. But on
> > > 
> > 
> 
> > > > trying it myself, I ran into the issue here.
> > > 
> > 
> 

> > > > Thanks,
> > > 
> > 
> 

> > > > -Warren
> > > 
> > 
> 

> > > > *From:* Kaylor, Andrew [ mailto:andrew.kaylor at intel.com ]
> > > 
> > 
> 
> > > > *Sent:* Wednesday, November 16, 2016 4:13 PM
> > > 
> > 
> 
> > > > *To:* Mehdi Amini < mehdi.amini at apple.com
> > > 
> > 
> 
> > > > < mailto:mehdi.amini at apple.com >>; Ristow, Warren
> > > 
> > 
> 
> > > > < warren.ristow at sony.com
> > > 
> > 
> 
> > > > < mailto:warren.ristow at sony.com >>; llvm-dev at lists.llvm.org
> > > 
> > 
> 
> > > > < mailto:llvm-dev at lists.llvm.org >; Nicolai Hähnle
> > > 
> > 
> 
> > > > < nhaehnle at gmail.com < mailto:nhaehnle at gmail.com >>
> > > 
> > 
> 
> > > > *Subject:* RE: [llvm-dev] RFC: Consider changing the semantics
> > > 
> > 
> 
> > > > of 'fast' flag implying all fast-math-flags
> > > 
> > 
> 

> > > > I don’t really like the idea of updating checks of
> > > 
> > 
> 
> > > > UnsafeAlgebra() to depend on all of the other flags. It seems
> > > 
> > 
> 
> > > > like it would be preferable to look at each optimization and
> > > 
> > 
> 
> > > > figure out which flags it actually requires. I suspect that in
> > > 
> > 
> 
> > > > many cases the “new” flag (i.e. allowing reassociation, etc.)
> > > 
> > 
> 
> > > > will be what is actually needed anyway.
> > > 
> > 
> 

> > > > I would be inclined to agree with Nicolai’s suggestion of
> > > 
> > 
> 
> > > > combining all the flags related to value safety, except that
> > > 
> > 
> 
> > > > Warren’s proposal that started this discussion seems to imply
> > > 
> > 
> 
> > > > that he has a use case that requires reciprocals to be turned
> > > 
> > 
> 
> > > > off separately.
> > > 
> > 
> 

> > > > -Andy
> > > 
> > 
> 

> > > > *From:* llvm-dev [ mailto:llvm-dev-bounces at lists.llvm.org ] *On
> > > 
> > 
> 
> > > > Behalf Of *Mehdi Amini via llvm-dev
> > > 
> > 
> 
> > > > *Sent:* Wednesday, November 16, 2016 8:55 AM
> > > 
> > 
> 
> > > > *To:* Ristow, Warren < warren.ristow at sony.com
> > > 
> > 
> 
> > > > < mailto:warren.ristow at sony.com >>
> > > 
> > 
> 
> > > > *Cc:* llvm-dev at lists.llvm.org < mailto:llvm-dev at lists.llvm.org
> > > > >
> > > 
> > 
> 
> > > > *Subject:* Re: [llvm-dev] RFC: Consider changing the semantics
> > > 
> > 
> 
> > > > of 'fast' flag implying all fast-math-flags
> > > 
> > 
> 

> > > > On Nov 15, 2016, at 11:59 PM, Ristow, Warren
> > > 
> > 
> 
> > > > < warren.ristow at sony.com < mailto:warren.ristow at sony.com >>
> > > > wrote:
> > > 
> > 
> 

> > > > Hi,
> > > 
> > 
> 

> > > > Thanks for the quick feedback. I see your points, but I
> > > 
> > 
> 
> > > > have a few questions/comments. I'll start at the end of the
> > > 
> > 
> 
> > > > previous post:
> > > 
> > 
> 

> > > > > ...
> > > > 
> > > 
> > 
> 
> > > > > I think these are valuable problems to solve, but you should
> > > > > tackle
> > > > > them piece by piece:
> > > > 
> > > 
> > 
> 
> > > > > 1) the clang part of overriding the individual FMF and
> > > > > emitting
> > > > > the
> > > > > right IR is the first thing to fix.
> > > > 
> > > 
> > 
> 
> > > > > 2) the backend is still using the global UnsafeFPMath and it
> > > > > should
> > > > > be killed.
> > > > 
> > > 
> > 
> 
> > > > I addressed this point (2) for the reciprocal aspect in the
> > > 
> > 
> 
> > > > patch, but of course that wasn't useful without doing
> > > 
> > 
> 
> > > > something about (1).
> > > 
> > 
> 

> > > > Regarding (1), over
> > > 
> > 
> 
> > > > at https://reviews.llvm.org/D26708#596610 , David made the
> > > 
> > 
> 
> > > > same point that it should be done in Clang. I can
> > > 
> > 
> 
> > > > understand that, but I wonder whether having the concept of
> > > 
> > 
> 
> > > > the 'fast' flag in the IR that implies all the other FMF
> > > 
> > 
> 
> > > > makes sense? I'm not seeing a good reason for it, but since
> > > 
> > 
> 
> > > > this is very new to me, I can easily imagine I'm missing the
> > > 
> > 
> 
> > > > big picture.
> > > 
> > 
> 

> > > > For example, in the LLVM IR
> > > 
> > 
> 
> > > > ( http://llvm.org/docs/LangRef.html#fast-math-flags ) the
> > > 
> > 
> 
> > > > fast-math flags 'nnan', 'ninf', 'nsz', 'arcp' and 'fast' are
> > > 
> > 
> 
> > > > defined. Except for 'fast', each of these has a fairly
> > > 
> > 
> 
> > > > specific definition of what they mean. For example, for 'arcp':
> > > 
> > 
> 

> > > > arcp => "Allow optimizations to use the reciprocal of an
> > > > argument rather than perform division."

> > > > 'fast' is unusual, in that it describes a fairly generic set
> > > 
> > 
> 
> > > > of aggressive floating-point optimizations:
> > > 
> > 
> 

> > > > fast => "Allow algebraically equivalent transformations that may
> > > > dramatically change results in floating point (e.g.
> > > > reassociate). This flag implies all the others."

> > > > Very loosely, 'fast' means "all the aggressive
> > > 
> > 
> 
> > > > FP-transformations that are not controlled by one of the
> > > 
> > 
> 
> > > > other 4, plus it implies all the other 4". If for
> > > 
> > 
> 
> > > > terminology, we call those additional aggressive
> > > 
> > 
> 
> > > > optimizations 'aggr', then we have:
> > > 
> > 
> 

> > > > 'fast' == 'aggr' + 'nnan' + 'ninf' + 'nsz' + 'arcp'
> > > 
> > 
> 

> > > > So as I see it, if we want to disable only one of the other
> > > 
> > 
> 
> > > > ones (like 'arcp', in my case), there isn't any way to
> > > 
> > 
> 
> > > > express that with these IR flags defined this way. In
> > > 
> > 
> 
> > > > short, we cannot turn on all the flags besides 'arcp'. To
> > > 
> > 
> 
> > > > do that, what we want is that somehow for the Clang switches:
> > > 
> > 
> 

> > > > '-ffast-math -fno-reciprocal-math'
> > > 
> > 
> 

> > > > to ultimately result in LLVM IR that has the following flags
> > > 
> > 
> 
> > > > on in appropriate FP ops:
> > > 
> > 
> 

> > > > 'aggr' + 'nnan' + 'ninf' + 'nsz'
> > > 
> > 
> 

> > > > Makes sense; I missed that we can’t *subtract* from fast at the
> > > 
> > 
> 
> > > > IR level.
> > > 
> > 
> 

> > > > I wouldn’t be opposed to having something along the lines of
> > > 
> > 
> 
> > > > “aggr”, but there is a tradeoff: some transformations may be
> > > 
> > 
> 
> > > > harder to guard with this model.
> > > 
> > 
> 

> > > > Maybe that could be a starting point: changing the
> > > > “UnsafeAlgebra” bit in the FMF to be the “aggr” you mention,
> > > > and making every query to FastMathFlags::UnsafeAlgebra()
> > > > return true if all the bits are set in the Flags. This alone
> > > > should be nothing more than a mechanical change, I believe.

> > > > The important part is then auditing all the users of
> > > > UnsafeAlgebra() in the middle end and checking whether they
> > > > can be “downgraded” to aggr safely, i.e. whether they need
> > > > only aggr rather than aggr *and* another flag.
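
A rough sketch of that two-step idea, again as a toy Python model rather than the real FastMathFlags class. The names `AGGR`, `unsafe_algebra`, and `can_reassociate` are assumptions made for illustration: the old query keeps its meaning by requiring every bit, while an audited user checks only the bit it needs.

```python
from enum import Flag, auto


class FMF(Flag):
    """Toy flag set in which the old UnsafeAlgebra bit is renamed AGGR."""
    NNAN = auto()
    NINF = auto()
    NSZ = auto()
    ARCP = auto()
    AGGR = auto()


EVERYTHING = FMF.NNAN | FMF.NINF | FMF.NSZ | FMF.ARCP | FMF.AGGR


def unsafe_algebra(flags: FMF) -> bool:
    """Legacy-style query: true when every bit is set (the mechanical step)."""
    return flags == EVERYTHING


def can_reassociate(flags: FMF) -> bool:
    """Hypothetical audited user that has been downgraded to need only AGGR."""
    return FMF.AGGR in flags


# Everything except reciprocal transformations.
no_arcp = EVERYTHING & ~FMF.ARCP
print(unsafe_algebra(no_arcp))   # un-audited users stay conservative
print(can_reassociate(no_arcp))  # audited users still fire
```

The design point is that un-audited callers become more conservative, never less, so the mechanical step cannot by itself introduce a miscompile.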

> > > > --
> > > > Mehdi

> > > > But I don't see a way to express 'aggr' in the IR. We could
> > > > do this, if we change the definition of the IR 'fast' flag
> > > > to remove that sentence about implying all the others:

> > > > fast => "Allow algebraically equivalent transformations that
> > > > may dramatically change results in floating point (e.g.
> > > > reassociate)."

> > > > (If we do something like that, we may want to change the
> > > > name from 'fast' to something else (like 'aggr'), to avoid
> > > > tying it too closely to the concept of the '-ffast-math'
> > > > switch.)

> > > > As an aside, I don't know if the "reassociate" example is
> > > > the only other transformation that's allowed by 'fast' (I
> > > > presume it isn't), but I think reassociation would be better
> > > > expressed by a separate flag, which could then be controlled
> > > > independently via the '-f[no-]associative-math' switch. Not
> > > > having that flag exist separately in the FMF is the origin
> > > > of PR27372. But creating that flag and using it in the
> > > > appropriate places would still run into these problems of
> > > > 'fast' implying all the others, which would make it
> > > > impossible to disable reassociation while leaving all the
> > > > other FMF transformations enabled.
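
For readers wondering why reassociation deserves its own control, a small Python sketch shows how regrouping an addition changes the answer. It assumes IEEE-754 doubles; the helper names are illustrative only.

```python
def sum_left(a: float, b: float, c: float) -> float:
    """Evaluate as written in the source: (a + b) + c."""
    return (a + b) + c


def sum_right(a: float, b: float, c: float) -> float:
    """Evaluate after reassociation: a + (b + c)."""
    return a + (b + c)


a, b, c = 1e20, -1e20, 1.0
print(sum_left(a, b, c))   # 1.0: the large terms cancel first
print(sum_right(a, b, c))  # 0.0: the 1.0 is absorbed into -1e20
```

This is a different kind of change from the reciprocal rewrite, which is the argument for being able to toggle the two independently.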

> > > > To ask a concrete question using the current definition of
> > > > 'fast' (which includes enabling reassociation, as the LLVM
> > > > IR documentation of FMF says), how can we express in the IR
> > > > that reciprocal-transformations are not allowed, but
> > > > reassociation is allowed?

> > > > So the bottom line is that I do see there are issues in
> > > > Clang that are relevant. But as long as 'fast' means
> > > > "'aggr' plus all the other FMF transformations", I don't see
> > > > how we can effectively disable a subset of those other FMF
> > > > transformations (while leaving 'aggr' transformations, such
> > > > as reassociation, enabled). With that in mind, my patch
> > > > took one step in having 'fast' no longer imply all the others.

> > > > Thanks,
> > > > -Warren

> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 