[llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

Tue Nov 22 20:05:58 PST 2016

(I wrote this last week but didn’t send it somehow)

> On Nov 16, 2016, at 1:42 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> 
> On 16.11.2016 18:03, Mehdi Amini via llvm-dev wrote:
>> Individual flags for various optimization classes make sense only if you
>> don’t end up with a lot of very specialized new flags.
>> If a single “reassociate” flag could be enough to complete the existing
>> and replace the “fast” that would be great.
>> But some auditing of all the users of “fast” would be needed first. For
>> instance is “X * log2(0.5*Y) = X*log2(Y) - X” covered by
>> “reassociation”? That seems a bit more than what people think about with
>> reassociation at first.
> 
> Why are there individual flags for different types of optimization _at all_?
> 
> The way I see it, if you started from a blank slate, then the flags should define what the application cares about. In other words, the following four flags would do:
> 
> nnan, ninf, nsz, inexact
> 
> The first three have the same meaning as today. The last one is "Allow algebraically equivalent transformations that may change the result due to different rounding only".
> 
> This includes what is today arcp, fp-contract=fast, and most of "unsafe math", but it *doesn't* by itself allow you to ignore inf or nan. I'm not sure about nsz; it seems like that could be implied by inexact.
> 
> I'd be really curious to know if there is anybody who really needs arcp without fp-contract=fast or vice versa, or who needs both of these but not the X*log2(0.5*Y) transform you mentioned, and so on.[1]

I remember a use case for having arcp without reassociation (the opposite of Warren's case).
This is on hardware that has no division instruction, only a reciprocal instruction.
A division like a/b can only be implemented as a * (1/b). However, in IEEE-compliant mode, we also need some Newton-Raphson iterations to recover the precision. Also, a/sqrt(b) can be computed directly as `a*rsqrt(b)`.
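To make the a * (1/b) scheme concrete, here is a minimal C++ sketch. It is not the code of any particular target: `approx_recip`, `refine_recip` and `div_via_recip` are hypothetical names, and the bit-trick estimate is only a software stand-in for a hardware reciprocal instruction (assumed valid for positive floats of moderate magnitude). Each Newton-Raphson step roughly squares the relative error of the estimate.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a hardware reciprocal-estimate instruction.
// Uses a well-known integer bit trick; only meant for positive, normal
// floats of moderate magnitude. The raw estimate is off by a few percent.
inline float approx_recip(float b) {
    assert(b > 1e-30f && b < 1e30f);
    std::uint32_t bits;
    std::memcpy(&bits, &b, sizeof bits);
    bits = 0x7EF311C7u - bits;  // negates the exponent, roughly inverts the mantissa
    float x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// One Newton-Raphson step for f(x) = 1/x - b, i.e. x' = x * (2 - b*x).
inline float refine_recip(float b, float x) {
    return x * (2.0f - b * x);
}

// a/b computed as a * (1/b), with a configurable number of refinement steps.
// newton_steps == 0 models the "fast" path that skips the refinement.
inline float div_via_recip(float a, float b, int newton_steps) {
    float x = approx_recip(b);
    for (int i = 0; i < newton_steps; ++i) {
        x = refine_recip(b, x);
    }
    return a * x;
}
```

Calling `div_via_recip(a, b, 0)` models using the raw reciprocal, while a few steps (three are enough for this particular estimate) bring the result close to single precision. Note that this alone does not give a correctly rounded IEEE quotient.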
For most users, it was perfectly fine to skip the Newton-Raphson iterations or to use rsqrt directly, but not to enable full reassociation. Enabling only the reciprocal transformation rather than the full `fast` was useful: only adjacent reciprocals were considered, which avoided the drastic loss of precision that would occur with `fast`.
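As an illustration of why full reassociation is a different class of change from dropping a refinement step, here is a small C++ sketch (the function names are hypothetical, and the inputs are chosen specifically to trigger cancellation). Both functions compute the same algebraic sum, yet the regrouped form loses the small term entirely in single precision.

```cpp
// Sum evaluated in the order written in the source: (a + b) + c.
inline float sum_as_written(float a, float b, float c) {
    return (a + b) + c;
}

// The same sum after a reassociation the optimizer could perform
// under 'fast': a + (b + c).
inline float sum_reassociated(float a, float b, float c) {
    return a + (b + c);
}
```

With a = 1e20f, b = -1e20f, c = 1.0f, the first form yields 1.0f and the second yields 0.0f, because 1.0f is far below the rounding granularity of a float near 1e20.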

— 
Mehdi

> 
> One could consider expressing inexact in terms of ulp, but since errors can accumulate through multiple transforms I doubt that that's too useful in general. (We do have a use case for ulp in AMDGPU since we want fast inexact reciprocals for some applications. We currently use metadata for that, and I see no need to change that.)
> 
> Cheers,
> Nicolai
> 
> [1] One case I _can_ think of (and which may have been a reason for the proliferation of flags in the first place) is somebody who enables fast math, but then doesn't want their results to change when they update the compiler and get a new set of optimizations. But IMO that's a use case that should be explicitly rejected.
> 
>> 
>> —
>> Mehdi
>> 
>>> 
>>> 
>>> 
>>>        With
>>>        that proposed approach, rather than an "umbrella" flag such as
>>>        'fast' being
>>>        checked in the back-end (along with an individual flag like
>>>        'arcp'), just
>>>        checking the individual flag ('arcp') would be done.
>>> 
>>> 
>>>    There is already no need to check the “fast” *and* arcp flag, if a
>>>    transformation is about reciprocal, then you only need to check
>>>    arcp (fast implies arcp, checking for fast would be redundant).
>>> 
>>>    Be careful also that the fast-math flags are mainly an IR-level
>>>    definition; the backend only inherited these per-instruction flags
>>>    very recently. It has not been entirely converted to use these, and
>>>    it still uses a global flag in some places.
>>>    The line you’re touching in your patch for instance is about this
>>>    legacy:
>>> 
>>>      if (!UnsafeMath && !Flags->hasAllowReciprocal())
>>> 
>>>    The first flag is the global “fast-math” mode on the backend,
>>>    which is not as fine grain as the per-instruction model.
>>>    The second flag is the “per instruction” flag, which is the model
>>>    we aim at.
>>> 
>>>    We should get rid of the “global” UnsafeMath in the backend, but
>>>    that does not require any change to the IR or the individual
>>>    fast-math flags.
>>> 
>>> 
>>>        Any fast-math-related
>>>        transformation that doesn't have an individual flag (e.g.,
>>>        re-association
>>>        currently doesn't), should eventually have an individual flag
>>>        defined for
>>>        it, and then that individual flag should be checked.
>>> 
>>>        What do people think?
>>> 
>>> 
>>>    I think these are valuable problems to solve, but you should
>>>    tackle them piece by piece:
>>> 
>>>    1) the clang part of overriding the individual FMF and emitting
>>>    the right IR is the first thing to fix.
>>>    2) the backend is still using the global UnsafeFPMath and it
>>>    should be killed.
>>> 
>>>    Hope this makes sense.
>>> 
>>>    —
>>>    Mehdi
>>> 
>>> 
>>>    _______________________________________________
>>>    LLVM Developers mailing list
>>>    llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>>    http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Hal Finkel
>>> Lead, Compiler Technology and Programming Languages
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>> 
>> 
>> 
>>