[LLVMdev] Representing -ffast-math at the IR level

Hal Finkel hfinkel at anl.gov
Sun Apr 15 06:16:48 PDT 2012


On Sun, 15 Apr 2012 02:00:37 +0400
Dmitry Babokin <babokin at gmail.com> wrote:

> On Sun, Apr 15, 2012 at 1:02 AM, Duncan Sands <baldrick at free.fr>
> wrote:
> 
> > Hi Dmitry,
> >
> >
> >>    The kinds of transforms I think can reasonably be done with the
> >>    current information are things like: x + 0.0 -> x; x / constant
> >>    -> x * (1 / constant) if constant and 1 / constant are normal
> >>    (and not denormal) numbers.
> >>
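> >> (Concretely: x / 10.0 => x * 0.1 is fine, since both 10.0 and 0.1
> >> are normal, but x / 1e-320 must not become x * (1 / 1e-320),
> >> because 1e-320 is denormal and 1 / 1e-320 overflows to infinity.)
> >>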
> >> The particular definition is not as important as the fact that such
> >> a definition exists :) I.e. I think we need a set of transformations
> >> to be defined (most likely as an enum, as Renato pointed out) and an
> >> interface which accepts an "fp-model" (which is "fast", "strict" or
> >> whatever keyword we may end up with) and the particular
> >> transformation, and returns true or false depending on whether the
> >> definition of the fp-model allows this transformation or not. So a
> >> transformation would ask, for example, whether reassociation is
> >> allowed.
> >>
> >
> > at some point each optimization will have to decide if it is going
> > to be applied
> > or not, so that's not really the point.  It seems to me that there
> > are many many
> > possible optimizations, and putting them all as flags in the
> > metadata is out of
> > the question.  What seems reasonable to me is dividing transforms
> > up into a few
> > major (and orthogonal) classes and putting flags for them in the
> > metadata.
> >
> An optimization's decision to apply or not should be based on a
> strict definition of what is allowed, not on each optimization's own
> interpretation of the "fast" fp-model (for example). Say, after
> widely adopting the "fast" fp-model in the compiler, you suddenly
> realize that the definition is wrong and that allowing some type of
> transformation is a bad idea (for any reason - being incompatible
> with some compiler, not taking into account some corner cases, or
> whatever other reason) - then you'll have to go and fix a million
> places where the decision is made.
> 
> Alternatively, by defining classes of transformations and making
> optimizations query for the particular type of transformation, you
> keep it under control.
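> 
> For example, a minimal sketch (all names purely illustrative, not an
> existing LLVM interface):
> 
>   // Kinds of fp transformations an optimization may ask about.
>   enum FPTransformKind {
>     FPT_Reassociate  = 1 << 0,  // (a + b) + c => a + (b + c)
>     FPT_FoldAddZero  = 1 << 1,  // x + 0.0 => x
>     FPT_RecipDiv     = 1 << 2,  // a / b => a * (1 / b)
>     FPT_FormFMA      = 1 << 3,  // a * b + c => fma(a, b, c)
>     FPT_NoNaNCompare = 1 << 4   // (a < b) => !(a >= b)
>   };
> 
>   enum FPModel { FPM_Strict, FPM_Precise, FPM_Fast };
> 
>   // A transform asks this instead of interpreting the model itself;
>   // if the definition of "fast" ever changes, only this one function
>   // has to be fixed.
>   bool isFPTransformAllowed(FPModel Model, FPTransformKind Kind);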
> 
> 
> >> Another point, important from a practical point of view, is that
> >> the fp-model is almost always the same for all instructions in the
> >> function (or even module), and tagging every instruction with
> >> fp-model metadata is quite a substantial waste of resources.
> >>
> >
> > I measured the resource waste and it seems fairly small.
> >
> >
> >> So it makes sense to me to have a default fp-model defined for the
> >> function or module, which can be overridden with instruction
> >> metadata.
> >
> > That's possible (I already discussed this with Chandler), but in my
> > opinion is
> > only worth doing if we see unreasonable increases in bitcode size
> > in real code.
> 
> 
> What is reasonable or not is defined not only by absolute numbers
> (0.8% or any other number). Does it make sense to increase bitcode
> size by 1% if it's used only by math library writers and a couple of
> other people who reeeeally care about precision *and* performance at
> the same time, and are knowledgeable enough to restrict precision on
> particular instructions only? In my experience it's an extremely rare
> case when people want more than compiler flags to control fp accuracy
> and are ready to deal with pragmas (when they are available).
> 
> >
> >
> >> I also understand that clang generally derives its switches from
> >> GCC, and fp precision switches are no exception, but I'd like to
> >> point out that there's a far more orderly way of defining the fp
> >> precision model (IMHO, of course :-) ), adopted by MS and the
> >> Intel Compiler (-fp-model [strict|precise|fast]). It would be nice
> >> to have it adopted in clang.
> >>
> >> But while adding MS-style fp-model switches is a different topic
> >> (and I guess quite an arguable one), I'm mentioning it to show the
> >> importance of the idea of abstracting the compiler's internal
> >> fp-model from the external switches
> >
> > The info in the meta-data is essentially a bunch of external
> > switches which will then be used to determine which transforms are
> > run.
> >
> >
> >> and exposing a querying interface to transformations.
> >> Transformations shouldn't care about the particular model; they
> >> only need to know whether a particular type of transformation is
> >> allowed.
> >>
> >
> > Do you have a concrete suggestion for what should be in the
> > metadata?
> >
> 
> I would define the set of transformations, such as (I can help with a
> more complete list if you prefer):
> 
>    - reassociation
>    - x+0.0=>x
>    - x*0.0=>0.0
>    - x*1.0=>x
>    - a/b => a * (1/b)
>    - a*b+c=>fma(a,b,c)
>    - ignoring NaNs in compare, i.e. (a<b) => !(a>=b)
>    - value-unsafe transformations (for aggressive fp optimizations,
> like a*b+a*c => a*(b+c)) and others of the kind.
> 
> and several aliases for the "strict", "precise" and "fast" models
> (which are effectively combinations of the flags above; see the
> sketch below).
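> 
> Continuing the earlier sketch, the aliases would then just be
> combinations of those flags (values purely illustrative):
> 
>   const unsigned FPM_StrictMask  = 0;              // nothing allowed
>   const unsigned FPM_PreciseMask = FPT_FoldAddZero | FPT_FormFMA;
>   const unsigned FPM_FastMask    = ~0u;            // everything allowed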

From a user's perspective, I think that it is important to have
categories defining:
 - finite math (as precise as normal, but might do odd things for NaNs
   or Infinity, etc.) - I'd suppose this is the strictest "fast" option.
 - algebraic-equivalence - The compiler might do anything that is
   algebraically the same (even if the numerics could be quite
   different) - This is probably the loosest "fast" option.
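
A tiny illustration of the latter in double precision - an
algebraically equivalent rewrite that changes the result:

  double a = 1e16, b = -1e16, c = 1.0;
  double x = (a + b) + c;  /* == 1.0 */
  double y = a + (b + c);  /* == 0.0: b + c rounds back to -1e16 */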

 -Hal

> 
> So the metadata would be able to say "fast", "fast, but no fma
> allowed", "strict, but fma allowed", i.e. the metadata should be a
> base level plus an optional set of adjustments from the list above.
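> 
> For instance, it might look something like this in the IR (the
> syntax and the "!fpmodel" name are purely illustrative, not an
> existing form of metadata):
> 
>   %mul = fmul double %a, %b, !fpmodel !0
>   %add = fadd double %mul, %c, !fpmodel !1
>   !0 = metadata !{metadata !"fast"}
>   !1 = metadata !{metadata !"fast", metadata !"nofma"}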
> 
> And, again, I think this should be a function-level model, unless
> overridden on a particular instruction, as that will be the case in
> 99.9999% of compilations.
> 
> >
> > Ciao, Duncan.
> >
> 
> Dmitry.



-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory



