[LLVMdev] Representing -ffast-math at the IR level

Hal Finkel hfinkel at anl.gov
Mon Apr 16 11:07:05 PDT 2012


On Mon, 16 Apr 2012 19:40:41 +0200
Duncan Sands <baldrick at free.fr> wrote:

> Hi Owen,
> 
> > I have some issues with representing this as a single "fast" mode
> > flag,
> 
> it isn't a single flag; that's the whole point of using metadata.
> Right now there is only one option (the "accuracy"), true, but
> the intent is that others will be added, and the meaning of accuracy
> tightened, later.  MDBuilder has a createFastFPMath method which is
> intended to produce settings that match GCC's -ffast-math, however
> frontends will be able to specify whatever settings they like if that
> doesn't suit them (i.e. createFPMath will get more arguments as more
> settings become available).
> 
> Note that as the current option isn't actually connected to any
> optimizations, there is nothing much to argue about for the moment.
> 
> My plan is to introduce a few simple optimizations (x + 0.0 -> x for
> example) that introduce a finite number of ULPs of error, and hook
> them up.  Thus this does not include things like x * 0.0 -> 0.0
> (infinite ULPs of error), reassociation (infinite ULPs of error) or
> any other scary things.

If I understand what you're saying, I think that describing x + 0.0 -> x,
x * 0.0 -> 0.0, etc. as introducing "infinite ULPs of error" is an
unhelpful way of classifying them. These are finite-math assumptions, and
ULPs of error should only be computed assuming finite inputs. Accordingly,
these transformations are not at all scary, and can be safely enabled
whenever a finite-math assumption is allowed, regardless of other
user-required accuracy constraints.
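
To make that concrete, here is a quick Python sketch (not LLVM code; the
values are only illustrative) showing that both identities are exact, i.e.
zero ULPs of error, for ordinary finite inputs, and only break down for
special values (a signed zero, infinities, NaNs):

```python
import math

# x + 0.0 -> x: exact for every finite nonzero x, but not for x = -0.0,
# where IEEE 754 addition gives +0.0 (a sign change, not a rounding error).
x = -0.0
assert math.copysign(1.0, x + 0.0) == 1.0   # x + 0.0 is +0.0
assert math.copysign(1.0, x) == -1.0        # x itself is -0.0

# x * 0.0 -> 0.0: exact for every finite nonnegative x, but wrong for
# infinities and NaNs, where the true result is NaN, not 0.0.
assert math.isnan(math.inf * 0.0)
assert math.isnan(math.nan * 0.0)

# For ordinary finite inputs the identities hold exactly.
for v in (1.5, -2.75, 1e308, 5e-324):
    assert v + 0.0 == v
    assert v * 0.0 == 0.0
```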

 -Hal

> 
> >   which mostly boil down to the fact that this is a very C-centric
> > view of the world.  And, since C compilers are not generally known
> > for their awesomeness on issues of numerics, I'm not sure that's a
> > good idea.
> > Having something called a "fast" or "relaxed" mode implies that it
> > is less precise than whatever the standard mode is.  However, C is
> > notably sparse in specifying what exactly the standard mode is.
> > The typical assumption is that it is the strict one-to-one
> > translation to IEEE754 semantics, but no optimizing C compiler
> > actually implements that.
> 
> I think this is a misunderstanding of where I'm going, see above.
> 
> > Other languages are more interesting in this regard.  Fortran, for
> > instance, allows reassociation within parentheses.  (Can that even
> > be represented with instruction metadata?)
> 
> I'm aware of Fortran parentheses (PAREN_EXPR in gcc).  If it can't be
> expressed well then too bad: reassociation can just be turned off and
> we won't optimize Fortran as well as we could.  (As mentioned above I
> have no intention of turning on reassociation based on the current
> flag since it can introduce an unbounded number of ULPs of error).
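
Just to illustrate why no finite ULP bound can license reassociation, a
quick Python sketch (not LLVM code; the values are chosen to force
catastrophic cancellation):

```python
# Reassociating (a + b) + c into a + (b + c) can change the result by an
# amount limited only by the magnitudes involved, not by any ULP count.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c    # cancellation happens first, then 1.0 is added
right = a + (b + c)   # 1.0 is absorbed into -1e16 before cancellation

assert left == 1.0
assert right == 0.0   # the entire answer is lost, not just a few ULPs
```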
> 
> >    OpenCL has a fairly conservative baseline mode, but specifies a
> > number of specific options the user can enable to relax it
> > (-cl-mad-enable, -cl-no-signed-zeros, -cl-unsafe-math-optimizations
> > (implies the previous two), -cl-finite-math-only,
> > -cl-fast-relaxed-math (implies all prior)).  GLSL has distinct
> > desktop and embedded specifications that place different levels of
> > constraint on implementations.
> 
> Yup.
> 
> >
> > If we define the baseline behavior to be strict IEEE conformance,
> 
> Which we do.
> 
> >   and then don't provide a more nuanced method of relaxing it,
> 
> Allowing more nuanced ways is the reason for using metadata as
> explained above.
> 
> >   we're not going to be in a significantly better world than we are
> > today.  No reasonable implementation of these languages wants strict
> > conformance (except maybe desktop-profile OpenCL) as their default
> > mode,
> 
> Strict conformance is what they get right now.
> 
> >   nor is there any way a universal definition of "fast" math can work
> > for all of them.
> 
> I agree, and I'm not trying to provide one.
> 
> Ciao, Duncan.
> 
> >
> > --Owen
> >
> > On Apr 14, 2012, at 11:28 AM, Duncan Sands<baldrick at free.fr>  wrote:
> >
> >> The attached patch is a first attempt at representing
> >> "-ffast-math" at the IR level, in fact on individual floating
> >> point instructions (fadd, fsub etc).  It is done using metadata.
> >> We already have a "fpmath" metadata type which can be used to
> >> signal that reduced precision is OK for a floating point
> >> operation, e.g.
> >>
> >>     %z = fmul float %x, %y, !fpmath !0
> >>   ...
> >>   !0 = metadata !{float 2.5}
> >>
> >> indicates that the multiplication can be done in any way that
> >> doesn't introduce more than 2.5 ULPs of error.
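
For readers unfamiliar with the unit, here is a Python sketch of how such
a bound would be measured (approx_mul is a made-up low-precision multiply,
purely illustrative of a lowering that would NOT meet a 2.5 ULP bound;
math.ulp needs Python 3.9+):

```python
import math

def approx_mul(x, y):
    # Hypothetical low-precision multiply: chop each operand's significand
    # to 8 bits before multiplying.  Not an actual LLVM lowering.
    def chop(v):
        m, e = math.frexp(v)              # v == m * 2**e with 0.5 <= |m| < 1
        return math.ldexp(round(m * 256) / 256, e)
    return chop(x) * chop(y)

exact = 1.1 * 2.3
approx = approx_mul(1.1, 2.3)

# Error measured in units in the last place of the exact result: one ULP
# is the gap between the exact result and its neighbouring float.
err_ulps = abs(approx - exact) / math.ulp(exact)
# err_ulps is astronomically larger than 2.5 here, so this lowering would
# not satisfy the metadata above.
```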
> >>
> >> The first observation is that !fpmath can be extended with
> >> additional operands in the future: operands that say things like
> >> whether it is OK to assume that there are no NaNs and so forth.
> >>
> >> This patch doesn't add additional operands though.  It just allows
> >> the existing accuracy operand to be the special keyword "fast"
> >> instead of a number:
> >>
> >>     %z = fmul float %x, %y, !fpmath !0
> >>   ...
> >>   !0 = metadata !{metadata !"fast"}
> >>
> >> This indicates that accuracy loss is acceptable (just how much is
> >> unspecified) for the sake of speed.  Thanks to Chandler for
> >> pushing me to do it this way!
> >>
> >> It also creates a simple way of getting and setting this
> >> information: the FPMathOperator class: you can cast appropriate
> >> instructions to this class and then use the querying/mutating
> >> methods to get/set the accuracy, whether 2.5 or "fast".  The
> >> attached clang patch uses this to set the OpenCL 2.5 ULPs accuracy
> >> rather than doing it by hand for example.
> >>
> >> In addition it changes IRBuilder so that you can provide an
> >> accuracy when creating floating point operations.  I don't like
> >> this so much.  It would be more efficient to just create the
> >> metadata once and then splat it onto each instruction.  Also, if
> >> fpmath gets a bunch more options/operands in the future then this
> >> interface will become more and more awkward.  Opinions welcome!
> >>
> >> I didn't actually implement any optimizations that use this yet.
> >>
> >> I took a look at the impact on aermod.f90, a reasonably floating
> >> point heavy Fortran benchmark (4% of the human readable IR
> >> consists of floating point operations).  At -O3 (the worst), the
> >> size of the bitcode increases by 0.8%. No idea if that's
> >> acceptable - hopefully it is!
> >>
> >> Enjoy!
> >>
> >> Duncan.
> >> <fastm-llvm.diff><fastm-clang.diff>
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >
> 



-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory


