[cfe-dev] what does -ffp-contract=fast allow?

Fri Nov 18 14:53:56 PST 2016

----- Original Message -----

> From: "Sanjay Patel" <spatel at rotateright.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Mehdi Amini" <mehdi.amini at apple.com>, "llvm-dev"
> <llvm-dev at lists.llvm.org>, "cfe-dev" <cfe-dev at lists.llvm.org>,
> "andrew kaylor" <andrew.kaylor at intel.com>, "Nicolai Hähnle"
> <nhaehnle at gmail.com>, "Warren Ristow" <warren.ristow at sony.com>
> Sent: Friday, November 18, 2016 1:35:44 PM
> Subject: Re: what does -ffp-contract=fast allow?

> On Fri, Nov 18, 2016 at 11:19 AM, Hal Finkel < hfinkel at anl.gov >
> wrote:

> > > From: "Sanjay Patel" < spatel at rotateright.com >
> > 
> 
> > > To: "Hal J. Finkel" < hfinkel at anl.gov >
> > 
> 
> > > Cc: "Mehdi Amini" < mehdi.amini at apple.com >, "llvm-dev" <
> > > llvm-dev at lists.llvm.org >, "cfe-dev" < cfe-dev at lists.llvm.org >,
> > > "andrew kaylor" < andrew.kaylor at intel.com >, "Nicolai Hähnle" <
> > > nhaehnle at gmail.com >, "Warren Ristow" < warren.ristow at sony.com >
> > 
> 
> > > Sent: Friday, November 18, 2016 10:37:08 AM
> > 
> 
> > > Subject: Re: what does -ffp-contract=fast allow?
> > 
> 

> > > fp-contract is confusing, so let me try to summarize that and the
> > > underlying implementation:
> > 
> 

> > > 1. -ffp-contract=on means honor the compiler's default FP_CONTRACT
> > > setting or any FP_CONTRACT pragmas in the source. Currently, clang
> > > defaults to "OFF". The shouting is not an accident; this is not the
> > > same as the flag's "off" setting. This is described nicely here:
> > 
> 
> > > https://reviews.llvm.org/D24481
> > 
> 

> > > If we set "on" in the invocation *and* we set "ON" in the source,
> > > clang will generate @llvm.fmuladd intrinsics for expressions like
> > > x*y+z. If you split that into 2 lines in C with a temp variable
> > > assignment, it's no longer a single expression, so no FMA for you.
> > > The @llvm.fmuladd intrinsic is our way of preserving the C source
> > > information through the optimizer. If we don't end up producing an
> > > FMA instruction for the target in this case, it's a bug.
> > 
> 

> > This is not correct.
> 

> > First, the behavior of -ffp-contract=on/off should just set the
> > default state of the pragma. Once we finish fixing up the test suite
> > to allow us to actually flip the default, this will actually be the
> > case (the review description referenced above is not clear on the
> > desired end state in this regard). Hopefully, this work will be done
> > soon.
> 

> > Second, it is specifically *not* a bug if @llvm.fmuladd does not
> > become an FMA on the target. It only represents an allowable place
> > to form an FMA. The LangRef specifically states, "Fusion is not
> > guaranteed, even if the target platform supports it." The @llvm.fma
> > intrinsic should become an FMA if the target supports it.
> 
> Ah, I mixed up llvm.fma and llvm.fmuladd. The FP_CONTRACT ON setting
> allows - but does not require - FMA codegen within a C statement. So
> the use of llvm.fmuladd is our way of preserving the C statement
> boundary and is the "blessed" op that the backend recognizes when
> operating in FPOpFusionMode::Standard.

That's correct. 
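
To make the fused/unfused distinction concrete, here is a minimal sketch of how the two can differ numerically. It is an illustration only, not code from this thread: the function names are hypothetical, and the fused result is emulated for float by doing the arithmetic in double. That emulation is an assumption that works for the inputs discussed below (the product of two floats is exact in double, and the sum is exact here too); it is not a general replacement for fmaf().

```c
/* Unfused: the product is rounded to float before the add.
   The volatile temporary forces that intermediate rounding and keeps
   the compiler from contracting the two operations into one. */
float mul_then_add(float x, float y, float z) {
    volatile float p = x * y;
    return p + z;
}

/* Emulated fusion (assumption: float inputs, result exact in double).
   The float*float product is exact in double, so the only rounding
   that matters happens once, at the end. */
float fused_emulated(float x, float y, float z) {
    double wide = (double)x * (double)y + (double)z;
    return (float)wide;
}
```

With x = 1 + 2^-13, y = 1 - 2^-13 and z = -1, the exact product is 1 - 2^-26, which rounds to 1.0f, so mul_then_add returns 0 while fused_emulated returns -2^-26. Either answer is permitted where contraction is allowed but not required.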

Thanks again, 
Hal 

> > > 2. -ffp-contract=fast means override the compiler's default "OFF"
> > > setting and override source pragmas to generate FMA when possible,
> > > even across C expressions. The "fast" naming is unfortunate because
> > > this does *not* enable most fast-math. I.e., as everyone in this
> > > thread agrees so far, we are not allowed to do the reassociation in
> > > the example. It's not strict math though because of that trailing
> > > clause that lets us generate FMA across expressions.
> > 
> 

> > > Here's where it gets more complicated and possibly buggy. Clang does
> > > not generate llvm.fmuladd intrinsics with this setting. In this
> > > mode, clang generates individual fmul and fadd instructions and
> > > relies on the backend to fuse those back together.
> > 
> 
> > This is definitely not a bug. The problem with the C rules for
> > contraction, which only allow fusion within a C-language statement,
> > is that they don't allow fusion opportunities that appear only after
> > function inlining (or, obviously, across statements in any other
> > sense). This is a real problem, especially in C++ code, where there
> > are a lot of small inline functions in abstraction layers that users
> > expect the compiler to see through before deciding on fusion. Even
> > within a function, the fusions allowed by the C rules are not
> > necessarily performance-optimal.
> 
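
A small sketch of the inlining case described above. The helper names are hypothetical. The point is only that in axpy no single source expression contains both the multiply and the add, so the C contraction rules never see a fusable expression, while a backend running in the "fast" mode may fuse the fmul and fadd once the helpers are inlined. Whether fusion actually happens depends on the target and the compiler version.

```c
/* Hypothetical abstraction-layer helpers, as found in many small
   inline wrappers. */
static inline float mul(float a, float b) { return a * b; }
static inline float add(float a, float b) { return a + b; }

/* The multiply and the add live in different functions, so the
   statement-level contraction rules do not apply here; a fusion
   opportunity appears only after inlining. */
float axpy(float a, float x, float y) {
    return add(mul(a, x), y);
}

/* The same computation as one expression: this is the shape that the
   statement-level rules (and @llvm.fmuladd) are able to cover. */
float axpy_single_expr(float a, float x, float y) {
    return a * x + y;
}
```

For inputs where every intermediate value is exactly representable, such as axpy(2, 3, 4), both functions return 10 whether or not fusion takes place.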

> > > More background here:
> > 
> 
> > > https://llvm.org/bugs/show_bug.cgi?id=17211
> > 
> 

> > > I don't know if it's possible, but if we're in this mode and some IR
> > > transform pass managed to move/kill an fmul or fadd that was
> > > destined to be part of an FMA, I think that would be a bug.
> > 
> 
> > No, this also would not be a bug (although it could be bad for
> > performance on some architectures).
> 

> > > This mode is also completely broken with LTO because we're using a
> > > TargetOption to communicate the FMA mode to the backend; there is no
> > > instruction-level or function-level attribute/metadata for FMA-ness:
> > 
> 
> > > https://llvm.org/bugs/show_bug.cgi?id=25721
> > 
> 

> > Interesting; we should at least have a function-attribute for this
> > that Clang uses.
> 

> > Thanks again,
> 
> > Hal
> 

> > > To tie this back to the earlier thread about changes to IR FMF, the
> > > possibility of adding FMA bits to FMF (as well as storing all FMF in
> > > metadata) was discussed here:
> > 
> 
> > > https://llvm.org/bugs/show_bug.cgi?id=13118
> > 
> 

> > > 3. The backend needs a thread of its own. We have at least these
> > > mechanisms to handle FMA codegen:
> > 
> 

> > > a. TargetOptions for LessPreciseFPMADOption, UnsafeFPMath,
> > > NoInfsFPMath, NoNaNsFPMath, AllowFPOpFusion (Fast, Standard, Strict)
> > 
> 

> > > b. SDNodeFlags for UnsafeAlgebra, NoNaNs, NoInfs, NoSignedZeros (but
> > > nothing for FMA since IR FMF has nothing for FMA)
> > 
> 

> > > c. SelectionDAGTargetInfo::generateFMAsInMachineCombiner()
> > 
> 

> > > d. TargetLoweringBase::isFMAFasterThanFMulAndFAdd()
> > 
> 

> > > e. TargetLoweringBase::enableAggressiveFMAFusion()
> > 
> 

> > > f. ISD::FMA (no intermediate rounding step) and ISD::FMAD (has
> > > intermediate rounding) nodes
> > 
> 

> > > On Thu, Nov 17, 2016 at 6:03 PM, Finkel, Hal J. < hfinkel at anl.gov >
> > > wrote:
> > 
> 

> > > > On Nov 17, 2016 5:53 PM, Mehdi Amini < mehdi.amini at apple.com >
> > > > wrote:
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 
> > > > >> On Nov 17, 2016, at 4:33 PM, Hal Finkel < hfinkel at anl.gov >
> > > > >> wrote:
> > > 
> > 
> 
> > > > >>
> > > 
> > 
> 
> > > > >>
> > > 
> > 
> 
> > > > >> ________________________________
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>> From: "Warren Ristow" < warren.ristow at sony.com >
> > > 
> > 
> 
> > > > >>> To: "Sanjay Patel" < spatel at rotateright.com >, "cfe-dev" <
> > > > >>> cfe-dev at lists.llvm.org >, "llvm-dev" <
> > > > >>> llvm-dev at lists.llvm.org >
> > > 
> > 
> 
> > > > >>> Cc: "Nicolai Hähnle" < nhaehnle at gmail.com >, "Hal Finkel" <
> > > > >>> hfinkel at anl.gov >, "Mehdi Amini" < mehdi.amini at apple.com >,
> > > > >>> "andrew kaylor" < andrew.kaylor at intel.com >
> > > 
> > 
> 
> > > > >>> Sent: Thursday, November 17, 2016 5:58:58 PM
> > > 
> > 
> 
> > > > >>> Subject: RE: what does -ffp-contract=fast allow?
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>> > Is this a bug? We transformed the original expression into:
> > > 
> > 
> 
> > > > >>> > x * y + x
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>> I’d say yes, it’s a bug.
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>> Unless -ffast-math is used (or some appropriate subset that gives
> > > > >>> us leeway, like -fno-honor-infinities or -fno-honor-nans, or
> > > > >>> somesuch), the re-association isn’t allowed, and that blocks the
> > > > >>> madd contraction.
> > > 
> > 
> 
> > > > >>
> > > 
> > 
> 
> > > > >> I agree. FP contraction alone only allows us to do x*y+z ->
> > > > >> fma(x,y,z).
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 
> > > > > I agree too, but the more difficult question is "which flags are
> > > > > needed here?"
> > > 
> > 
> 
> > > > > Would FPContract + no-inf be enough? If not, why, and how to
> > > > > document it?
> > > 
> > 
> 

> > > > I think that the relevant question is: Is the contracted form more
> > > > precise for all inputs (or the same precision as the original)? If
> > > > so, then this should be allowed with just fp-contract+no-inf.
> > > > Otherwise, more is required.
> > > 
> > 
> 

> > > > -Hal
> > > 
> > 
> 

> > > > >
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 
> > > > > —
> > > 
> > 
> 
> > > > > Mehdi
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>> From: Sanjay Patel [mailto: spatel at rotateright.com ]
> > > 
> > 
> 
> > > > >>> Sent: Thursday, November 17, 2016 3:22 PM
> > > 
> > 
> 
> > > > >>> To: cfe-dev < cfe-dev at lists.llvm.org >; llvm-dev <
> > > > >>> llvm-dev at lists.llvm.org >
> > > 
> > 
> 
> > > > >>> Cc: Nicolai Hähnle < nhaehnle at gmail.com >; Hal Finkel <
> > > > >>> hfinkel at anl.gov >; Mehdi Amini < mehdi.amini at apple.com >;
> > > > >>> Ristow, Warren < warren.ristow at sony.com >;
> > > > >>> andrew.kaylor at intel.com
> > > 
> > 
> 
> > > > >>> Subject: what does -ffp-contract=fast allow?
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>> This is just paraphrasing from D26602, so credit to Nicolai for
> > > > >>> first raising the issue there.
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>> float foo(float x, float y) {
> > > > >>>   return x * (y + 1);
> > > > >>> }
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>> $ ./clang -O2 xy1.c -S -o - -target aarch64 -ffp-contract=fast | grep fm
> > > 
> > 
> 
> > > > >>> fmadd s0, s1, s0, s0
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>> Is this a bug? We transformed the original expression into:
> > > 
> > 
> 
> > > > >>> x * y + x
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>> When x=INF and y=0, the code returns INF if we don't reassociate.
> > > > >>> With reassociation to FMA, it returns NAN because 0 * INF = NAN.
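
The INF/NAN case above can be checked directly in source form. This is a minimal sketch, assuming IEEE-754 float arithmetic and a build without -ffast-math (or any flag that lets the compiler assume there are no infinities or NaNs). The function names are hypothetical and are not taken from D26602.

```c
#include <math.h>   /* for the INFINITY macro; no libm functions are called */

/* The expression as written in the source. */
float as_written(float x, float y) { return x * (y + 1.0f); }

/* The reassociated form that the emitted fmadd computes. */
float reassociated(float x, float y) { return x * y + x; }

/* NaN is the only value that compares unequal to itself. */
int is_nan(float v) { return v != v; }
```

as_written(INFINITY, 0.0f) evaluates INF * 1 and returns INF, while reassociated(INFINITY, 0.0f) evaluates INF * 0 + INF and returns NaN, whether or not the multiply and add are fused. For ordinary finite inputs such as (2, 3) both return 8, which is why the difference is easy to miss.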
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>> 1. I used aarch64 as the example target, but this is not
> > > > >>> target-dependent (as long as the target has FMA).
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>> 2. This is *not* -ffast-math...or is it? The C standard only
> > > > >>> shows on/off settings for the associated FP_CONTRACT pragma.
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>> 3. AFAIK, clang has no documentation for -ffp-contract:
> > > 
> > 
> 
> > > > >>> http://clang.llvm.org/docs/UsersManual.html
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>> 4. GCC says:
> > > 
> > 
> 
> > > > >>> https://gcc.gnu.org/onlinedocs/gcc-6.2.0/gcc/Optimize-Options.html#Optimize-Options
> > > 
> > 
> 
> > > > >>> "-ffp-contract=fast enables floating-point expression contraction
> > > > >>> such as forming of fused multiply-add operations if the target
> > > > >>> has native support for them."
> > > 
> > 
> 
> > > > >>>
> > > 
> > 
> 
> > > > >>> 5. The LLVM backend (where this reassociation currently happens)
> > > > >>> shows:
> > > 
> > 
> 
> > > > >>> FPOpFusion::Fast - Enable fusion of FP ops wherever it's profitable.
> > > 
> > 
> 
> > > > >>
> > > 
> > 
> 
> > > > >>
> > > 
> > 
> 
> > > > >>
> > > 
> > 
> 
> > > > >>
> > > 
> > 
> 
> > > > >> --
> > > 
> > 
> 
> > > > >> Hal Finkel
> > > 
> > 
> 
> > > > >> Lead, Compiler Technology and Programming Languages
> > > 
> > 
> 
> > > > >> Leadership Computing Facility
> > > 
> > 
> 
> > > > >> Argonne National Laboratory
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 

-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 