[cfe-dev] what does -ffp-contract=fast allow?

Fri Nov 18 11:35:44 PST 2016

On Fri, Nov 18, 2016 at 11:19 AM, Hal Finkel <hfinkel at anl.gov> wrote:

>
> ------------------------------
>
> *From: *"Sanjay Patel" <spatel at rotateright.com>
> *To: *"Hal J. Finkel" <hfinkel at anl.gov>
> *Cc: *"Mehdi Amini" <mehdi.amini at apple.com>, "llvm-dev" <
> llvm-dev at lists.llvm.org>, "cfe-dev" <cfe-dev at lists.llvm.org>, "andrew
> kaylor" <andrew.kaylor at intel.com>, "Nicolai Hähnle" <nhaehnle at gmail.com>,
> "Warren Ristow" <warren.ristow at sony.com>
> *Sent: *Friday, November 18, 2016 10:37:08 AM
> *Subject: *Re: what does -ffp-contract=fast allow?
>
> fp-contract is confusing, so let me try to summarize that and the
> underlying implementation:
>
> 1. -ffp-contract=on means honor the compiler's default FP_CONTRACT setting
> or any FP_CONTRACT pragmas in the source. Currently, clang defaults to
> "OFF". The shouting is not an accident; this is not the same as the flag's
> "off" setting. This is described nicely here:
> https://reviews.llvm.org/D24481
>
> If we set "on" in the invocation *and* we set "ON" in the source, clang
> will generate @llvm.fmuladd intrinsics for expressions like x*y+z. If you
> split that into 2 lines in C with a temp variable assignment, it's no
> longer a single expression, so no FMA for you. The @llvm.fmuladd intrinsic
> is our way of preserving the C source information through the optimizer. If
> we don't end up producing an FMA instruction for the target in this case,
> it's a bug.
>
> This is not correct.
>
> First, the behavior of -ffp-contract=on/off should just set the default
> state of the pragma. Once we finish fixing up the test suite to allow us to
> actually flip the default, this will actually be the case (the review
> description referenced above is not clear on the desired end state in this
> regard). Hopefully, this work will be done soon.
>
> Second, it is specifically *not* a bug if @llvm.fmuladd does not become an
> FMA on the target. It only represents an allowable place to form an FMA.
> The LangRef specifically states, "Fusion is not guaranteed, even if the
> target platform supports it." The @llvm.fma intrinsic should become an FMA
> if the target supports it.
>

Ah, I mixed up llvm.fma and llvm.fmuladd. The FP_CONTRACT ON setting allows
- but does not require - FMA codegen within a C statement. So the use of
llvm.fmuladd is our way of preserving the C statement boundary and is the
"blessed" op that the backend recognizes when operating in
FPOpFusionMode::Standard.
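
To make the statement-boundary point concrete, here is a minimal sketch. The function names are made up for illustration, and a compiler that does not implement the pragma may simply ignore it (possibly with a warning). With contraction enabled by the pragma, the first function is the x*y+z case where clang would emit @llvm.fmuladd; the second splits the work across a temporary, so the C rules no longer permit fusing it.

```c
/* Ask for contraction within expressions. Compilers that do not
 * implement this pragma may ignore it, possibly with a warning. */
#pragma STDC FP_CONTRACT ON

/* One expression: the multiply and the add may be fused into an FMA,
 * but fusion is allowed here, not required. */
float mul_add_one_expr(float x, float y, float z) {
    return x * y + z;
}

/* Two statements: the product is assigned to tmp first, so under the
 * C rules the add is no longer part of the same expression. */
float mul_add_two_stmts(float x, float y, float z) {
    float tmp = x * y;
    return tmp + z;
}
```

For inputs where no rounding happens (say 2, 3, 4) both functions return the same value whether or not an FMA is formed; the two forms can only differ when the intermediate product needs rounding.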

>
>
> 2. -ffp-contract=fast means override the compiler's default "OFF" setting
> and override source pragmas to generate FMA when possible, even across C
> expressions. The "fast" naming is unfortunate because this does *not*
> enable most fast-math. I.e., as everyone in this thread agrees so far, we are
> not allowed to do the reassociation in the example. It's not strict math,
> though, because of that trailing clause that lets us generate FMA across
> expressions.
>
> Here's where it gets more complicated and possibly buggy. Clang does not
> generate llvm.fmuladd intrinsics with this setting. In this mode, clang
> generates individual fmul and fadd instructions and relies on the backend
> to fuse those back together.
>
> This is definitely not a bug. The problem is that the C rules for
> contraction, which only allow fusion within a C-language statement, don't
> allow fusion opportunities that appear only after function inlining (or,
> obviously, across statements in any other sense). This is a real problem,
> especially in C++ code, where there are a lot of small inline functions in
> abstraction layers that users expect the compiler to see through before
> deciding on fusion. Even within a function, the fusions allowed by the C
> rules are not necessarily performance-optimal.
>
> More background here:
> https://llvm.org/bugs/show_bug.cgi?id=17211
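
A small sketch of the inlining case described above, with helper names invented purely for illustration. No single C expression contains both the multiply and the add, so the C contraction rules never see a fusion candidate. Only after inlining do the fmul and fadd end up next to each other, which is the situation -ffp-contract=fast leaves to the backend.

```c
/* Small helpers of the kind an abstraction layer might provide. */
static inline float scale(float a, float b) { return a * b; }
static inline float offset(float p, float c) { return p + c; }

/* The multiply and the add live in different functions. They only
 * become adjacent once the optimizer has inlined both helpers. */
float scale_then_offset(float a, float b, float c) {
    return offset(scale(a, b), c);
}
```
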
>
> I don't know if it's possible, but if we're in this mode and some IR
> transform pass managed to move/kill an fmul or fadd that was destined to be
> part of an FMA, I think that would be a bug.
>
> No, this also would not be a bug (although it could be bad for performance
> on some architectures).
>
> This mode is also completely broken with LTO because we're using a
> TargetOption to communicate the FMA mode to the backend; there is no
> instruction-level or function-level attribute/metadata for FMA-ness:
> https://llvm.org/bugs/show_bug.cgi?id=25721
>
> Interesting; we should at least have a function-attribute for this that
> Clang uses.
>
> Thanks again,
> Hal
>
> To tie this back to the earlier thread about changes to IR FMF, the
> possibility of adding FMA bits to FMF (as well as storing all FMF in
> metadata) was discussed here:
> https://llvm.org/bugs/show_bug.cgi?id=13118
>
> 3. The backend needs a thread of its own. We have at least these
> mechanisms to handle FMA codegen:
> a. TargetOptions for LessPreciseFPMADOption, UnsafeFPMath, NoInfsFPMath,
> NoNaNsFPMath, AllowFPOpFusion (Fast, Standard, Strict)
> b. SDNodeFlags for UnsafeAlgebra, NoNaNs, NoInfs, NoSignedZeros (but
> nothing for FMA since IR FMF has nothing for FMA)
> c. SelectionDAGTargetInfo::generateFMAsInMachineCombiner()
> d. TargetLoweringBase::isFMAFasterThanFMulAndFAdd()
> e. TargetLoweringBase::enableAggressiveFMAFusion()
> f. ISD::FMA (no intermediate rounding step) and ISD::FMAD (has
> intermediate rounding) nodes
>
>
> On Thu, Nov 17, 2016 at 6:03 PM, Finkel, Hal J. <hfinkel at anl.gov> wrote:
>
>> *On Nov 17, 2016 5:53 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:*
>> *>*
>> *>*
>> *>> On Nov 17, 2016, at 4:33 PM, Hal Finkel <hfinkel at anl.gov> wrote:*
>> *>>*
>> *>>*
>> *>> ________________________________*
>> *>>>*
>> *>>> From: "Warren Ristow" <warren.ristow at sony.com>*
>> *>>> To: "Sanjay Patel" <spatel at rotateright.com>, "cfe-dev" <cfe-dev at lists.llvm.org>, "llvm-dev" <llvm-dev at lists.llvm.org>*
>> *>>> Cc: "Nicolai Hähnle" <nhaehnle at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>, "Mehdi Amini" <mehdi.amini at apple.com>, "andrew kaylor" <andrew.kaylor at intel.com>*
>> *>>> Sent: Thursday, November 17, 2016 5:58:58 PM*
>> *>>> Subject: RE: what does -ffp-contract=fast allow?*
>> *>>>*
>> *>>> > Is this a bug? We transformed the original expression into:*
>> *>>> > x * y + x*
>> *>>>*
>> *>>> I’d say yes, it’s a bug.*
>> *>>>*
>> *>>>  *
>> *>>>*
>> *>>> Unless -ffast-math is used (or some appropriate subset that gives us
>> leeway, like -fno-honor-infinities or -fno-honor-nans, or somesuch), the
>> re-association isn’t allowed, and that blocks the madd contraction.*
>> *>>*
>> *>> I agree. FP contraction alone only allows us to do x*y+z ->
>> fma(x,y,z).*
>> *>*
>> *>*
>> *> I agree too, but the more difficult question is "which flags are
>> needed here?"*
>> *> Would FPContract + no-inf be enough? If not why and how to document
>> it?*
>>
>> *I think that the relevant question is: Is the contracted form more
>> precise for all inputs (or the same precision as the original)? If so, then
>> this should be allowed with just fp-contract+no-inf. Otherwise, more is
>> required.*
>>
>> *-Hal*
>>
>> *>*
>> *>*
>> *> — *
>> *> Mehdi*
>> *>*
>> *>*
>> *>*
>> *>>>  *
>> *>>>*
>> *>>> From: Sanjay Patel [mailto:spatel at rotateright.com]*
>> *>>> Sent: Thursday, November 17, 2016 3:22 PM*
>> *>>> To: cfe-dev <cfe-dev at lists.llvm.org>; llvm-dev <llvm-dev at lists.llvm.org>*
>> *>>> Cc: Nicolai Hähnle <nhaehnle at gmail.com>; Hal Finkel <hfinkel at anl.gov>; Mehdi Amini <mehdi.amini at apple.com>; Ristow, Warren <warren.ristow at sony.com>; andrew.kaylor at intel.com*
>> *>>> Subject: what does -ffp-contract=fast allow?*
>> *>>>*
>> *>>>  *
>> *>>>*
>> *>>> This is just paraphrasing from D26602, so credit to Nicolai for
>> first raising the issue there.*
>> *>>>*
>> >>> float foo(float x, float y) {
>> >>>   return x * (y + 1);
>> >>> }
>> >>>
>> >>> $ ./clang -O2 xy1.c -S -o - -target aarch64 -ffp-contract=fast | grep fm
>> >>>     fmadd    s0, s1, s0, s0
>> *>>>*
>> *>>> Is this a bug? We transformed the original expression into:*
>> *>>> x * y + x*
>> *>>>*
>> *>>> When x=INF and y=0, the code returns INF if we don't reassociate.
>> With reassociation to FMA, it returns NAN because 0 * INF = NAN.*
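
The x=INF, y=0 case can be checked directly in plain C. This is only a sketch with made-up function names; it writes the distributed form as two separate operations rather than an actual FMA, since a fused multiply-add of the same operands runs into the same INF * 0 product. It assumes the file is built without -ffast-math or similar flags.

```c
#include <math.h>   /* INFINITY, isinf, isnan */

/* The expression as written in the source. */
float as_written(float x, float y) {
    return x * (y + 1.0f);
}

/* The distributed form x*y + x, written as two explicit operations.
 * The NaN comes from the distribution itself, fused or not. */
float distributed(float x, float y) {
    float prod = x * y;
    return prod + x;
}

/* Returns 1 when the two forms disagree for x = INF, y = 0. */
int forms_disagree_at_inf(void) {
    float a = as_written(INFINITY, 0.0f);   /* INF * (0 + 1) = INF */
    float b = distributed(INFINITY, 0.0f);  /* INF * 0 = NaN, NaN + INF = NaN */
    return isinf(a) && isnan(b);
}
```
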
>> *>>>*
>> *>>> 1. I used aarch64 as the example target, but this is not
>> target-dependent (as long as the target has FMA).*
>> *>>>*
>> *>>> 2. This is *not* -ffast-math...or is it? The C standard only shows
>> on/off settings for the associated FP_CONTRACT pragma.*
>> *>>>*
>> *>>> 3. AFAIK, clang has no documentation for -ffp-contract:*
>> *>>> http://clang.llvm.org/docs/UsersManual.html*
>> *>>>*
>> *>>> 4. GCC says:*
>> *>>> https://gcc.gnu.org/onlinedocs/gcc-6.2.0/gcc/Optimize-Options.html#Optimize-Options*
>> *>>> "-ffp-contract=fast enables floating-point expression contraction
>> such as forming of fused multiply-add operations if the target has native
>> support for them."*
>> *>>>*
>> *>>> 5. The LLVM backend (where this reassociation currently happens)
>> shows:*
>> *>>> FPOpFusion::Fast - Enable fusion of FP ops wherever it's profitable.*
>> *>>*
>> *>>*
>> *>>*
>> *>>*
>> *>> -- *
>> *>> Hal Finkel*
>> *>> Lead, Compiler Technology and Programming Languages*
>> *>> Leadership Computing Facility*
>> *>> Argonne National Laboratory*
>> *>*
>> *>*
>>
>>
>
>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>