[cfe-dev] what does -ffp-contract=fast allow?
Sanjay Patel via cfe-dev
cfe-dev at lists.llvm.org
Fri Nov 18 08:37:08 PST 2016
fp-contract is confusing, so let me try to summarize that and the
underlying implementation:
1. -ffp-contract=on means honor the compiler's default FP_CONTRACT setting
or any FP_CONTRACT pragmas in the source. Currently, clang defaults to
"OFF". The shouting is not an accident; this is not the same as the flag's
"off" setting. This is described nicely here:
https://reviews.llvm.org/D24481
If we set "on" in the invocation *and* we set "ON" in the source, clang
will generate @llvm.fmuladd intrinsics for expressions like x*y+z. If you
split that into 2 lines in C with a temp variable assignment, it's no
longer a single expression, so no FMA for you. The @llvm.fmuladd intrinsic
is our way of preserving the C source information through the optimizer. If
we don't end up producing an FMA instruction for the target in this case,
it's a bug.
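
A minimal sketch of the distinction in point 1, assuming a compiler that honors the C99 pragma (one that does not may warn and ignore it). The function names are made up for illustration. With contraction turned on, the first function is a single C expression and so is a candidate for @llvm.fmuladd; the second rounds the product into a temporary first, so per the description above it is not a candidate under -ffp-contract=on.

```c
/* Allow contraction within a single expression for the rest of this file. */
#pragma STDC FP_CONTRACT ON

/* One C expression: the multiply and add may be contracted into an FMA. */
float mul_add_one_expr(float x, float y, float z) {
    return x * y + z;
}

/* Two statements: the product is assigned to tmp before the add, so this
   is no longer a single expression and is not contracted under "on". */
float mul_add_two_stmts(float x, float y, float z) {
    float tmp = x * y;
    return tmp + z;
}
```

For inputs where the product is exactly representable (2, 3 and 4, say) both functions return the same value whether or not an FMA is formed; they can only differ when rounding the product loses bits.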
2. -ffp-contract=fast means override the compiler's default "OFF" setting
and override source pragmas to generate FMA when possible, even across C
expressions. The "fast" naming is unfortunate because this does *not*
enable most fast-math. I.e., as everyone in this thread agrees so far, we
are not allowed to do the reassociation in the example. It's not strict
math though, because of that trailing clause that lets us generate FMA
across expressions.
Here's where it gets more complicated and possibly buggy. Clang does not
generate llvm.fmuladd intrinsics with this setting. In this mode, clang
generates individual fmul and fadd instructions and relies on the backend
to fuse those back together. More background here:
https://llvm.org/bugs/show_bug.cgi?id=17211
I don't know if it's possible, but if we're in this mode and some IR
transform pass managed to move/kill an fmul or fadd that was destined to be
part of an FMA, I think that would be a bug. This mode is also completely
broken with LTO because we're using a TargetOption to communicate the FMA
mode to the backend; there is no instruction-level or function-level
attribute/metadata for FMA-ness:
https://llvm.org/bugs/show_bug.cgi?id=25721
To tie this back to the earlier thread about changes to IR FMF, the
possibility of adding FMA bits to FMF (as well as storing all FMF in
metadata) was discussed here:
https://llvm.org/bugs/show_bug.cgi?id=13118
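
To make the disputed reassociation from the quoted example concrete, here is a small sketch. The function names are hypothetical, and the volatile temporaries are only there so the compiler evaluates each form exactly as spelled. The first function is x * (y + 1) as written in the source; the second is the reassociated x * y + x that the fmadd in the quoted output computes.

```c
#include <math.h> /* INFINITY */

/* The source expression x * (y + 1), evaluated in the order written. */
float foo_as_written(float x, float y) {
    volatile float sum = y + 1.0f;
    return x * sum;
}

/* The reassociated form x * y + x. For x=INF, y=0 the product INF * 0 is
   already NaN, so the result is NaN whether or not the two operations are
   fused into one instruction. */
float foo_reassociated(float x, float y) {
    volatile float prod = x * y;
    return prod + x;
}
```

Under IEEE 754 arithmetic, foo_as_written(INFINITY, 0.0f) is INF while foo_reassociated(INFINITY, 0.0f) is NaN, which is why contraction alone should not license the rewrite.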
3. The backend needs a thread of its own. We have at least these mechanisms
to handle FMA codegen:
a. TargetOptions for LessPreciseFPMADOption, UnsafeFPMath, NoInfsFPMath,
NoNaNsFPMath, AllowFPOpFusion (Fast, Standard, Strict)
b. SDNodeFlags for UnsafeAlgebra, NoNaNs, NoInfs, NoSignedZeros (but
nothing for FMA since IR FMF has nothing for FMA)
c. SelectionDAGTargetInfo::generateFMAsInMachineCombiner()
d. TargetLoweringBase::isFMAFasterThanFMulAndFAdd()
e. TargetLoweringBase::enableAggressiveFMAFusion()
f. ISD::FMA (no intermediate rounding step) and ISD::FMAD (has intermediate
rounding) nodes
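
As a rough illustration of why the "intermediate rounding" distinction in item f matters, the sketch below compares a multiply-then-add with a single-rounding result. Both helper names are hypothetical, and the second function only emulates a fused result by widening to double; it is not how LLVM lowers ISD::FMA and is not a general replacement for fmaf. It assumes IEEE 754 binary32/binary64 with round-to-nearest.

```c
/* Multiply, round the product to float, then add: two roundings. */
float mul_then_add(float a, float b, float c) {
    volatile float prod = a * b; /* product is rounded to float here */
    return prod + c;
}

/* Emulated single rounding. The product of two floats is exact in double
   (24 + 24 significand bits fit in 53). The double add can round in
   general, but it is exact for small examples like the one below, so only
   the final conversion to float rounds. */
float fused_via_double(float a, float b, float c) {
    volatile double wide = (double)a * (double)b + (double)c;
    return (float)wide;
}
```

With a = 1 + 2^-13, b = 1 - 2^-13 and c = -1, the exact product is 1 - 2^-26, which rounds to 1.0f as a float. mul_then_add therefore returns 0, while fused_via_double keeps the low bits and returns -2^-26.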
On Thu, Nov 17, 2016 at 6:03 PM, Finkel, Hal J. <hfinkel at anl.gov> wrote:
> On Nov 17, 2016 5:53 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> >
> >> On Nov 17, 2016, at 4:33 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> >>
> >>> From: "Warren Ristow" <warren.ristow at sony.com>
> >>> To: "Sanjay Patel" <spatel at rotateright.com>,
> >>>     "cfe-dev" <cfe-dev at lists.llvm.org>,
> >>>     "llvm-dev" <llvm-dev at lists.llvm.org>
> >>> Cc: "Nicolai Hähnle" <nhaehnle at gmail.com>,
> >>>     "Hal Finkel" <hfinkel at anl.gov>,
> >>>     "Mehdi Amini" <mehdi.amini at apple.com>,
> >>>     "andrew kaylor" <andrew.kaylor at intel.com>
> >>> Sent: Thursday, November 17, 2016 5:58:58 PM
> >>> Subject: RE: what does -ffp-contract=fast allow?
> >>>
> >>> > Is this a bug? We transformed the original expression into:
> >>> > x * y + x
> >>>
> >>> I’d say yes, it’s a bug.
> >>>
> >>> Unless -ffast-math is used (or some appropriate subset that gives us
> >>> leeway, like -fno-honor-infinities or -fno-honor-nans, or somesuch),
> >>> the re-association isn’t allowed, and that blocks the madd
> >>> contraction.
> >>
> >> I agree. FP contraction alone only allows us to do
> >> x*y+z -> fma(x,y,z).
> >
> > I agree too, but the more difficult question is "which flags are
> > needed here?"
> > Would FPContract + no-inf be enough? If not why and how to document it?
>
> I think that the relevant question is: Is the contracted form more
> precise for all inputs (or the same precision as the original)? If so,
> then this should be allowed with just fp-contract+no-inf. Otherwise,
> more is required.
>
> -Hal
>
> > --
> > Mehdi
> >
> >>> From: Sanjay Patel [mailto:spatel at rotateright.com]
> >>> Sent: Thursday, November 17, 2016 3:22 PM
> >>> To: cfe-dev <cfe-dev at lists.llvm.org>;
> >>>     llvm-dev <llvm-dev at lists.llvm.org>
> >>> Cc: Nicolai Hähnle <nhaehnle at gmail.com>;
> >>>     Hal Finkel <hfinkel at anl.gov>;
> >>>     Mehdi Amini <mehdi.amini at apple.com>;
> >>>     Ristow, Warren <warren.ristow at sony.com>;
> >>>     andrew.kaylor at intel.com
> >>> Subject: what does -ffp-contract=fast allow?
> >>>
> >>> This is just paraphrasing from D26602, so credit to Nicolai for first
> >>> raising the issue there.
> >>>
> >>> float foo(float x, float y) {
> >>>   return x * (y + 1);
> >>> }
> >>>
> >>> $ ./clang -O2 xy1.c -S -o - -target aarch64 -ffp-contract=fast | grep fm
> >>>     fmadd s0, s1, s0, s0
> >>>
> >>> Is this a bug? We transformed the original expression into:
> >>> x * y + x
> >>>
> >>> When x=INF and y=0, the code returns INF if we don't reassociate.
> >>> With reassociation to FMA, it returns NAN because 0 * INF = NAN.
> >>>
> >>> 1. I used aarch64 as the example target, but this is not
> >>> target-dependent (as long as the target has FMA).
> >>>
> >>> 2. This is *not* -ffast-math...or is it? The C standard only shows
> >>> on/off settings for the associated FP_CONTRACT pragma.
> >>>
> >>> 3. AFAIK, clang has no documentation for -ffp-contract:
> >>> http://clang.llvm.org/docs/UsersManual.html
> >>>
> >>> 4. GCC says:
> >>> https://gcc.gnu.org/onlinedocs/gcc-6.2.0/gcc/Optimize-Options.html#Optimize-Options
> >>> "-ffp-contract=fast enables floating-point expression contraction
> >>> such as forming of fused multiply-add operations if the target has
> >>> native support for them."
> >>>
> >>> 5. The LLVM backend (where this reassociation currently happens)
> >>> shows:
> >>> FPOpFusion::Fast - Enable fusion of FP ops wherever it's profitable.
> >>
> >> --
> >> Hal Finkel
> >> Lead, Compiler Technology and Programming Languages
> >> Leadership Computing Facility
> >> Argonne National Laboratory