[cfe-dev] question about fused multiply add and Clang GNU modes

Hal Finkel via cfe-dev cfe-dev at lists.llvm.org
Sun Sep 20 06:29:13 PDT 2015


----- Original Message -----

> From: "Stephen Canon via cfe-dev" <cfe-dev at lists.llvm.org>
> To: "Ana Pazos" <apazos at codeaurora.org>
> Cc: cfe-dev at lists.llvm.org
> Sent: Saturday, September 19, 2015 3:00:53 PM
> Subject: Re: [cfe-dev] question about fused multiply add and Clang
> GNU modes

> Hi Ana,

> It would change the behavior of a lot of existing software in subtle
> ways to let -std=gnu11 license fp-contract=fast. I’m honestly rather
> surprised that GCC made that choice.

> I’m not sure what you mean by "We know the instruction produces
> results with higher precision and compliant to IEEE 754 standard."
> FMA produces *different* results than FMUL + FADD, but they are not
> always more accurate. The classical example of naive FMA formation
> gone wrong is multiplying a complex number by its conjugate. The
> imaginary part *should* be zero, but when FMA formation is licensed,
> one generally gets a small non-zero imaginary part.

> IEEE doesn’t actually license fma formation. I’m not sure where you
> got the idea that it does. It doesn’t expressly forbid it either.
> Rather it makes the following recommendations:

> "A language standard should require that by default, when no
> optimizations are enabled and no alternate exception handling is
> enabled, language implementations preserve the literal meaning of
> the source code."
> This means that, by default, an implementation should not transform
> FMUL + FADD into FMADD. It encourages this transform to be available
> as an option, however:

> "A language standard should also define, and require implementations
> to provide, attributes that allow and disallow value-changing
> optimizations, separately or collectively, for a block. These
> optimizations might include, but are not limited to:

> ― Applying the associative or distributive laws.
> ― Synthesis of a fusedMultiplyAdd operation from a multiplication and
> an addition.
> ― Synthesis of a formatOf operation from an operation and a
> conversion of the result of the operation.
> ― Use of wider intermediate results in expression evaluation."

> Note that the other transforms that IEEE-754 groups in with FMA
> formation here are all things that we license only under fast-math.

> Now, it so happens that fma formation makes results more accurate
> more often than it makes them less accurate. It is *usually* a good
> thing, so the case isn’t quite as cut and dried as I’m presenting it to
> be. It’s also quite beneficial for performance on many platforms
> (but rather detrimental to performance on some other platforms with
> hardware FMA support, so again the case is not terribly clear).

> It should also be noted that -ffp-contract=fast goes beyond what is
> allowed by the C rules for #pragma STDC FP_CONTRACT ON (which allows
> fma formation only within an expression):

> [... ]
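
A small C sketch of the within-an-expression rule, assuming a compiler that honors the pragma (not every compiler does). The function names are hypothetical. Under #pragma STDC FP_CONTRACT ON the first function may be contracted to a single FMA; the second splits the multiply and the add into separate statements, so the pragma alone does not license fusing them, whereas -ffp-contract=fast could still do so.

```c
#pragma STDC FP_CONTRACT ON

/* One expression: the multiply and the add may be contracted. */
double muladd_one_expr(double a, double b, double c)
{
    return a * b + c;
}

/* Two statements: the product is its own full expression, so the
 * pragma does not permit fusing it with the later addition. */
double muladd_two_stmts(double a, double b, double c)
{
    double t = a * b;
    return t + c;
}
```

For inputs whose product is exactly representable (for example 2.0, 3.0, 4.0) both functions return the same value whether or not fusion happens; they can only differ when a * b has to be rounded.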

> Now, it *does* appear to me that we do not default to having STDC
> FP_CONTRACT ON, which is inhibiting fma formation *even within an
> expression*. Given that we support STDC FP_CONTRACT OFF, we could
> certainly choose to make ON the default, and I would encourage doing
> so.

I agree. Also, our behavior here appears somewhat buggy. Not only do we not set -ffp-contract=on by default (which, as I recall, had been our intention), but -ffp-contract=on does not even work correctly. The code in lib/Frontend/CompilerInvocation.cpp does call Opts.setFPContractMode(CodeGenOptions::FPC_On) when passed -ffp-contract=on, but only in OpenCL mode do we set Opts.DefaultFPContract = 1. Setting CodeGenOptions::FPC_On does pass the right flag to the backend, and does enable generating @llvm.fmuladd when an operation is tagged as 'FPContractable', but... 

1. The STDC FP_CONTRACT pragma's DEFAULT option always resets to getLangOpts().DefaultFPContract, and thus is unaffected by the -ffp-contract flag (because that's always 0 except in OpenCL mode). 

2. FPFeatures.fp_contract is initialized to 0 in include/clang/Basic/LangOptions.h, and this is never changed (except by the STDC FP_CONTRACT pragma handlers). When we create BinaryOperator AST nodes (etc.) we use the current state of FPFeatures.fp_contract to set the node's FPContractable flag. Because this always defaults to 0 regardless of how -ffp-contract is set (except when it is set to fast, which bypasses all of this), none of the AST nodes are marked as contractable, and we don't generate FMAs at all. 
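
The two points above can be sketched as a toy model in C. This is not Clang code: every name here is hypothetical and only mirrors the roles of -ffp-contract, DefaultFPContract, FPFeatures.fp_contract and the FPContractable flag as described in this message.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the -ffp-contract setting and pragma state. */
enum toy_mode   { TOY_FPC_OFF, TOY_FPC_ON, TOY_FPC_FAST };
enum toy_pragma { TOY_PRAGMA_NONE, TOY_PRAGMA_ON, TOY_PRAGMA_OFF,
                  TOY_PRAGMA_DEFAULT };

/* Returns whether a mul+add may be fused under the behavior described. */
bool toy_may_fuse(enum toy_mode flag, bool opencl, enum toy_pragma pragma)
{
    /* Models DefaultFPContract: set only in OpenCL mode, not by the flag. */
    bool default_fp_contract = opencl;

    /* Models FPFeatures.fp_contract: starts at 0, changed only by pragmas. */
    bool fp_contract = false;
    switch (pragma) {
    case TOY_PRAGMA_ON:      fp_contract = true;                break;
    case TOY_PRAGMA_OFF:     fp_contract = false;               break;
    case TOY_PRAGMA_DEFAULT: fp_contract = default_fp_contract; break;
    case TOY_PRAGMA_NONE:                                       break;
    }

    /* fast bypasses the per-node flag entirely. */
    if (flag == TOY_FPC_FAST)
        return true;

    /* on fuses only nodes tagged contractable from the pragma state. */
    return flag == TOY_FPC_ON && fp_contract;
}
```

In this model toy_may_fuse(TOY_FPC_ON, false, TOY_PRAGMA_NONE) is false, which is the problem: passing -ffp-contract=on by itself never marks anything as contractable.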

I think that the first step here is fixing all of this so that -ffp-contract=on actually works. 

-Hal 

> – Steve

> On Sep 18, 2015, at 4:52 PM, Ana Pazos via cfe-dev <
> cfe-dev at lists.llvm.org > wrote:

> Hi folks,

> GNU GCC allows fused multiply add instruction generation in -std=gnu*
> modes (default mode in GCC) on both ARM 32-bit and 64-bit targets.
> See outputs below.

> Clang 3.8 defaults to gnu11 for C programs, according to
> http://clang.llvm.org/docs/UsersManual.html#c-language-features and
> function CompilerInvocation::setLangDefaults in
> ./lib/Frontend/CompilerInvocation.cpp in the Clang source code.

> So why is fp-contract=fast not the default in Clang, as it is in
> GNU GCC?

> Just trying to understand the rationale behind this decision. We know
> the instruction produces results with higher precision and compliant
> to IEEE 754 standard.

> This difference in default behavior in Clang/LLVM compared to GNU GCC
> is a performance disadvantage.

> Thanks!
> Ana.

> $ cat t.c
> double f(double a, double b)
> {
>     return b*b+a*a;
> }

> $
> gcc-linaro-4.9-2015.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc
> -S -O3 -o- -std=c99 t.c
> .cpu generic+fp+simd
> .file "t.c"
> .text
> .align 2
> .global f
> .type f, %function
> f:
> fmul d1, d1, d1
> fmul d0, d0, d0
> fadd d0, d1, d0
> ret
> .size f, .-f
> .ident "GCC: (Linaro GCC 4.9-2015.05) 4.9.3 20150413 (prerelease)"
> .section .note.GNU-stack,"",%progbits
> $
> gcc-linaro-4.9-2015.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc
> -S -O3 -o- -std=gnu99 t.c
> .cpu generic+fp+simd
> .file "t.c"
> .text
> .align 2
> .global f
> .type f, %function
> f:
> fmul d0, d0, d0
> fmadd d0, d1, d1, d0
> ret
> .size f, .-f
> .ident "GCC: (Linaro GCC 4.9-2015.05) 4.9.3 20150413 (prerelease)"
> .section .note.GNU-stack,"",%progbits
> $
> gcc-linaro-4.9-2015.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc
> -S -O3 -o- t.c
> .cpu generic+fp+simd
> .file "t.c"
> .text
> .align 2
> .global f
> .type f, %function
> f:
> fmul d0, d0, d0
> fmadd d0, d1, d1, d0
> ret
> .size f, .-f
> .ident "GCC: (Linaro GCC 4.9-2015.05) 4.9.3 20150413 (prerelease)"
> .section .note.GNU-stack,"",%progbits

> Ana Pazos
> Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
> Forum,
> a Linux Foundation Collaborative Project.

> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

> --
Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 