<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: arial,helvetica,sans-serif; font-size: 10pt; color: #000000'><hr id="zwchr"><blockquote id="DWT3197"> From: "Stephen Canon via cfe-dev" <cfe-dev@lists.llvm.org><br> To: "Ana Pazos" <apazos@codeaurora.org><br> Cc: cfe-dev@lists.llvm.org<br> Sent: Saturday, September 19, 2015 3:00:53 PM<br> Subject: Re: [cfe-dev] question about fused multiply add and Clang GNU modes<br> <br><br> Hi Ana,<br> <br> It would change the behavior of a lot of existing software in subtle<br> ways to let -std=gnu11 license fp-contract=fast. I’m honestly rather<br> surprised that GCC made that choice.<br> <br> I’m not sure what you mean by "We know the instruction produces<br> results with higher precision and compliant to IEEE 754 standard.”<br> FMA produces *different* results than FMUL + FADD, but they are not<br> always more accurate. The classical example of naive FMA formation<br> gone wrong is multiplying a complex number by its conjugate. The<br> imaginary part *should* be zero, but when FMA formation is licensed,<br> one generally gets a small non-zero imaginary part.<br> <br> IEEE doesn’t actually license fma formation. I’m not sure where you<br> got the idea that it does. It doesn’t expressly forbid it either.<br> Rather it makes the following recommendations:<br> <br><br> "A language standard should require that by default, when no<br> optimizations are enabled and no alternate exception handling is<br> enabled, language implementations preserve the literal meaning of<br> the source code.”<br> This means that by--default--an implementation should not transform<br> FMUL + FADD into FMADD. It encourages this transform to be available<br> as an option, however:<br> <br><br> "A language standard should also define, and require implementations<br> to provide, attributes that allow and disallow value-changing<br> optimizations, separately or collectively, for a block. 
These<br> optimizations might include, but are not limited to:<br> <br> <br> ― Applying the associative or distributive laws.<br> ― Synthesis of a fusedMultiplyAdd operation from a multiplication and<br> an addition.<br> ― Synthesis of a formatOf operation from an operation and a<br> conversion of the result of the operation.<br> ― Use of wider intermediate results in expression evaluation."<br> <br> <br> Note that the other transforms that IEEE-754 groups in with FMA<br> formation here are all things that we license only under fast-math.<br> <br> Now, it so happens that fma formation makes results more accurate<br> more often than it makes them less accurate. It is *usually* a good<br> thing, so the case isn’t quite as cut and dried as I’m presenting it to<br> be. It’s also quite beneficial for performance on many platforms<br> (but rather detrimental to performance on some other platforms with<br> hardware FMA support, so again the case is not terribly clear).<br> <br> <br> It should also be noted that -ffp-contract=fast goes beyond what is<br> allowed by the C rules for #pragma STDC FP_CONTRACT ON (which allows<br> fma formation only within an expression):<br> <br>[...]<br> <br> Now, it *does* appear to me that we do not default to having STDC<br> FP_CONTRACT ON, which is inhibiting fma formation *even within an<br> expression*. Given that we support STDC FP_CONTRACT OFF, we could<br> certainly choose to make ON the default, and I would encourage doing<br> so.<br></blockquote><br>I agree. Also, our behavior here appears somewhat buggy. Not only do we not set -ffp-contract=on by default (as I recall had been our intention), but -ffp-contract=on does not even work correctly. The code in lib/Frontend/CompilerInvocation.cpp does call Opts.setFPContractMode(CodeGenOptions::FPC_On) when passed -ffp-contract=on, but only in OpenCL mode do we set Opts.DefaultFPContract = 1. 
Setting CodeGenOptions::FPC_On does pass the right flag to the backend, and does enable generating @llvm.fmuladd when an operation is tagged as 'FPContractable', but...<br><br>  1. The STDC FP_CONTRACT pragma's DEFAULT option always resets to getLangOpts().DefaultFPContract, and thus is unaffected by the -ffp-contract flag (because that's always 0 except in OpenCL mode).<br><br>  2. FPFeatures.fp_contract is initialized to 0 in include/clang/Basic/LangOptions.h, and this is never changed (except by the STDC FP_CONTRACT pragma handlers). When we create BinaryOperator AST nodes (etc.) we use the current state of FPFeatures.fp_contract to set the node's FPContractable flag, and because this always defaults to 0, regardless of how -ffp-contract is set (except setting it to fast which bypasses all of this), none of the AST nodes are marked as contractible, and we don't generate FMAs at all.<br><br>I think that the first step here is fixing all of this so that -ffp-contract=on actually works.<br><br> -Hal<br><br><blockquote> <br> <br> – Steve<br> <br> <br> <br> On Sep 18, 2015, at 4:52 PM, Ana Pazos via cfe-dev <<br> cfe-dev@lists.llvm.org > wrote:<br> <br> Hi folks,<br> <br> GNU GCC allows fused multiply add instruction generation in -std=gnu*<br> modes (default mode in GCC) on both ARM 32-bit and 64-bit targets.<br> See outputs below.<br> <br> Clang 3.8 defaults to gnu11 for C programs, according to<br> http://clang.llvm.org/docs/UsersManual.html#c-language-features and<br> function CompilerInvocation::setLangDefaults in<br> ./lib/Frontend/CompilerInvocation.cpp in the Clang source code.<br> <br> So why is fp-contract=fast not made the default in Clang, as it is in<br> GNU GCC?<br> <br> Just trying to understand the rationale behind this decision. 
We know<br> the instruction produces results with higher precision and compliant<br> to IEEE 754 standard.<br> <br> This difference in default behavior in Clang/LLVM compared to GNU GCC<br> is a performance disadvantage.<br> <br> Thanks!<br> Ana.<br> <br> $ cat t.c<br> double f(double a, double b)<br> {<br> return b*b+a*a;<br> }<br> <br> $<br> gcc-linaro-4.9-2015.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc<br> -S -O3 -o- -std=c99 t.c<br> .cpu generic+fp+simd<br> .file "t.c"<br> .text<br> .align 2<br> .global f<br> .type f, %function<br> f:<br> fmul d1, d1, d1<br> fmul d0, d0, d0<br> fadd d0, d1, d0<br> ret<br> .size f, .-f<br> .ident "GCC: (Linaro GCC 4.9-2015.05) 4.9.3 20150413 (prerelease)"<br> .section .note.GNU-stack,"",%progbits<br> $<br> gcc-linaro-4.9-2015.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc<br> -S -O3 -o- -std=gnu99 t.c<br> .cpu generic+fp+simd<br> .file "t.c"<br> .text<br> .align 2<br> .global f<br> .type f, %function<br> f:<br> fmul d0, d0, d0<br> fmadd d0, d1, d1, d0<br> ret<br> .size f, .-f<br> .ident "GCC: (Linaro GCC 4.9-2015.05) 4.9.3 20150413 (prerelease)"<br> .section .note.GNU-stack,"",%progbits<br> $<br> gcc-linaro-4.9-2015.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc<br> -S -O3 -o- t.c<br> .cpu generic+fp+simd<br> .file "t.c"<br> .text<br> .align 2<br> .global f<br> .type f, %function<br> f:<br> fmul d0, d0, d0<br> fmadd d0, d1, d1, d0<br> ret<br> .size f, .-f<br> .ident "GCC: (Linaro GCC 4.9-2015.05) 4.9.3 20150413 (prerelease)"<br> .section .note.GNU-stack,"",%progbits<br> <br> <br> Ana Pazos<br> Qualcomm Innovation Center, Inc.<br> The Qualcomm Innovation Center, Inc. 
is a member of the Code Aurora<br> Forum,<br> a Linux Foundation Collaborative Project.<br> <br> <br> <br> _______________________________________________<br> cfe-dev mailing list<br> cfe-dev@lists.llvm.org<br> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev<br> <br><br>-- </blockquote><br>Hal Finkel<br>Assistant Computational Scientist<br>Leadership Computing Facility<br>Argonne National Laboratory<br></div></body></html>