[cfe-dev] question about fused multiply add and Clang GNU modes

Sat Sep 19 13:00:53 PDT 2015

Hi Ana,

It would change the behavior of a lot of existing software in subtle ways to let -std=gnu11 license fp-contract=fast.  I’m honestly rather surprised that GCC made that choice.

I’m not sure what you mean by “We know the instruction produces results with higher precision and compliant to IEEE 754 standard.”  FMA produces *different* results than FMUL + FADD, but they are not always more accurate.  The classical example of naive FMA formation gone wrong is multiplying a complex number by its conjugate.  The imaginary part *should* be zero, but when FMA formation is licensed, one generally gets a small non-zero imaginary part, because one of the two cross products is rounded before the addition and the other is not.

IEEE doesn’t actually license fma formation.  I’m not sure where you got the idea that it does.  It doesn’t expressly forbid it either.  Rather it makes the following recommendations:

“A language standard should require that by default, when no optimizations are enabled and no alternate exception handling is enabled, language implementations preserve the literal meaning of the source code.”

This means that *by default* an implementation should not transform FMUL + FADD into FMADD.  It encourages this transform to be available as an option, however:

"A language standard should also define, and require implementations to provide, attributes that allow and disallow value-changing optimizations, separately or collectively, for a block. These optimizations might include, but are not limited to:

―  Applying the associative or distributive laws.
―  Synthesis of a fusedMultiplyAdd operation from a multiplication and an addition. 
―  Synthesis of a formatOf operation from an operation and a conversion of the result of the operation.
―  Use of wider intermediate results in expression evaluation."

Note that the other transforms that IEEE-754 groups in with FMA formation here are all things that we license only under fast-math.

Now, it so happens that fma formation makes results more accurate more often than it makes them less accurate.  It is *usually* a good thing, so the case isn’t quite as cut and dried as I’m presenting it to be.  It’s also quite beneficial for performance on many platforms (but rather detrimental to performance on some other platforms with hardware FMA support, so again the case is not terribly clear).

It should also be noted that -ffp-contract=fast goes beyond what is allowed by the C rules for #pragma STDC FP_CONTRACT ON (which allows fma formation only within an expression):

scanon$ cat fma.c
#pragma STDC FP_CONTRACT ON

float foo(float x, float y, float z) {
  return x*y + z; // fma formation is licensed here.
}

float bar(float x, float y, float z) {
  float p = x*y;
  return p + z;	// fma formation is not licensed here.
}
scanon$ clang fma.c -Os -c -arch arm64 && otool -tvV fma.o
fma.o:
(__TEXT,__text) section
_foo:
0000000000000000	fmadd	s0, s0, s1, s2 // fma only where licensed
0000000000000004	ret
_bar:
0000000000000008	fmul	s0, s0, s1
000000000000000c	fadd	s0, s0, s2
0000000000000010	ret
scanon$ clang fma.c -Os -c -arch arm64 -ffp-contract=fast && otool -tvV fma.o
fma.o:
(__TEXT,__text) section
_foo:
0000000000000000	fmadd	s0, s0, s1, s2
0000000000000004	ret
_bar:
0000000000000008	fmadd	s0, s0, s1, s2 // fma even where not licensed
000000000000000c	ret

Now, it *does* appear to me that we do not default to having STDC FP_CONTRACT ON, which is inhibiting fma formation *even within an expression*.  Given that we support STDC FP_CONTRACT OFF, we could certainly choose to make ON the default, and I would encourage doing so.

– Steve

> On Sep 18, 2015, at 4:52 PM, Ana Pazos via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> Hi folks,
>  
> GNU GCC allows fused multiply add instruction generation in -std=gnu* modes (default mode in GCC) on both ARM 32-bit and 64-bit targets. See outputs below.
>  
> Clang 3.8 defaults to gnu11 for C programs, according to http://clang.llvm.org/docs/UsersManual.html#c-language-features and function CompilerInvocation::setLangDefaults in ./lib/Frontend/CompilerInvocation.cpp in the Clang source code.
>  
> So why is fp-contract=fast not made the default in Clang, as it is in GNU GCC?
>  
> Just trying to understand the rationale behind this decision. We know the instruction  produces results with higher precision and compliant to IEEE 754 standard.
>  
> This difference in default behavior in Clang/LLVM compared to GNU GCC is a performance disadvantage.
>  
> Thanks!
> Ana.
>  
> $ cat t.c
> double f(double a, double b)
> {
> return b*b+a*a;
> }
>  
> $ gcc-linaro-4.9-2015.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -S -O3 -o- -std=c99 t.c
>         .cpu generic+fp+simd
>         .file   "t.c"
>         .text
>         .align  2
>         .global f
>         .type   f, %function
> f:
>         fmul    d1, d1, d1
>         fmul    d0, d0, d0
>         fadd    d0, d1, d0
>         ret
>         .size   f, .-f
>         .ident  "GCC: (Linaro GCC 4.9-2015.05) 4.9.3 20150413 (prerelease)"
>         .section        .note.GNU-stack,"",%progbits
> $ gcc-linaro-4.9-2015.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -S -O3 -o- -std=gnu99 t.c
>         .cpu generic+fp+simd
>         .file   "t.c"
>         .text
>         .align  2
>         .global f
>         .type   f, %function
> f:
>         fmul    d0, d0, d0
>         fmadd   d0, d1, d1, d0
>         ret
>         .size   f, .-f
>         .ident  "GCC: (Linaro GCC 4.9-2015.05) 4.9.3 20150413 (prerelease)"
>         .section        .note.GNU-stack,"",%progbits
> $ gcc-linaro-4.9-2015.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -S -O3 -o-  t.c
>         .cpu generic+fp+simd
>         .file   "t.c"
>         .text
>         .align  2
>         .global f
>         .type   f, %function
> f:
>         fmul    d0, d0, d0
>         fmadd   d0, d1, d1, d0
>         ret
>         .size   f, .-f
>         .ident  "GCC: (Linaro GCC 4.9-2015.05) 4.9.3 20150413 (prerelease)"
>         .section        .note.GNU-stack,"",%progbits
>  
>  
> Ana Pazos
> Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
> a Linux Foundation Collaborative Project.