[PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.
Artem Belevich via cfe-commits
cfe-commits at lists.llvm.org
Wed May 18 10:49:16 PDT 2016
tra added a comment.
I don't think using FMA throws away IEEE compliance.
IEEE 754-2008 says:
> A language standard should also define, and require implementations to provide, attributes that allow and
> disallow value-changing optimizations, separately or collectively, for a block. These optimizations might
> include, but are not limited to:
> ...
> ― Synthesis of a fusedMultiplyAdd operation from a multiplication and an addition
It sounds like FMA use is up to the user/language, and the IEEE standard is fine with it either way.
We need to establish which language standard we need to adhere to. The C++ standard itself does not seem to say much about FP precision or any particular FP format.
C11 standard (ISO/IEC 9899:201x draft, 7.12.2) says:
> The default state (‘‘on’’ or ‘‘off’’) for the [FP_CONTRACT] pragma is implementation-defined.
NVIDIA has a fairly detailed description of their FP behavior:
http://docs.nvidia.com/cuda/floating-point/index.html#fused-multiply-add-fma
> The fused multiply-add operator on the GPU has high performance and increases the accuracy of computations. **No special flags or function calls are needed to gain this benefit in CUDA programs**. Understand that a hardware fused multiply-add operation is not yet available on the CPU, which can cause differences in numerical results.
At the moment this is the most specific guideline I've managed to find regarding the FP behavior expected of CUDA.
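For reference, contraction is also controllable per translation unit from the command line; the flags below are the documented ones for nvcc and clang (spellings worth double-checking against the current docs):

```shell
# nvcc: FMA contraction is on by default; disable it explicitly:
nvcc --fmad=false kernel.cu

# clang:
clang -ffp-contract=fast a.c  # fuse aggressively, even across statements
clang -ffp-contract=on a.c    # fuse within expressions, honoring FP_CONTRACT
clang -ffp-contract=off a.c   # never fuse
```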
http://reviews.llvm.org/D20341