[PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.
Artem Belevich via cfe-commits
cfe-commits at lists.llvm.org
Wed May 18 10:49:16 PDT 2016
tra added a comment.
I don't think using FMA throws away IEEE compliance.
IEEE 754-2008 says:
> A language standard should also define, and require implementations to provide, attributes that allow and
> disallow value-changing optimizations, separately or collectively, for a block. These optimizations might
> include, but are not limited to:
> ...
> ― Synthesis of a fusedMultiplyAdd operation from a multiplication and an addition
It sounds like FMA use is up to the user/language, and the IEEE standard is fine with it either way.
We need to establish which language standard we need to adhere to. The C++ standard itself does not seem to say much about FP precision or any particular FP format.
C11 standard (ISO/IEC 9899:201x draft, 7.12.2) says:
> The default state (‘‘on’’ or ‘‘off’’) for the [FP_CONTRACT] pragma is implementation-defined.
NVIDIA has a fairly detailed description of their FP behavior:
http://docs.nvidia.com/cuda/floating-point/index.html#fused-multiply-add-fma
> The fused multiply-add operator on the GPU has high performance and increases the accuracy of computations. **No special flags or function calls are needed to gain this benefit in CUDA programs**. Understand that a hardware fused multiply-add operation is not yet available on the CPU, which can cause differences in numerical results.
At the moment this is the most specific guideline I've managed to find regarding the FP behavior expected of CUDA.
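For reference, contraction is also controllable per translation unit from the command line; the flags below are the documented ones for nvcc and clang (spellings worth double-checking against the current docs):

```shell
# nvcc: FMA contraction is on by default; disable it explicitly:
nvcc --fmad=false kernel.cu

# clang:
clang -ffp-contract=fast a.c  # fuse aggressively, even across statements
clang -ffp-contract=on a.c    # fuse within expressions, honoring FP_CONTRACT
clang -ffp-contract=off a.c   # never fuse
```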
http://reviews.llvm.org/D20341