[PATCH] D43414: AMDGPU: Define FP_FAST_FMA{F} macros for amdgcn

Fri Feb 16 16:31:06 PST 2018

t-tye requested changes to this revision.
t-tye added inline comments.
This revision now requires changes to proceed.

================
Comment at: lib/Basic/Targets/AMDGPU.cpp:345-348
+  if (getTriple().getArch() == llvm::Triple::amdgcn) {
+    Builder.defineMacro("FP_FAST_FMA");
+    Builder.defineMacro("FP_FAST_FMAF");
+  }
----------------
t-tye wrote:
> b-sumner wrote:
> > t-tye wrote:
> > > Do all amdgcn targets have fast FMA? @b-sumner can you clarify?
> > No.  All targets that support double precision should report FAST_FMA.  Only targets with full rate v_fma_f32 should report FAST_FMAF
> > 
> It is unfortunate that clang does not have access to the processor features defined in the td files, which give the settings for each target.
Now that the compiler knows the target, it seems the clang options that specify fast_fma et al. should be removed and the runtimes changed to stop setting them.

The logic that decides when fast fma[f] is present should match the amdgcn td files, which have all gfx9 and some pre-gfx9 targets supporting fast fmaf (the ones that have full rate double precision).
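
As a rough illustration of the rule b-sumner describes above (FP_FAST_FMA for targets that support double precision, FP_FAST_FMAF only for targets with full rate v_fma_f32), something along these lines might work. This is only a standalone sketch, not the actual clang code: the TargetFeatures struct, its field names, and fastFMAMacros are hypothetical, and which processors set each bit would have to come from the td files rather than being filled in by hand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-target summary. In a real patch these bits would be
// derived from the processor definitions, not hard-coded by the caller.
struct TargetFeatures {
  bool HasFP64;       // target supports double precision
  bool HasFastFMAF32; // target has full rate v_fma_f32
};

// Returns the names of the FP_FAST_* macros to define for the given
// features. Hypothetical helper, written only to make the rule concrete.
std::vector<std::string> fastFMAMacros(const TargetFeatures &F) {
  std::vector<std::string> Macros;
  if (F.HasFP64)
    Macros.push_back("FP_FAST_FMA");
  if (F.HasFastFMAF32)
    Macros.push_back("FP_FAST_FMAF");
  return Macros;
}
```

In getTargetDefines, each returned name would then be passed to Builder.defineMacro in place of the unconditional amdgcn check.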

https://reviews.llvm.org/D43414