[PATCH] D23583: [AArch64] Add feature has-fast-fma

Evandro Menezes via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 18 11:57:22 PDT 2016


evandro added a comment.

In https://reviews.llvm.org/D23583#519658, @jgreenhalgh wrote:

> So the transformation here always results in both an FMUL and an FMA in the instruction stream. Imagine:
>
>   fmul s0, s0, s0
>   fadd s1, s1, s0
>   str s0, [x0]
>   str s1, [x1]
>   
>
> The transformation above would generate:
>
>   fmul s2, s0, s0
>   fmadd s1, s0, s0, s1
>   str s2, [x0]
>   str s1, [x1]
>   
>
> This is why, for these transforms, the hook talks about the relative cost of an FADD to an FMA.
>
> I would imagine for many cores, including Exynos-M1, this would not be a good transform to enable.


Yes, it would. In the former case, the FADD depends on the result of the FMUL, so, although the two instructions use different units, there's a stall.  In the latter case, although both instructions use the same unit, they pipeline back to back, since there's no dependency stall.

Therefore, yes, I am confident in enabling it for Exynos M1.
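
The dependency argument above can be illustrated with a toy in-order issue model. This is only a sketch under stated assumptions: the unit names and the latencies (4 cycles for FMUL, 3 for FADD, 5 for FMADD) are placeholders chosen for illustration, not measured Exynos M1 figures, and the stores are ignored. The model assumes each unit is fully pipelined and accepts one instruction per cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    dest: str      # register written
    srcs: tuple    # registers read
    unit: str      # hypothetical execution unit name
    latency: int   # hypothetical result latency in cycles


def finish_cycle(insns):
    """Cycle at which the last result is available, issuing in program order."""
    ready = {}       # register -> cycle its newest value becomes available
    unit_free = {}   # unit -> first cycle it can accept another instruction
    prev_issue = 0
    finish = 0
    for insn in insns:
        operands_ready = max((ready.get(r, 0) for r in insn.srcs), default=0)
        issue = max(prev_issue, unit_free.get(insn.unit, 0), operands_ready)
        unit_free[insn.unit] = issue + 1   # fully pipelined: one issue per cycle
        ready[insn.dest] = issue + insn.latency
        prev_issue = issue
        finish = max(finish, ready[insn.dest])
    return finish


# Placeholder latencies, NOT measured Exynos M1 figures.
FMUL_LAT, FADD_LAT, FMADD_LAT = 4, 3, 5

unfused = [
    Insn("s0", ("s0", "s0"), "fmul_unit", FMUL_LAT),   # fmul s0, s0, s0
    Insn("s1", ("s1", "s0"), "fadd_unit", FADD_LAT),   # fadd s1, s1, s0
]
fused = [
    Insn("s2", ("s0", "s0"), "fma_unit", FMUL_LAT),         # fmul s2, s0, s0
    Insn("s1", ("s0", "s0", "s1"), "fma_unit", FMADD_LAT),  # fmadd s1, s0, s0, s1
]

print(finish_cycle(unfused), finish_cycle(fused))
```

With these placeholder numbers the dependent FMUL/FADD pair finishes at cycle 7, while the independent FMUL/FMADD pair finishes at cycle 6 despite sharing a unit. The outcome depends entirely on the assumed latencies; a sufficiently slow FMADD would reverse it.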

> It seems to me that decoupling the bad transforms from the good, enabling the good transforms for most AArch64 CPUs, and leaving the bad transform off would be the more beneficial approach.


Perhaps the transforms could be refined with different cost functions, but that would be better discussed as part of a generalized cost function.


Repository:
  rL LLVM

https://reviews.llvm.org/D23583




