[PATCH] D40696: Enable aggressive FMA on T99 and provide AArch64 option for other micro-arch's

Mon Jan 22 09:06:08 PST 2018

fhahn added inline comments.

================
Comment at: lib/Target/AArch64/AArch64.td:152

+def FeatureAggressiveFMAFloat :
+  SubtargetFeature<"aggressive-fma-float",
----------------
steleman wrote:
> fhahn wrote:
> > steleman wrote:
> > > fhahn wrote:
> > > > How likely is it that we aggressive FMA is beneficial for floats or double only? IMO it's probably enough to just have `-aggressive-fma` for now.
> > > 
> > > Could you please clarify this? I.e. are you referring to quad-precision?
> > > 
> > > T99 doesn't have quad-precision FMA. Only floats and doubles. Are there any AArch64 micro-arch'es that can do quad-precision FMA?
> > > 
> > > Or are you asking about SIMD?
> > > 
> > > 
> > Sorry, I meant I think a single option to enable aggressive FMA would be enough and separate options for float and double seems too fine grained, unless I am missing something :)
> Aaah, OK. Got it.
> 
> Here's my thinking:
> 
> There may be some AArch64 micro-arch'es that would want to enable FMA for floats, but not doubles. Or doubles, but not floats. Because of latency, or some other constraint.
> 
> So the idea is to allow this differentiation.
> 
> Now that FMA is a Subtarget feature, if we don't have this differentiation between floats and doubles, then we're back to the original implementation: the micro-arch Subtarget will have to switch() on the CPU and EVT type in AArch64ISelLowering.cpp (AArch64TargetLowering::enableAggressiveFMAFusion()). That was frowned upon, which is why I created these two FMA Subtarget features.
> 
> And then there's also FMLA and FMLS.
> 
> For T99 it doesn't matter because all the FMA instructions have the same latency. But I'm not sure this applies to all the other AArch64 micro-arch'es.
> 
I see what you mean, I just think that we should make our lives more complicated only when there actually are micro-arch'es that want to enable aggressive FMA for double or float only ;)

> Now that FMA is a Subtarget feature, if we don't have this differentiation between floats and doubles, then we're back to the original implementation: the micro-arch Subtarget will have to switch() on the CPU and EVT type in AArch64ISelLowering.cpp (AArch64TargetLowering::enableAggressiveFMAFusion()). That was frowned upon, which is why I created these two FMA Subtarget features.

I am not sure I follow. I think @MatzeB 's issue was using `getProcFamily()` and it should be fine to just check the subtarget feature in enableAggressiveFMAFusion. @MatzeB summarized some benefits of making it a subtarget feature here: https://reviews.llvm.org/D40177#936974

Repository:
  rL LLVM

https://reviews.llvm.org/D40696