[PATCH] D31801: Performance enhancements for Cavium ThunderX2 T99

Tue Apr 11 14:55:19 PDT 2017

steleman added inline comments.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:7624
+  case AArch64Subtarget::ThunderX2T99:
+  case AArch64Subtarget::ThunderXT88:
+    if (VT.isFloatingPoint() && VT.isVector())
----------------
t.p.northover wrote:
> steleman wrote:
> > kristof.beyls wrote:
> > > steleman wrote:
> > > > rengolin wrote:
> > > > > This smells like a target feature. :)
> > > > I can definitely re-write it as a TargetFeature.
> > > > 
> > > my 2 cents, hoping someone with more experience in this area can help further:
> > > 
> > > IIUC, the effect of this change is that also for vector types, this function now returns true, when targeting ThunderX2T99 or ThunderXT88?
> > > 
> > > I can see that for different cores/micro-architectures a different response could be wanted here, also depending on the types involved.
> > > Since the answer could be different dependent on the type, I'm guessing that this won't map nicely to a target feature.
> > > Maybe this information needs to become a subtarget hook somehow?
> > > 
> > > Also, please upload the next iteration of this patch with lots of context (e.g. git diff -U9999) - that makes reviewing it a bit easier.
> > > 
> > > 
> > > IIUC, the effect of this change is that also for vector types, this function now returns true, when targeting ThunderX2T99 or ThunderXT88?
> > 
> > Correct, for floating-point vector types.
> > 
> > > Since the answer could be different dependent on the type, I'm guessing that this won't map nicely to a target feature.
> > 
> > > Since the answer could be different dependent on the type, I'm guessing that this won't map nicely to a target feature.
> > > Maybe this information needs to become a subtarget hook somehow?
> > 
> > I think it depends on
> > - how many other micro-architectures (besides Cavium T88/T99) want/need this sub-feature
> > - how many different types would need different behaviors
> > 
> > So, in my mind, this comes down to where exactly does this complexity reside: in this function, or in a separate hook someplace else.
> > 
> > I don't have a clear answer to this question because I don't know what other micro-architectures need or want with respect to FMA.
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > IIUC, the effect of this change is that also for vector types, this function now returns true, when targeting ThunderX2T99 or ThunderXT88?
> 
> Doesn't getScalarType already cover that? I was trying to work out what this exactly did yesterday and thought it was **excluding scalars**, but I now see that's not true either.
> 
> I strongly suspect it's a NOP (except for f16 CodeGen).
> 
> I strongly suspect it's a NOP (except for f16 CodeGen).

I am not so sure it's a NOP considering that the expression

> if (VT.isFloatingPoint() && VT.isVector())

will evaluate to true - for the case of a floating-point vector type.

What I am unclear about is: why do I need to bother knowing the scalar type when I already know that a floating-point vector type will always be faster using FMA than by using MUL + ADD -- at least for the T88 and T99 micro-architectures.

Repository:
  rL LLVM

https://reviews.llvm.org/D31801