[PATCH] D31801: Performance enhancements for Cavium ThunderX2 T99

Fri Apr 28 02:44:06 PDT 2017

rengolin added inline comments.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:7624
+  case AArch64Subtarget::ThunderX2T99:
+  case AArch64Subtarget::ThunderXT88:
+    if (VT.isFloatingPoint() && VT.isVector())
----------------
t.p.northover wrote:
> steleman wrote:
> > t.p.northover wrote:
> > > steleman wrote:
> > > > kristof.beyls wrote:
> > > > > steleman wrote:
> > > > > > rengolin wrote:
> > > > > > > This smells like a target feature. :)
> > > > > > I can definitely re-write it as a TargetFeature.
> > > > > > 
> > > > > my 2 cents, hoping someone with more experience in this area can help further:
> > > > > 
> > > > > IIUC, the effect of this change is that also for vector types, this function now returns true, when targeting ThunderX2T99 or ThunderXT88?
> > > > > 
> > > > > I can see that for different cores/micro-architectures a different response could be wanted here, also depending on the types involved.
> > > > > Since the answer could be different dependent on the type, I'm guessing that this won't map nicely to a target feature.
> > > > > Maybe this information needs to become a subtarget hook somehow?
> > > > > 
> > > > > Also, please upload the next iteration of this patch with lots of context (e.g. git diff -U9999) - that makes reviewing it a bit easier.
> > > > > 
> > > > > 
> > > > > IIUC, the effect of this change is that also for vector types, this function now returns true, when targeting ThunderX2T99 or ThunderXT88?
> > > > 
> > > > Correct, for floating-point vector types.
> > > > 
> > > > > Since the answer could be different dependent on the type, I'm guessing that this won't map nicely to a target feature.
> > > > 
> > > > > Since the answer could be different dependent on the type, I'm guessing that this won't map nicely to a target feature.
> > > > > Maybe this information needs to become a subtarget hook somehow?
> > > > 
> > > > I think it depends on
> > > > - how many other micro-architectures (besides Cavium T88/T99) want/need this sub-feature
> > > > - how many different types would need different behaviors
> > > > 
> > > > So, in my mind, this comes down to where exactly does this complexity reside: in this function, or in a separate hook someplace else.
> > > > 
> > > > I don't have a clear answer to this question because I don't know what other micro-architectures need or want with respect to FMA.
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > IIUC, the effect of this change is that also for vector types, this function now returns true, when targeting ThunderX2T99 or ThunderXT88?
> > > 
> > > Doesn't getScalarType already cover that? I was trying to work out what this exactly did yesterday and thought it was **excluding scalars**, but I now see that's not true either.
> > > 
> > > I strongly suspect it's a NOP (except for f16 CodeGen).
> > > 
> > > I strongly suspect it's a NOP (except for f16 CodeGen).
> > 
> > I am not so sure it's a NOP considering that the expression
> > 
> > > if (VT.isFloatingPoint() && VT.isVector())
> > 
> > will evaluate to true - for the case of a floating-point vector type.
> > 
> > What I am unclear about is: why do I need to bother knowing the scalar type when I already know that a floating-point vector type will always be faster using FMA than by using MUL + ADD -- at least for the T88 and T99 micro-architectures.
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > `if (VT.isFloatingPoint() && VT.isVector())` will evaluate to true - for the case of a floating-point vector type.
> 
> Yes, but we already get true. If the input type is v4f32 (say) then `VT = VT.getScalarType()` returns f32 and the existing switch returns true.
Tim is right, this is a NOP. This function already returns true for both scalar and vector f32/f64.

Repository:
  rL LLVM

https://reviews.llvm.org/D31801