[PATCH] D111630: [LoopVectorize][CostModel] Update cost model for fmuladd intrinsic

Mon Oct 18 01:17:34 PDT 2021

david-arm added inline comments.

================
Comment at: llvm/include/llvm/CodeGen/BasicTTIImpl.h:2177-2185
+  InstructionCost getFMulAddReductionCost(VectorType *Ty,
+                                          Optional<FastMathFlags> FMF,
+                                          TTI::TargetCostKind CostKind) {
+    InstructionCost FAddReductionCost = thisT()->getArithmeticReductionCost(
+        Instruction::FAdd, Ty, FMF, CostKind);
+    InstructionCost FMulCost =
+        thisT()->getArithmeticInstrCost(Instruction::FMul, Ty, CostKind);
----------------
RosieSumpter wrote:
> paulwalker-arm wrote:
> > Do we need a new TTI interface for this?  To my mind the costing side of TTI exists to cost real entities and in this instance the IR has no discrete concept for an FMA reduction.
> > 
> > Instead what we have is LoopVectorize pretending such a concept exists that is knows ahead of time it will simulate with separate fmul and ordered_fadd_reduce operations.  For this reason I think it would be better to explicitly cost that exact idiom within the same code that is using it.
> > 
> > So I guess I'm saying that if you move these three lines into LoopVectorize.cpp the patch can be much smaller and you're not creating a new interface for something that doesn't really exist.
> Hi Paul, thanks for having a look at this. Initially I did put the cost calculation into the vectorizer, but there was some discussion (also a comment from @fhahn) about whether it would make more sense in ##TTI.getArithmeticReductionCost## and, to avoid changing the interface for ##getArithmeticReductionCost##, @david-arm and @sdesmalen suggested adding the new ##getFMulAddReductionCost##. I am happy to change it back to LoopVectorize.cpp if that seems like the better option.
For what it's worth, I personally think it makes a bit more sense to calculate the costs separately in the vectoriser because in my mind at least the fmul isn't part of the reduction, since we are always going to widen it into a normal vector operation. I suggested adding a new interface as a compromise, so we can avoid over-complicating (and avoid making this patch significantly larger) the existing getArithmeticReductionCost method. I wouldn't block the patch over this though!

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111630/new/

https://reviews.llvm.org/D111630