[llvm] [AArch64][CostModel] Reduce the cost of fadd reduction with fast flag (PR #108791)

Thu Sep 19 05:03:33 PDT 2024

================
@@ -4147,6 +4147,22 @@ AArch64TTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *ValTy,
   switch (ISD) {
   default:
     break;
+  case ISD::FADD: {
+    if (MTy.isVector()) {
+      // FIXME: Consider cases where the number of vector elements is not power
+      // of 2.
+      const unsigned NElts = MTy.getVectorNumElements();
+      if (ValTy->getElementCount().getFixedValue() >= 2 && NElts >= 2 &&
+          isPowerOf2_32(NElts)) {
----------------
sushgokh wrote:

@davemgreen Sorry, I got a bit confused by the types above (and hence by your initial suggestion).

For half type:
I agree that the `-fullfp16` cost should be higher than the `+fullfp16` cost. But don't you think that's a different issue altogether, and one that should be handled separately from this patch?

For fp128 type:
Why should we not allow fp128? Are there any issues with this type?
My understanding is that fp128 is a multiple of fp32 and also occupies a whole register, so no legalization cost would be involved. Correct me if I'm wrong.


https://github.com/llvm/llvm-project/pull/108791