[llvm] [AArch64][CostModel] Reduce the cost of fadd reduction with fast flag (PR #108791)

Tue Sep 17 04:37:19 PDT 2024

================
@@ -4147,6 +4147,22 @@ AArch64TTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *ValTy,
   switch (ISD) {
   default:
     break;
+  case ISD::FADD: {
+    if (MTy.isVector()) {
+      // FIXME: Consider cases where the number of vector elements is not power
+      // of 2.
+      const unsigned NElts = MTy.getVectorNumElements();
+      if (ValTy->getElementCount().getFixedValue() >= 2 && NElts >= 2 &&
+          isPowerOf2_32(NElts)) {
----------------
sushgokh wrote:

> Can you check the fp type is one that we would expect. I think the MTy.isVector() is protecting against fp128 but those should have a high cost too. The rule for fp16 should generally be that if +fullfp16 is present then it is cheap, otherwise it needs to extend it to fp32 and reduce that.

I assume you are referring to the case where the input vector is v8f16 (i.e. fp128).

Isn't taking the legalization cost into account here sufficient for the case you cited?
```
return (LT.first - 1) + /*Number of faddp instructions*/ Log2_32(NElts);
```
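
To make the question concrete, here is a minimal standalone sketch of that formula, outside of LLVM. The names `log2Floor`, `fastFAddReductionCost` and `NumLegalParts` are hypothetical: `log2Floor` stands in for `Log2_32`, and `NumLegalParts` plays the role of `LT.first`. It only models the arithmetic and says nothing about whether `LT.first` accounts for an f16-to-f32 extension when +fullfp16 is absent.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::Log2_32: floor(log2(Value)) for Value > 0.
static unsigned log2Floor(uint32_t Value) {
  assert(Value != 0 && "log2 of zero is undefined");
  unsigned Result = 0;
  while (Value >>= 1)
    ++Result;
  return Result;
}

// Hypothetical model of the proposed cost:
//   (LT.first - 1) + Log2_32(NElts)
// NumLegalParts: number of legal vectors the input splits into (LT.first).
// NElts: element count of the legalized type; assumed to be a power of 2
//        and >= 2, matching the guard in the patch.
static unsigned fastFAddReductionCost(unsigned NumLegalParts, unsigned NElts) {
  assert(NumLegalParts >= 1 && "at least one legal part is required");
  assert(NElts >= 2 && (NElts & (NElts - 1)) == 0 &&
         "NElts must be a power of 2 and at least 2");
  // (NumLegalParts - 1) vector fadds combine the parts, then Log2(NElts)
  // pairwise steps reduce the remaining vector to a scalar.
  return (NumLegalParts - 1) + log2Floor(NElts);
}
```

Under these assumptions, `fastFAddReductionCost(1, 4)` (one legal part of 4 elements, e.g. v4f32) gives 2, and `fastFAddReductionCost(2, 4)` (two legal parts, e.g. v8f32 split in half) gives 3.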



https://github.com/llvm/llvm-project/pull/108791