[llvm] [AArch64][CostModel] Reduce the cost of fadd reduction with fast flag (PR #108791)

Tue Sep 17 07:22:25 PDT 2024

================
@@ -4147,6 +4147,22 @@ AArch64TTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *ValTy,
   switch (ISD) {
   default:
     break;
+  case ISD::FADD: {
+    if (MTy.isVector()) {
+      // FIXME: Consider cases where the number of vector elements is not power
+      // of 2.
+      const unsigned NElts = MTy.getVectorNumElements();
+      if (ValTy->getElementCount().getFixedValue() >= 2 && NElts >= 2 &&
+          isPowerOf2_32(NElts)) {
----------------
davemgreen wrote:

The cost without fp16:
```
; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %fadd_v8f16 = call fast half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)
```
seems to be the same as with fp16; I would expect it to be higher.
```
; FP16-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %fadd_v8f16 = call fast half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)
```
It might be that the type legalization cost doesn't account for the lack of fp16, as the type is still legal to some extent.
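
To make the expectation concrete, here is a minimal standalone sketch, not the code in this PR. It assumes a fast fadd reduction over a power-of-2 number of lanes is costed as one unit per halving step (a cost of 3 for 8 lanes would be consistent with that, but the formula is not visible in this hunk). The names `isPowerOfTwo`, `floorLog2`, and `sketchFastFAddReductionCost`, and the doubled per-step cost without fp16, are all hypothetical and chosen only to illustrate "higher without fp16".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a power-of-2 check: true for 1, 2, 4, 8, ...
constexpr bool isPowerOfTwo(uint32_t N) { return N != 0 && (N & (N - 1)) == 0; }

// Hypothetical stand-in for an integer log2 (floor); only meaningful for N > 0.
constexpr unsigned floorLog2(uint32_t N) {
  unsigned L = 0;
  while (N >>= 1)
    ++L;
  return L;
}

// Illustrative cost of a fast fadd reduction over NElts half-precision lanes.
// Assumption: log2(NElts) pairwise steps, each costing 1 with native fp16
// arithmetic and 2 without it (an invented penalty for promoting to f32).
unsigned sketchFastFAddReductionCost(uint32_t NElts, bool HasFullFP16) {
  assert(NElts >= 2 && isPowerOfTwo(NElts) &&
         "sketch only covers power-of-2 lane counts");
  const unsigned Steps = floorLog2(NElts);
  const unsigned PerStep = HasFullFP16 ? 1 : 2; // assumed, not measured
  return Steps * PerStep;
}
```

Under these assumptions the `v8f16` case would cost more without fp16 than with it, which is the shape of result I would expect the CHECK lines to show; the actual penalty would need to come from the real promotion or legalization cost.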

https://github.com/llvm/llvm-project/pull/108791