[llvm] [AArch64][CostModel] Reduce the cost of fadd reduction with fast flag (PR #108791)

Tue Sep 24 00:19:22 PDT 2024

================
@@ -4153,6 +4153,47 @@ AArch64TTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *ValTy,
   switch (ISD) {
   default:
     break;
+  case ISD::FADD:
+    if (Type *EltTy = ValTy->getScalarType();
+        // FIXME: We would be restricting the input scalar type to following
+        // types since for some of the types, codegen might be different e.g.
+        // fp128. Also, for half types without fullfp16 support, the cost maybe
+        // still be higher than what is expected from codegen.
----------------
davemgreen wrote:

I believe that fp128 will always want to scalarize, so it shouldn't be added here (there is no equivalent faddp instruction). For fp16 we currently scalarize where we shouldn't; you can see what we should produce in the -global-isel results: https://godbolt.org/z/zPc16Gfv1
Maybe change this to `FIXME: For half types without fullfp16 support, this could extend and use an fp32 faddp reduction, but current codegen unrolls.`
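
To make the distinction above concrete, here is a rough standalone sketch of the classification I have in mind. It is not the LLVM API: `EltKind`, `ReductionLowering` and `classifyFastFAddReduction` are hypothetical names used only for illustration, and the half-without-fullfp16 case reflects what codegen does today rather than what it ideally should do.

```cpp
#include <cassert>

// Hypothetical stand-ins for the scalar element types discussed above.
// These are illustration-only names, not LLVM types.
enum class EltKind { Half, Float, Double, FP128 };

// Hypothetical description of how a fast-math fadd reduction is lowered.
enum class ReductionLowering { PairwiseFaddp, Scalarized };

// Sketch: decide which lowering the cost model should assume for a
// reduction over the given element type.
ReductionLowering classifyFastFAddReduction(EltKind Elt, bool HasFullFP16) {
  switch (Elt) {
  case EltKind::Float:
  case EltKind::Double:
    // faddp exists for these, so the cheaper pairwise cost applies.
    return ReductionLowering::PairwiseFaddp;
  case EltKind::Half:
    // FIXME: For half types without fullfp16 support, this could extend and
    // use an fp32 faddp reduction, but current codegen unrolls.
    return HasFullFP16 ? ReductionLowering::PairwiseFaddp
                       : ReductionLowering::Scalarized;
  case EltKind::FP128:
    // No equivalent faddp instruction; always scalarized.
    return ReductionLowering::Scalarized;
  }
  // Conservative default for anything not listed.
  return ReductionLowering::Scalarized;
}
```

With this shape, only the `PairwiseFaddp` result would take the reduced cost, and fp128 never reaches that path.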

https://github.com/llvm/llvm-project/pull/108791