[llvm] [AArch64][CostModel] Reduce the cost of fadd reduction with fast flag (PR #108791)

Thu Sep 19 06:51:49 PDT 2024

================
@@ -4147,6 +4147,22 @@ AArch64TTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *ValTy,
   switch (ISD) {
   default:
     break;
+  case ISD::FADD: {
+    if (MTy.isVector()) {
+      // FIXME: Consider cases where the number of vector elements is not power
+      // of 2.
+      const unsigned NElts = MTy.getVectorNumElements();
+      if (ValTy->getElementCount().getFixedValue() >= 2 && NElts >= 2 &&
+          isPowerOf2_32(NElts)) {
----------------
davemgreen wrote:

Hi. fp128 will be a series of soft-fp calls to runtime functions, so it is expected to be pretty slow. You are right that there wouldn't necessarily be scalarization overhead, but I don't think there would be the same sort of pairwise instructions we are expecting here.

For the fp16 costs, I think a patch that alters them should get them (roughly) correct, and hopefully doing so isn't too difficult. Falling back to the old cost is fine for -fullfp16 (i.e. with full fp16 support disabled), as the fp16 form of the instruction this is trying to target (faddp) doesn't exist there.
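As a rough illustration of the fallback rules discussed above, here is a minimal standalone sketch. It is not the patch's code and does not use LLVM's API: `FPElt`, `fastFAddReductionCost`, and the "one pairwise add per halving step" cost are all hypothetical, chosen only to show which cases would keep the old cost (modelled here as `std::nullopt`).

```cpp
#include <optional>

// Hypothetical element kinds for illustration; these are not LLVM types.
enum class FPElt { F16, F32, F64, F128 };

// Returns an assumed cost for a fast-math fadd reduction, or std::nullopt to
// mean "fall back to the existing (old) cost".
std::optional<unsigned> fastFAddReductionCost(FPElt Elt, unsigned NElts,
                                              bool HasFullFP16) {
  // fp128 is lowered to soft-fp runtime calls, so no pairwise adds apply.
  if (Elt == FPElt::F128)
    return std::nullopt;
  // Without full fp16 support there is no fp16 faddp to target.
  if (Elt == FPElt::F16 && !HasFullFP16)
    return std::nullopt;
  // Only handle power-of-2 element counts of at least 2.
  if (NElts < 2 || (NElts & (NElts - 1)) != 0)
    return std::nullopt;
  // Assumption: one pairwise add per halving step, i.e. log2(NElts) steps.
  unsigned Steps = 0;
  for (unsigned N = NElts; N > 1; N >>= 1)
    ++Steps;
  return Steps;
}
```

Under these assumptions, a 4-element f32 reduction is modelled as 2 steps, while an 8-element f16 reduction without full fp16 keeps the old cost.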

https://github.com/llvm/llvm-project/pull/108791