[PATCH] D90781: [ARM] remove cost-kind predicate for cmp/sel costs

Thu Nov 5 06:32:29 PST 2020

spatel added inline comments.

================
Comment at: llvm/test/Analysis/CostModel/ARM/intrinsic-cost-kinds.ll:220
 ; SIZE_LATE-LABEL: 'reduce_fmax'
-; SIZE_LATE-NEXT:  Cost Model: Found an estimated cost of 620 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)
+; SIZE_LATE-NEXT:  Cost Model: Found an estimated cost of 628 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)
 ; SIZE_LATE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
----------------
samparker wrote:
> I know this is a tiny change, but it's drawn my attention because the number is so high. Does this look right to you @dmgreen ? I would have thought we were able to break this up more efficiently with our native support, or is this because we'd have to copy the GPR registers into FPRs to perform some final scalar maxs..?
Stepping through this, the cost is derived from the BasicTTIImpl calling back to the target for shuffle+cmp+sel:

```
      // Assume the pairwise shuffles add a cost.
      ShuffleCost +=
          (IsPairwise + 1) * thisT()->getShuffleCost(TTI::SK_ExtractSubvector,
                                                     Ty, NumVecElts, SubTy);
      MinMaxCost +=
          thisT()->getCmpSelInstrCost(CmpOpcode, SubTy, CondTy,
                                      CmpInst::BAD_ICMP_PREDICATE, CostKind) +
          thisT()->getCmpSelInstrCost(Instruction::Select, SubTy, CondTy,
                                      CmpInst::BAD_ICMP_PREDICATE, CostKind);

```

And this progresses for v16f32 as: 384 -> 480 -> 608 for the shuffles and 8 -> 12 -> 20 for the cmp/sel, so 608 + 20 = 628.
The shuffle cost seems to be expanded as a series of insert/extract based on the number of elements in the vector, so it's exploding.
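
To make the arithmetic easy to re-check, here is a small standalone sketch (not LLVM code) that accumulates the two cost components the same way the loop above does. The `StepCost` struct, the `Steps` table, and the helper names are hypothetical; the per-step values are just the differences between the running totals quoted above (384, 480 - 384 = 96, 608 - 480 = 128 for the shuffles; 8, 12 - 8 = 4, 20 - 12 = 8 for the cmp/sel), not values queried from the ARM cost model.

```cpp
#include <cstddef>

// Hypothetical per-step record: how much one step of the reduction adds to
// each accumulator (ShuffleCost and MinMaxCost in the excerpt above).
struct StepCost {
  unsigned Shuffle; // contribution to ShuffleCost
  unsigned MinMax;  // contribution to MinMaxCost (cmp + select)
};

// Assumed values: differences of the running totals observed for v16f32.
constexpr StepCost Steps[] = {
    {384, 8}, // first observed totals
    {96, 4},  // 480 - 384, 12 - 8
    {128, 8}, // 608 - 480, 20 - 12
};

// Sum the shuffle contributions over all steps.
template <std::size_t N>
constexpr unsigned sumShuffleCost(const StepCost (&S)[N]) {
  unsigned Total = 0;
  for (std::size_t I = 0; I != N; ++I)
    Total += S[I].Shuffle;
  return Total;
}

// Sum the cmp/sel contributions over all steps.
template <std::size_t N>
constexpr unsigned sumMinMaxCost(const StepCost (&S)[N]) {
  unsigned Total = 0;
  for (std::size_t I = 0; I != N; ++I)
    Total += S[I].MinMax;
  return Total;
}

// The reported cost is the sum of both accumulators.
template <std::size_t N>
constexpr unsigned totalReductionCost(const StepCost (&S)[N]) {
  return sumShuffleCost(S) + sumMinMaxCost(S);
}
```

With these numbers, `totalReductionCost(Steps)` evaluates to 628, and the shuffle term accounts for 608 of it, which is why the cmp/sel change barely moves the total.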

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90781/new/

https://reviews.llvm.org/D90781