[PATCH] D105432: [Analysis] Add simple cost model for strict (in-order) reductions

Wed Jul 14 23:58:33 PDT 2021

dmgreen added a comment.

Thanks for adding the FMF's. This looks useful in the long run to me

================
Comment at: llvm/include/llvm/Analysis/TargetTransformInfo.h:1151
+  /// for a given \p Opcode and set of FastMathFlags \p FPFlags.
+  static bool isOrderedReduction(unsigned Opcode, FastMathFlags FPFlags) {
+    if ((Opcode == Instruction::FAdd || Opcode == Instruction::FMul) &&
----------------
Nit: I believe FMF is the common spelling throughout llvm.
And maybe isOrderedReduction -> requiresOrderedReduction ?

================
Comment at: llvm/include/llvm/Analysis/TargetTransformInfo.h:1163
+  /// FastMathFlags parameter \p FPFlags together indicate what type of
+  /// reduction we are performing:
+  ///   1. Tree-wise. This is the typical 'fast' reduction performed that
----------------
This comment looks useful. I'm wondering if it's worth emphasizing a bit that these are the default lowerings, and the cost should be whatever the fastest way the target can legally lower the intrinsic would be. 
Maybe spelling out that that float operations without Reassoc require ordered reductions that look like `((((init + v0) + v1) + v2) + ..`. And otherwise the reduction can happen in any order, which by default will follow a treewise reduction.

================
Comment at: llvm/include/llvm/CodeGen/BasicTTIImpl.h:1657
       return thisT()->getArithmeticReductionCost(Instruction::Add, VecOpTy,
-                                                 CostKind);
+                                                 FastMathFlags(), CostKind);
     case Intrinsic::vector_reduce_mul:
----------------
Could these still use FMF? It shouldn't matter much I suppose, either way they should be empty.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h:138
+  /// upon the maximum possible value for vscale.
+  unsigned getScalarizationCostFactor(ElementCount VF) const {
+    if (!VF.isScalable())
----------------
getScalarizationCostFactor implies that it will be scalarized by codegen, and sounds similar to the already present getScalarizationOverhead. What do you think about something like getMaxNumLanes, as that appears to be what it computes.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105432/new/

https://reviews.llvm.org/D105432