[PATCH] D105432: [Analysis] Add simple cost model for strict (in-order) reductions

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 6 06:54:23 PDT 2021


dmgreen added inline comments.


================
Comment at: llvm/include/llvm/CodeGen/BasicTTIImpl.h:2085
+  InstructionCost
+  getArithmeticStrictReductionCost(unsigned Opcode, VectorType *Ty,
+                                   TTI::TargetCostKind CostKind) {
----------------
david-arm wrote:
> dmgreen wrote:
> > sdesmalen wrote:
> > > Can you unify the interface for getArithmeticReductionCost to take an enum which specifies the kind of reduction we want: tree, pair-wise or in-order?
> > > If there is only a single `getReductionCost` interface where the parameters dictate having to specify what cost is being asked for, then that's difficult to get wrong (you need to fill in the parameters), but if there's another slightly more specialized variant available, people may not realise it exists and just call the most basic cost interface instead.
> > Oh you got here first.  :)
> > 
> > I think we can possibly remove IsPairwise entirely. It seems to be only used in a single place nowadays that doesn't seem to me like it would be giving very good cost estimates.
> Hi @dmgreen, it turns out removing the pairwise form breaks *lots* of tests. :) I think it's still needed because `TTI::matchVectorReduction` returns the pairwise form for a lot of cases.
I happened to be looking at whether we can remove it too. Are you assuming that IsPairwise = false? The SLP vectorizer works differently now to how it did in the past.

I can put up a patch for that if you like. There is one cost test that changes, but it seems to me like it would be OK to change, and the code creating those costs could be removed.

It is at least a (mostly) separate issue from this patch.
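To illustrate the unified interface being suggested, here is a hypothetical sketch (not the actual LLVM API; the enum, struct, and cost numbers are invented for illustration) of a single reduction-cost entry point keyed by an enum, so a caller must say which reduction shape is being costed rather than choosing among near-duplicate methods:

```cpp
#include <cassert>

// Hypothetical sketch: one reduction-cost interface with an explicit kind.
enum class ReductionKind { Tree, Pairwise, InOrder };

struct VectorTypeInfo {
  unsigned NumElements; // fixed element count, for this sketch only
};

// Illustrative cost model: tree/pairwise reductions take log2(N) halving
// steps, while a strict in-order reduction serializes into N-1 dependent
// scalar operations.
unsigned getReductionCost(ReductionKind Kind, const VectorTypeInfo &Ty) {
  unsigned Steps = 0;
  switch (Kind) {
  case ReductionKind::Tree:
  case ReductionKind::Pairwise:
    for (unsigned N = Ty.NumElements; N > 1; N /= 2)
      ++Steps; // one shuffle+op per halving of the vector
    return Steps;
  case ReductionKind::InOrder:
    return Ty.NumElements - 1; // chain of dependent scalar ops
  }
  return 0;
}
```

The point of the single entry point is that callers cannot accidentally pick an interface that silently ignores the reduction ordering.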


================
Comment at: llvm/include/llvm/CodeGen/BasicTTIImpl.h:2087
+                                   TTI::TargetCostKind CostKind) {
+    auto *VTy = cast<FixedVectorType>(Ty);
+    InstructionCost Cost =
----------------
david-arm wrote:
> dmgreen wrote:
> > Should this be checking for scalable vectors?
> I don't think we can easily calculate a cost for scalable vectors here, and it is very target-specific. At the moment I have modelled this on the worst case, i.e. scalarising the operation, and for scalable vectors we don't know the number of elements. The approach taken here is similar to functions elsewhere in this file, e.g. getCommonMaskedMemoryOpCost. I think if we get here for scalable vectors it's actually a bug, since the target should really have dealt with this separately. Any thoughts, @sdesmalen?
Could it just return an invalid cost?
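The invalid-cost idea can be sketched as follows. This is a hypothetical mini version of the concept (the real class is llvm::InstructionCost in llvm/Support/InstructionCost.h; the names and shape below are simplified for illustration): a cost value that can be marked invalid lets the generic implementation decline to cost scalable vectors rather than guess, leaving the target to override with something better.

```cpp
#include <cassert>
#include <optional>

// Hypothetical mini version of an invalid-able cost value.
struct Cost {
  std::optional<int> Value;
  static Cost getInvalid() { return {std::nullopt}; }
  bool isValid() const { return Value.has_value(); }
};

// The generic scalarization path needs a known element count; for scalable
// vectors that count is unknown at compile time, so return an invalid cost
// instead of a pessimistic guess.
Cost getStrictReductionCost(bool IsScalable, unsigned MinNumElts) {
  if (IsScalable)
    return Cost::getInvalid();
  return Cost{static_cast<int>(MinNumElts - 1)}; // N-1 chained scalar ops
}
```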


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h:136-137
+  /// when scalarizing an operation for a vector with ElementCount \p VF.
+  /// For scalable vectors this currently takes the most pessimistic view based
+  /// upon the maximum possible value for vscale.
+  unsigned getScalarizationCostFactor(ElementCount VF) const {
----------------
david-arm wrote:
> dmgreen wrote:
> > I had assumed (without thinking about it very much) that the costs for VF arguments would be based on the exact value of VF from the -mcpu argument, if one is provided. If it is not, then we would guess at a value; probably VF=2 would be a sensible default. This is using the maximum possible VF, which sounds like a large over-estimate in most cases.
> > 
> > Can you speak to why the max VF is a better value to use?
> > 
> > I'm not sure I understand why this is scalarizing though.
> Hi @dmgreen, yeah we are aware of this problem. It's not ideal - at the moment we also do this for gather-scatters too. We took the decision to be conservative and use the maximum value of vscale as the worst case scenario. In practice, the runtime value could vary from machine to machine and we thought it better to wait a while and revisit this again at some point. In fact, that's partly why I created this function so that we only have to change one place in future. :)
> 
> I also think that always choosing the most optimistic case could lead to poor performance so we have to be careful. One option we have is to use the new vscale range IR attributes to refine this, or choose a value of vscale that represents some sort of average of real use cases?
> 
OK - more work is needed. Got it. I would have expected these cost factors to come from the subtarget, not an IR attribute.

What is being scalarized here though? From https://godbolt.org/z/fcz71dPeY for example? Some of the Illegal types were hitting errors.
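The pessimistic factor under discussion can be sketched like this. The struct, constant, and function below are illustrative assumptions, not the actual AArch64TargetTransformInfo code: for a scalable vector the minimum element count is multiplied by an assumed maximum vscale (SVE registers can be up to 2048 bits, so with a 128-bit granule vscale <= 16), while a fixed vector just uses its element count.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::ElementCount.
struct ElementCountSketch {
  unsigned MinValue; // known minimum number of elements
  bool Scalable;     // true for scalable (e.g. SVE) vectors
};

// Assumption: max SVE register width 2048 bits / 128-bit granule = 16.
constexpr unsigned MaxVScale = 16;

// Worst-case multiplier used when costing a scalarized operation: for
// scalable vectors, assume the largest possible runtime vector length.
unsigned getScalarizationCostFactor(ElementCountSketch VF) {
  return VF.Scalable ? VF.MinValue * MaxVScale : VF.MinValue;
}
```

This makes the conservatism concrete: a <vscale x 4 x float> is costed as if it held 64 lanes, even on hardware where vscale is 1, which is exactly the over-estimate being questioned above.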


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105432/new/

https://reviews.llvm.org/D105432
