[PATCH] D93639: [AArch64][SVE]Add cost model for vector reduce for scalable vector

Wed Jan 6 00:25:40 PST 2021

david-arm added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1107
+                                                bool IsPairwise,
+                                                bool IsUnsigned,
+                                                TTI::TargetCostKind CostKind) {
----------------
sdesmalen wrote:
> My understanding is that the PairWise form is not something that can be expressed with the LLVM IR intrinsic and is therefore specific to fixed-width vectors which can use `shufflevector`.
> Perhaps you can just assert IsPairWise is false. @david-arm any thoughts?
I think we can probably assert IsPairwiseForm=false for now.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1114
+                                         CostKind);
+  // TODO: Return invalid cost instead of assert when getMaxVScale is None
+  assert(getMaxVScale() && "Expected valid max vscale value");
----------------
Remove TODO here as max vscale should always be a valid value for AArch64 + scalable types?

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1117
+
+  unsigned MaxNumVScale = getMaxVScale().getValue();
+  auto LT = TLI->getTypeLegalizationCost(DL, Ty);
----------------
sdesmalen wrote:
> nit: `s/getMaxVScale().getValue()/*getMaxVScale()/`
I don't think we need to use max vscale at all here to be honest. I've discussed this with Carol privately and getTypeLegalizationCost below already returns the number of parts needed (LT.first) along with the MVT for the legal vector type (LT.second). Since we have instructions that do both min/max operations (SMIN,etc.) as well as the horizontal vector reductions for min/max (SMINV, etc.) we can just do something similar to what X86 did and have a very simple cost lookup table for the legal horizontal reduction (SMINV), then add on a legalisation cost. i.e. something like:

unsigned LegalizationCost = 0;
if (not legal) {
  // reduce to a single vector, i.e.
  // res0 = SMIN part0, part1
  // res1 = SMIN res0, part2
  // ...
  LegalizationCost = getMinMaxCost(); // SMIN, SMAX, etc.
  LegalizationCost *= LT.first - 1;
}

// do cost lookup for SMINV, etc.
return LegalizationCost + LegalReductionCost;

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1121
+  unsigned MVTLen = MaxNumVScale * LVF.getKnownMinValue();
+  unsigned NumReduxLevels = Log2_32(MVTLen);
+  unsigned CmpOpcode;
----------------
As mentioned above, I think we can avoid using max vscale and calculating redux levels.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1135
+                         CmpInst::BAD_ICMP_PREDICATE, CostKind);
+  return LT.first * NumReduxLevels * MinMaxCost;
+}
----------------
sdesmalen wrote:
> I think this should be:
> ```Log2(LT.first) * MinMaxVectorCost + NumReduxLevels * MinMaxScalarCost```
> 
> Because the number of legalized vectors will do min/max operations amongst the vectors (to end up with a single vector), before it will do the min/max reduction on the elements within the vector.
> That also means distinguishing the cost for CmpSel on vectors and CmpSel on scalars.
Similar to my comment above about max vscale, I think we can do this in a much simpler way. Rather than try to calculate a complex cost in this way using selects and compares, etc., I think we can just do it the same way that X86 does. We can have a simple cost lookup table for legal reductions (SMIN/MAX,UMIN/MAX, etc.), then add on a legalisation factor.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1141
+    TTI::TargetCostKind CostKind) {
+  // TODO: Return invalid cost instead of assert when getMaxVScale is None
+  assert(getMaxVScale() && "Expected valid max vscale value");
----------------
Remove TODO?

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1153
+
 int AArch64TTIImpl::getArithmeticReductionCost(unsigned Opcode,
                                                VectorType *ValTy,
----------------
Same comments as for the MinMax case. We can avoid all reference to max vscale and redux levels, by treating this as the summation of:

LegalizationCost (sequence of FADD,ADD,etc. to reduce to single vector) + LegalReductionCost (single horizontal reduction instruction, e.g. FADDV).

I think the only case we don't support here is mul reductions, but we can just put in a high cost estimate for now, right?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93639/new/

https://reviews.llvm.org/D93639