[PATCH] D93639: [AArch64][SVE]Add cost model for vector reduce for scalable vector

Fri Jan 8 03:21:37 PST 2021

sdesmalen added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1121-1123
+  if (LT.first > 1 &&
+      MTy.getVectorMinNumElements() <
+          ValVTy->getElementCount().getKnownMinValue()) {
----------------
I don't think this condition is necessary because LT.first > 1 already implies splitting is required.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1124
+          ValVTy->getElementCount().getKnownMinValue()) {
+    Ty = VectorType::get(ValVTy->getElementType(), MTy.getVectorElementCount());
+    auto *SubCondTy =
----------------
You can use `EVT::getTypeForEVT` instead.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1147-1148
+  switch (ISD) {
+  default:
+    return LegalizationCost + 16;
+  case ISD::UMIN:
----------------
I don't know what default could be here other than `llvm_unreachable`. And if that is unreachable, then you can just as well return the final cost on line 1137.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1166-1168
+  if (LT.first > 1 &&
+      MTy.getVectorMinNumElements() <
+          ValVTy->getElementCount().getKnownMinValue()) {
----------------
same as above, I think this is implied by LT.first > 1.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1169-1170
+          ValVTy->getElementCount().getKnownMinValue()) {
+    auto *SingleOpTy =
+        VectorType::get(ValVTy->getElementType(), MTy.getVectorElementCount());
+    LegalizationCost = getArithmeticInstrCost(Opcode, SingleOpTy, CostKind);
----------------
Use `EVT::getTypeForEVT` for LT.second ?

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1176
+  int ISD = TLI->InstructionOpcodeToISD(Opcode);
+  assert(ISD && "Invalid opcode");
+  // Add the final reduction cost for the legal horizontal reduction
----------------
move this to the `default` in switch below?

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1189
+  case ISD::MUL:
+    return LegalizationCost + 16;
+  }
----------------
For Min/Max you use a fixed-cost of `1` and here you use `2` and `16`.  Does that mean Min/Max should also be 2?
And should 16 actually be Invalid when this function will use InstructionCost because it cannot be expanded for SVE? If so, can you add a TODO?

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1135
+                         CmpInst::BAD_ICMP_PREDICATE, CostKind);
+  return LT.first * NumReduxLevels * MinMaxCost;
+}
----------------
david-arm wrote:
> sdesmalen wrote:
> > I think this should be:
> > ```Log2(LT.first) * MinMaxVectorCost + NumReduxLevels * MinMaxScalarCost```
> > 
> > Because the number of legalized vectors will do min/max operations amongst the vectors (to end up with a single vector), before it will do the min/max reduction on the elements within the vector.
> > That also means distinguishing the cost for CmpSel on vectors and CmpSel on scalars.
> Similar to my comment above about max vscale, I think we can do this in a much simpler way. Rather than try to calculate a complex cost in this way using selects and compares, etc., I think we can just do it the same way that X86 does. We can have a simple cost lookup table for legal reductions (SMIN/MAX,UMIN/MAX, etc.), then add on a legalisation factor.
Sure, we can do that. I was more thinking that we'd then lose scaling of the cost if we know something about vscale. Reductions requires successive inter-lane operations (as opposed to e.g. fadd, which can be done in parallel), so the number of lanes probably has some impact on the cost. If we make it dependent on MaxVscale, the cost-model could return a lower cost if MaxVL=128bits than if MaxVL is 1024bits (so we can tweak the costs with -msve-vector-bits=<bits>). X86 doesn't have this problem, because the vector size is always fixed, so it can use a look-up-table.
I'm not saying that I'm completely against a simple fixed cost initially, just sharing my thoughts.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93639/new/

https://reviews.llvm.org/D93639