[PATCH] D143938: [VPlan] Compute costs for plans directly after construction.

Sun Apr 30 08:15:46 PDT 2023

Ayal added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:281

-  SmallVector<VPlanPtr, 4> VPlans;
+  SmallVector<VPlanPtr> VPlans;
+
----------------
nit: retain initial size of 4?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:283
+      std::pair<VPlanPtr, SmallVector<std::pair<VectorizationFactor, bool>>>, 4>
+      VPlans;

----------------
fhahn wrote:
> fhahn wrote:
> > Ayal wrote:
> > > Perhaps the following would be better, at-least when first moving VectorizationFactors from CM to LVP: keep `VPlans` as a vector of all VPlanPtrs, as its name suggests; hold all VectorizationFactors in a separate vector or set indexed by their ElementCount Width (as in LVP::plan()'s CandidateVFs)? Can also hold a map associating each VectorizationFactor with its VPlan, if needed; is the reverse mapping useful?
> > > 
> > > (Independent of this patch): Hold the boolean IsVec as a field inside VectorizationFactor, rename VectorizationFactor more accurately.
> > Thanks, I updated the code to use a separate mapping from VPlan->vector of pair<VectorizationFactor, bool>. This ensures that the costs are associated with a specific VPlan. WDYT?
> > 
> > As for the IsVec field, I am going to investigate if this is actually needed or if we can get the information from the existing VPlan.
> > As for the IsVec field, I am going to investigate if this is actually needed or if we can get the information from the existing VPlan.
> 
> We don't need it at this stage any longer, we can instead just avoid adding such VFs to CandidateVFs. Removed the boolean flag from LVP.
>> hold all VectorizationFactors in a separate vector or set indexed by their ElementCount Width (as in LVP::plan()'s CandidateVFs)? Can also hold a map associating each VectorizationFactor with its VPlan, if needed; is the reverse mapping useful?

> Thanks, I updated the code to use a separate mapping from VPlan->vector of pair<VectorizationFactor, bool>. This ensures that the costs are associated with a specific VPlan. WDYT?

A single cost is currently computed and associated with a specific VF as indicated by `expectedCost()`, `select[Epilogue]VectorizationFactor()` and LVP::plan()'s `CandidateVF`s; a single Plan is expected for each VF including 1 and BestVF. So LVP can hold all VectorizationFactors in a vector for now, or in a set indexed by Width, regardless of their Plans.
Can revisit this if/when the Plan associated with each VectorizationFactor and/or the set of VectorizationFactors corresponding to a Plan's `VFs` are needed. Sounds reasonable?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5475
+  ElementCount ScalarFactor = ElementCount::getFixed(1);
+  InstructionCost ExpectedCost = getCandidateVFsFor(*VPlans[0])[0].ScalarCost;
+  const VectorizationFactor ScalarCost(ScalarFactor, ExpectedCost,
----------------
Worth a comment: all VectorizationFactors cache the same ScalarCost, so pick the first one? It need not represent the scalar VF=1.
Can retain the current call to `expectedCost(ScalarFactor)` and replace it independent of this patch.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5479-5480
+  VectorizationFactor ChosenFactor = ScalarCost;
+  assert(hasPlanWithVF(ElementCount::getFixed(1)) &&
+         "Expected Scalar VF to be a candidate");

----------------

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5483
+  bool ForceVectorization = Hints.getForce() == LoopVectorizeHints::FK_Enabled;
+  if (ForceVectorization) {
+    // Ignore scalar width, because the user explicitly wants vectorization.
----------------
Retain condition that there is more than a (i.e., the scalar) single VFCandidate?
Always initialize ChosenFactor to getMax() and then skip considering scalar VF only if ForceVectorization? (Or `else` compute ScalarCost and set ChosenFactor to it.)

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5491
+  for (auto &Plan : VPlans) {
+    for (const auto &Candidate : getCandidateVFsFor(*Plan)) {
+      if (Candidate.Width.isScalar())
----------------
Simply traverse all VFCandidates in a single loop as before?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5492
+    for (const auto &Candidate : getCandidateVFsFor(*Plan)) {
+      if (Candidate.Width.isScalar())
+        continue;
----------------
nit: retrain comment that scalar cost was already computed?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5621
+  for (auto &VPlan : VPlans) {
+    for (const auto &NextVF : getCandidateVFsFor(*VPlan)) {
+      assert(VPlan->hasVF(NextVF.Width) && "VF not in plan");
----------------
Simpler to retain single loop over all VectorizationFactors? VPlan used only in assert.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7561
   // Populate the set of Vectorization Factor Candidates.
   ElementCountSet VFCandidates;
   for (auto VF = ElementCount::getFixed(1);
----------------
Worth continuing to pass `VFCandidates` to selectVectorizationFactor()? The former is still initialized to span range of all possible VF's, before trying to build VPlans, which will possibly exclude VF's. Passing VFCandidates to buildVPlansWithVPRecipes() would allow the latter to filter out non candidates (Fixed/Scalable), and then have selectVectorizationFactor() compute costs and select VF from within the remaining candidates? Can also introduce a separate LVP::computeCosts(VFCandidates).

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7604

-  for (const VPlanPtr &Plan : VPlans) {
+  for (const auto &Plan : VPlans) {
     if (Plan->hasVF(VF))
----------------
nit: change to `auto` independent of this patch.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8060
     VFRange SubRange = {VF, MaxVFTimes2};
-    VPlans.push_back(buildVPlan(SubRange));
+    VPlans.emplace_back(buildVPlan(SubRange));
+    CandidateVFs[&*VPlans.back()] = SmallVector<VectorizationFactor>();
----------------
nit: change from push_back to emplace_back independent of this patch?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8061
+    VPlans.emplace_back(buildVPlan(SubRange));
+    CandidateVFs[&*VPlans.back()] = SmallVector<VectorizationFactor>();
     VF = SubRange.End;
----------------
Where/Are CandidateVFs filled when building w/o recipes?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8688
+    if (!Plan) {
+      VF = SubRange.End;
+      continue;
----------------
Could SubRange be excluded from VFCandidates here, for the latter to be traversed subsequently to compute costs independently, rather than fusing it with the construction of each Plan and its SubRange?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8700
+            << " because it will not generate any vector instructions.\n");
+        continue;
+      }
----------------
Place this dbgs dump and continue after the #ifndef NDEBUG dump below, as done now?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5513
+      return !isMoreProfitable(Candidate.first, ChosenFactor);
+    });

----------------
fhahn wrote:
> Ayal wrote:
> > This is independent of Plan, right?
> Yes. I think actually filtering out the unprofitable VFs is not really needed, as the loop below should handle all cases already. Removed the code. Could arguably also be split off.
Very well. Comment was to clarify that the association with Plans is not needed and if so to remove the named variable `Plan`.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2323

+  const SmallSetVector<ElementCount, 2> getVFs() const { return VFs; }
+
----------------
This is redundant?

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll:10
 ; CHECK:       LV: Found an estimated cost of 1 for VF 2 For instruction: %i.next = add nuw nsw i64 %i, 2
-; CHECK:       LV: Not considering vector loop of width 2 because it will not generate any vector instructions
+; CHECK:       LV: Selecting VF: 1
 ;
----------------
Could the error message "Not considering vector loop of width 2 because it will not generate any vector instructions" be retained, even when filtering these options?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143938/new/

https://reviews.llvm.org/D143938