[PATCH] D143938: [VPlan] Compute costs for plans directly after construction.

Sun Apr 30 13:20:16 PDT 2023

fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:281

-  SmallVector<VPlanPtr, 4> VPlans;
+  SmallVector<VPlanPtr> VPlans;
+
----------------
Ayal wrote:
> nit: retain initial size of 4?
Retained, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:283
+      std::pair<VPlanPtr, SmallVector<std::pair<VectorizationFactor, bool>>>, 4>
+      VPlans;

----------------
Ayal wrote:
> fhahn wrote:
> > fhahn wrote:
> > > Ayal wrote:
> > > > Perhaps the following would be better, at-least when first moving VectorizationFactors from CM to LVP: keep `VPlans` as a vector of all VPlanPtrs, as its name suggests; hold all VectorizationFactors in a separate vector or set indexed by their ElementCount Width (as in LVP::plan()'s CandidateVFs)? Can also hold a map associating each VectorizationFactor with its VPlan, if needed; is the reverse mapping useful?
> > > > 
> > > > (Independent of this patch): Hold the boolean IsVec as a field inside VectorizationFactor, rename VectorizationFactor more accurately.
> > > Thanks, I updated the code to use a separate mapping from VPlan->vector of pair<VectorizationFactor, bool>. This ensures that the costs are associated with a specific VPlan. WDYT?
> > > 
> > > As for the IsVec field, I am going to investigate if this is actually needed or if we can get the information from the existing VPlan.
> > > As for the IsVec field, I am going to investigate if this is actually needed or if we can get the information from the existing VPlan.
> > 
> > We don't need it at this stage any longer, we can instead just avoid adding such VFs to CandidateVFs. Removed the boolean flag from LVP.
> >> hold all VectorizationFactors in a separate vector or set indexed by their ElementCount Width (as in LVP::plan()'s CandidateVFs)? Can also hold a map associating each VectorizationFactor with its VPlan, if needed; is the reverse mapping useful?
> 
> > Thanks, I updated the code to use a separate mapping from VPlan->vector of pair<VectorizationFactor, bool>. This ensures that the costs are associated with a specific VPlan. WDYT?
> 
> A single cost is currently computed and associated with a specific VF as indicated by `expectedCost()`, `select[Epilogue]VectorizationFactor()` and LVP::plan()'s `CandidateVF`s; a single Plan is expected for each VF including 1 and BestVF. So LVP can hold all VectorizationFactors in a vector for now, or in a set indexed by Width, regardless of their Plans.
> Can revisit this if/when the Plan associated with each VectorizationFactor and/or the set of VectorizationFactors corresponding to a Plan's `VFs` are needed. Sounds reasonable?
Follow-up patches will break the 1-1 mapping from VFs->Cost e.g. when planning with multiple cost models there may be different plans for the same VFs but with different costs. For that, the VPlan->costs mapping seems to be required; happy to have VF->Cost mapping as intermediate step, but it might lead to less churn to use the Plan->costs mapping in this patch already?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:328
   /// Look through the existing plans and return true if we have one with all
   /// the vectorization factors in question.
   bool hasPlanWithVF(ElementCount VF) const {
----------------
Ayal wrote:
> nit (independent of this patch): "all the vectorization factors in question"?
Updated in 9fce1fc6f8344871b86.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:347-350
+  /// \return The most profitable vectorization factor and the cost of that VF.
+  /// This method checks every VF in \p CandidateVFs. If UserVF is not ZERO
+  /// then this vectorization factor will be selected if vectorization is
+  /// possible.
----------------
fhahn wrote:
> Ayal wrote:
> > nit: comment deserves update - CandidateVFs is no longer a parameter. (Independent of this patch:) UserVF is irrelevant, behavior of ForceVectorization should be explained.
> Thanks, updated the comment!
Updated in 0b24436591e1bbea5943d7ef816544f2d90d26d2

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:352
+  VectorizationFactor selectVectorizationFactor();
+
+  VectorizationFactor
----------------
Ayal wrote:
> /// ... (independent of this patch)
added comment in 6fa07a87abc96c3b924be07ce446e79691a7f186 and moved it here in the patch.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:390
                                   ElementCount MinVF);
+
+  std::optional<unsigned> getVScaleForTuning() const;
----------------
Ayal wrote:
> /// ... (independent of this patch)
add comment, thanks! I also removed the definition in the cost model

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:404
+  /// false otherwise.
+  /// \p VF is the vectorization factor chosen for the original loop.
+  bool isEpilogueVectorizationProfitable(const ElementCount VF) const;
----------------
Ayal wrote:
> nit (independent of this patch): above needs L but below needs not?
Removed loop argument in a431402fd262ca54a923842148be6cdcef7ab636

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5475
+  ElementCount ScalarFactor = ElementCount::getFixed(1);
+  InstructionCost ExpectedCost = getCandidateVFsFor(*VPlans[0])[0].ScalarCost;
+  const VectorizationFactor ScalarCost(ScalarFactor, ExpectedCost,
----------------
Ayal wrote:
> Worth a comment: all VectorizationFactors cache the same ScalarCost, so pick the first one? It need not represent the scalar VF=1.
> Can retain the current call to `expectedCost(ScalarFactor)` and replace it independent of this patch.
Thanks, for now the version retained expectedCost(ScalarFactor), to be replaced in a follow-on patch.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5479-5480
+  VectorizationFactor ChosenFactor = ScalarCost;
+  assert(hasPlanWithVF(ElementCount::getFixed(1)) &&
+         "Expected Scalar VF to be a candidate");

----------------
Ayal wrote:
> 
Updated, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5483
+  bool ForceVectorization = Hints.getForce() == LoopVectorizeHints::FK_Enabled;
+  if (ForceVectorization) {
+    // Ignore scalar width, because the user explicitly wants vectorization.
----------------
Ayal wrote:
> Retain condition that there is more than a (i.e., the scalar) single VFCandidate?
> Always initialize ChosenFactor to getMax() and then skip considering scalar VF only if ForceVectorization? (Or `else` compute ScalarCost and set ChosenFactor to it.)
Retained the condition for now

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5491
+  for (auto &Plan : VPlans) {
+    for (const auto &Candidate : getCandidateVFsFor(*Plan)) {
+      if (Candidate.Width.isScalar())
----------------
Ayal wrote:
> Simply traverse all VFCandidates in a single loop as before?
There's now no single vector condition all candidate VFs and costs. We either need to iterate over the entries in the map or over the plans, the latter seeming slightly preferable, as it will make it easy to return the associated plan as well (instead of later looking it up)

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5492
+    for (const auto &Candidate : getCandidateVFsFor(*Plan)) {
+      if (Candidate.Width.isScalar())
+        continue;
----------------
Ayal wrote:
> nit: retrain comment that scalar cost was already computed?
retained, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5621
+  for (auto &VPlan : VPlans) {
+    for (const auto &NextVF : getCandidateVFsFor(*VPlan)) {
+      assert(VPlan->hasVF(NextVF.Width) && "VF not in plan");
----------------
Ayal wrote:
> Simpler to retain single loop over all VectorizationFactors? VPlan used only in assert.
Same as above, there's now no single vector condition all candidate VFs and costs. 

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7561
   // Populate the set of Vectorization Factor Candidates.
   ElementCountSet VFCandidates;
   for (auto VF = ElementCount::getFixed(1);
----------------
Ayal wrote:
> Worth continuing to pass `VFCandidates` to selectVectorizationFactor()? The former is still initialized to span range of all possible VF's, before trying to build VPlans, which will possibly exclude VF's. Passing VFCandidates to buildVPlansWithVPRecipes() would allow the latter to filter out non candidates (Fixed/Scalable), and then have selectVectorizationFactor() compute costs and select VF from within the remaining candidates? Can also introduce a separate LVP::computeCosts(VFCandidates).
I am not sure; the main motivation of the patch is to compute the cost on construction and delaying cost computation until selectVectorizationFactor would still mostly retain the original behavior? AFAICT the original selectVectorizationFactor doesn't filter out candidates before computing the cost; Scalable VFs are only added to VFCandidates below depending on MaxFactors.ScalableVF. So if the target doesn't support scalable VFs, no VPlans with scalable VFs should be generated.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7604

-  for (const VPlanPtr &Plan : VPlans) {
+  for (const auto &Plan : VPlans) {
     if (Plan->hasVF(VF))
----------------
Ayal wrote:
> nit: change to `auto` independent of this patch.
Changed it back to `VPlanPtr` to retain type info.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8060
     VFRange SubRange = {VF, MaxVFTimes2};
-    VPlans.push_back(buildVPlan(SubRange));
+    VPlans.emplace_back(buildVPlan(SubRange));
+    CandidateVFs[&*VPlans.back()] = SmallVector<VectorizationFactor>();
----------------
Ayal wrote:
> nit: change from push_back to emplace_back independent of this patch?
Changed back to `push_back`, `emplace_back` as only beneficial in earlier versions that created a pair here.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8061
+    VPlans.emplace_back(buildVPlan(SubRange));
+    CandidateVFs[&*VPlans.back()] = SmallVector<VectorizationFactor>();
     VF = SubRange.End;
----------------
Ayal wrote:
> Where/Are CandidateVFs filled when building w/o recipes?
Do you mean for the native path? In that case selectVectorizationFactor isn't used yet, so CandidateVFs isn't used.

Once the change is in we could build on top and try to converge the native and ILV paths at this point and create plans for inner and outer loops and associate costs with the outer loop versions as well.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8688
+    if (!Plan) {
+      VF = SubRange.End;
+      continue;
----------------
Ayal wrote:
> Could SubRange be excluded from VFCandidates here, for the latter to be traversed subsequently to compute costs independently, rather than fusing it with the construction of each Plan and its SubRange?
Updated to retain the original loop and then compute the costs by iterating over all plans and computing costs for all VFs associated with them

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8700
+            << " because it will not generate any vector instructions.\n");
+        continue;
+      }
----------------
Ayal wrote:
> Place this dbgs dump and continue after the #ifndef NDEBUG dump below, as done now?
moved, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2323

+  const SmallSetVector<ElementCount, 2> getVFs() const { return VFs; }
+
----------------
Ayal wrote:
> This is redundant?
The latest version uses this to iterate over the VFs of a plan. I changed it to return ArrayRef<> though.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll:10
 ; CHECK:       LV: Found an estimated cost of 1 for VF 2 For instruction: %i.next = add nuw nsw i64 %i, 2
-; CHECK:       LV: Not considering vector loop of width 2 because it will not generate any vector instructions
+; CHECK:       LV: Selecting VF: 1
 ;
----------------
Ayal wrote:
> Could the error message "Not considering vector loop of width 2 because it will not generate any vector instructions" be retained, even when filtering these options?
Yes, this works as expected with the latest version, I undid the change.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143938/new/

https://reviews.llvm.org/D143938