[PATCH] D109432: [LoopVectorize] Permit fixed-width epilogue loops for scalable vector bodies

Fri Oct 1 12:39:20 PDT 2021

sdesmalen added a comment.

Hi @david-arm, thanks for working on this. I don't see any fundamental issues with having a different VPlan for the epilogue loop and the main vector body. In fact, it makes sense to me to pick the most suitable and cost-effective (VF, Plan) pair for the epilogue. If we want to use scalable vectors for the epilogue loop at some point, then we'll also want to use predication, so I could see that requiring a different VPlan. It doesn't seem to me that this patch makes arbitrary decisions about which VPlan to choose, it aims to pick the VF with the lowest cost, although it currently falls back (consistently) to a fixed-width VPlan when the main loop has a scalable VF.

Thanks for testing that there are no functional regressions when having different VPlans for the main- and epilogue vector loops.

I made some suggestions to simplify the implementation, hope my comments make sense.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6244
+    ElementCount ForcedEC =
+        ElementCount::getFixed(EpilogueVectorizationForceVF);
+    if (LVP.hasPlanWithVFs({ForcedEC}))
----------------
Can this change be committed as a separate patch, this seems like a sensible bugfix to me. (although no idea how to test for it)

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6264
+  ElementCount FixedMainLoopVF = MainLoopVF;
+  if (FixedMainLoopVF.isScalable()) {
+    FixedMainLoopVF =
----------------
nit: Can you rewrite this as:

  if (MainLoopVF.isScalable())
    LLVM_DEBUG(...);

  auto FixedMainLoopVF = ElementCount::getFixed(FixedMainLoopVF.getKnownMinValue());

`FixedMainLoopVF.isScalable()` reads like a contradictory expression that should never happen.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8200
+      if (!I.get()->hasVF(VF))
+        BackupPlans->push_back(std::unique_ptr<llvm::VPlan>(I.release()));
+    }
----------------
I see that you're just building on top of the current mechanism, but I'd rather see `setBestPlan` replaced by `getBestPlanFor`, which returns a pointer to the VPlan that can be passed to `LoopVectorizationPlanner::executePlan`. That way, you don't need to do all this odd trickery with removing vplans and requiring `BackupPlans` to repopulate the set for a second call to `setBestPlan`.

Then the code becomes a bit simpler to follow:

  auto Plan = getBestPlanFor(VF, UF)
  LVP.executePlan(Plan, VF, UF, ILV, DT);

I don't know whether the type of `auto Plan` can be `VPlan*` or whether it needs to be an instance of `std::unique_ptr<VPlan>`. It would be convenient if the LoopVectorizationPlanner can keep ownership of all VPlans until the end, where `LoopVectorizationPlanner::executePlan` just invokes the relevant VPlan to execute.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8407
+  IRBuilder<> B(&*Lp->getLoopPreheader()->getFirstInsertionPt());
+  Value *Step = getRuntimeVF(B, IdxTy, VF * UF);
   Value *CountRoundDown = getOrCreateVectorTripCount(Lp);
----------------
Not sure I fully understand it, but is the VF at this point always guaranteed to be a fixed-width VF? If so, can we avoid making this change here (and instead s/getKnownMinValue/getFixedValue/)? I'm sure we'll want this change at some point when we make the epilogue VF scalable, but perhaps this patch is not the one to change it in.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109432/new/

https://reviews.llvm.org/D109432