[PATCH] D126680: [VPlan] Replace BranchOnCount with BranchOnCond if TC <= UF * VF.

Mon Jun 6 01:34:03 PDT 2022

fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:890
+  auto *Term = dyn_cast<VPInstruction>(&ExitingVPBB->back());
+  // Try to simplify BranchOnCount to 'BranchOnCond true' if TC <= VF * UF.
+  if (Term && Term->getOpcode() == VPInstruction::BranchOnCount) {
----------------
Ayal wrote:
> CanonicalIVStartValue is assumed to be non negative, right?
> 
> This is as close to a VPlan2VPlan transformation as we can get at the moment in general. The TC <= VF case (using lower-bound of 1 for UF) could be optimized as an early VPlan2VPlan transformation, considering VPlan's VF range. But UF is currently set after finalizing a VPlan, except if set manually and/or optimizing for size.
> CanonicalIVStartValue is assumed to be non negative, right?

Ah yes, it is always non-negative. When preparing to execute the plan for the main vector loop it will be set to nullptr, for the epilogue vector loop it will be set to a positive start value. 

But I think it only makes sense for now to simplify for the main vector loop, so I'll added a  check.

> This is as close to a VPlan2VPlan transformation as we can get at the moment in general. The TC <= VF case (using lower-bound of 1 for UF) could be optimized as an early VPlan2VPlan transformation, considering VPlan's VF range. But UF is currently set after finalizing a VPlan, except if set manually and/or optimizing for size.

Yes, I think it would be good to move this to a VPlan2VPlan transform as follow-up, but we need to think a bit more on how to model constraints on the UF I think.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:892
+  if (Term && Term->getOpcode() == VPInstruction::BranchOnCount) {
+    if (auto *C = dyn_cast<ConstantInt>(TripCountV)) {
+      uint64_t TCVal = C->getZExtValue();
----------------
Ayal wrote:
> nit: can fold condition e.g. into `&& isa<ConstantInt>`...
Reduced the nesting level, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:894
+      uint64_t TCVal = C->getZExtValue();
+      if (TCVal && TCVal <= State.VF.getKnownMinValue() * State.UF) {
+        auto *BOC =
----------------
Ayal wrote:
> getFixedValue()?
Unfortunately I think `getKnownMinValue` is needed to handle scalable VFs here. there are some test cases that would crash with `getFixedValue`.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:897
+            new VPInstruction(VPInstruction::BranchOnCond,
+                              {getOrAddExternalDef(State.Builder.getTrue())});
+        Term->eraseFromParent();
----------------
Ayal wrote:
> (Could alternatively set the two operands of Term to the same value for trivial comparison, given that BranchOnCount actually means ReiterateWhileNotEqual. Currently BranchOnCond can also represent loop branches, and using it is clearer. May be good to have a single representation for loop branches, independent of this patch.)
> May be good to have a single representation for loop branches, independent of this patch.

Sounds good as follow-up!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126680/new/

https://reviews.llvm.org/D126680