[PATCH] D121899: [LoopVectorize] Optimise away the icmp when tail-folding for some low trip counts

Sander de Smalen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 27 06:57:36 PDT 2022


sdesmalen added a comment.

Thanks for this patch Dave, I've left a few comments.

Perhaps it's worth highlighting in the commit message that this is more of a problem for scalable vectors, where the branch/compare isn't constant folded or instcombined away that easily as it is for fixed-width vectors.



================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8809
 
   auto *BranchOnCount =
       new VPInstruction(VPInstruction::BranchOnCount,
----------------
Initially I wanted to suggest creating a new (unconditional) branch instruction here, but at this point you don't know which VF will be chosen for the given VPlan, so we have to defer this decision until codegen.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8812
+                        {CanonicalIVIncrement, &Plan.getVectorTripCount(),
+                         Plan.getOrCreateTripCount()},
+                        DL);
----------------
There's nothing wrong with it, but perhaps it's a bit unfortunate this has to be passed into the VPInstruction as an operand. It seems more like a bit of useful knowledge of the loop that the VPInstruction could use to optimise the branch, rather than an actual operand for the conceptual BranchOnCount intrinsic. If the trip-count of the loop is constant, maybe we can store that information as a piece of state somewhere.

@fhahn what would be the desired place to add such information about the loop? My understanding was that the uses of the InnerLoopVectorizer in VPlan were being phased out, is that right?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:750-760
+    uint64_t TCVal = 0;
+    if (auto *TC2 = dyn_cast<ConstantInt>(TC))
+      TCVal = TC2->getZExtValue();
+
+    Value *Cond;
+    // When we know there will only be one vector iteration there is no need to
+    // create the comparison, since we already know the answer.
----------------
nit: How about

  Value *ConstCmp = nullptr;
  if (auto *C = dyn_cast<ConstantInt>(TC))
    if (C->getZExtValue() <= State.UF * State.VF.getKnownMinValue())
      ConstCmp = Builder.getInt1(true);
  Value *Cond = ConstCmp ? ConstCmp : Builder.CreateICmpEQ(IV, VTC);


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:978
   // modeled explicitly in VPlan.
-  cast<Instruction>(LastBranch->getCondition())->moveBefore(LastBranch);
+  if (isa<Instruction>(LastBranch->getCondition()))
+    cast<Instruction>(LastBranch->getCondition())->moveBefore(LastBranch);
----------------
I noticed some of this code here has been removed and I'm not sure if this change is then still required. In any case this patch needs a rebase.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121899/new/

https://reviews.llvm.org/D121899



More information about the llvm-commits mailing list