[PATCH] D67905: [LV] Vectorizer should adjust trip count in profile information

Ayal Zaks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 5 05:00:58 PST 2019


Ayal added a comment.

Still think it would be better to provide this as a standalone function in Transforms/Utils/LoopUtils, for potential benefit of loop unroll (and jam) passes in addition to LV. Having agreed to ignore foldTail and requiresScalarEpilog, there's nothing vectorization-specific to do here. There's still an issue though with the fact that LV may use the scalar loop for both the remaining TC%(VF*UF) iterations when running the vector loop, and for all TC iterations when runtime guards bypass the vector loop. In absence of information, each such guard could be assigned 0.5 probability, or one could be aggressively optimistic and hope vector loop is always reached. In any case this deserves a comment.

Suggesting further variable name changes for the three Orig, Unrolled, and Remainder Loops, each having a LoopEntry==LoopExit edge weight, a Backedge weight, a HeaderBlock weight, and an AverageTripCount. The actual weights are recorded as TrueVal and FalseVal of the latch branches.

Patch needs to be clang-format'ed.



================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3978
 
+// Update profile info since expected TC of vectorized loop is less by VFxUF
+// than original TC. At the same time original loop becomes prolog/epilog loop
----------------
"is less ... than original TC" >> "is smaller than the original TC by a factor of VFxUF"


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3985
+void InnerLoopVectorizer::fixProfileInfo() {
+  uint64_t OrigTakenWeight = 0;
+  uint64_t OrigExitWeight = 0;
----------------
OrigTakenWeight >> OrigBackedgeTakenWeight or OrigBackedgeWeight ?
OrigExitWeight >> OrigLoopExitWeight?
May help to also set OrigLoopEntryWeight = OrigLoopExitWeight?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3988
+  Loop *OrigLoop = LI->getLoopFor(LoopScalarBody);
+  auto *OrigBackBranchI = OrigLoop->getLoopLatch()->getTerminator();
+
----------------
OrigBackBranchI >> OrigLoopLatchBranch ?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3991
+  if (!OrigBackBranchI->extractProfMetadata(OrigTakenWeight,
+                                           OrigExitWeight))
+    return;
----------------
It seems clearer to call extractProfMetadata(TrueVal, FalseVal) and then set BackedgeTaken and LoopExit/Entry weights according to if (OrigLoopLatchBranch->getSuccessor(0) == OrigLoop->getHeader()), following LoopUtil's getLoopEstimatedTripCount(). Analogously for createBranchWeights(TrueVal, FalseVal).

In any case, better rename "IsTrueBackEdge*Loop".


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4005
+
+  if  (OrigExitWeight == 0)
+    return;
----------------
Patch needs to be clang-format'ed


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4012
+  // (OrigTakenWeight + OrigExitWeight) / OrigExitWeight ==
+  // (OrigTakenWeight / OrigExitWeight + 1)
+  // Note: Uses of OrigIterCount below should not be simplified as it will
----------------
Instead of providing the explanation in a comment, seems better to implement the code this way, leaving the +1 for the compiler to optimize. I.e.,

```
const uint64_t OrigHeaderBlockWeight = OrigBackedgeTakenWeight + OrigLoopEntryWeight;
const unit64_t OrigAverageTripCount = OrigHeaderBlockWeight / OrigLoopEntryWeight;
```


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4019
+  // Calculate number of iterations for prolog/epilog loop.
+  uint64_t PEIterCount = OrigAverageTripCount % (VF * UF);
+
----------------
Better rename/expand "PE".
PEIterCount >> RemainderLoopAverageTripCount?



================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4023
+  uint64_t VecTakenCount = 0;
+  uint64_t VecFallThrough = 0;
+  if (VecAverageTripCount > 0)  {
----------------
In the general context, "Vec" >> "UnrolledLoop".
VecTakenCount >> UnrolledLoopBackedgeWeight
VecFallThrough >> UnrolledLoopExitWeight and/or UnrolledLoopEntryWeight


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4025
+  if (VecAverageTripCount > 0)  {
+    VecTakenCount = (VecAverageTripCount - 1) * OrigExitWeight;
+    VecFallThrough = OrigExitWeight;
----------------
How about

```
if (UnrolledLoopAverageTripCount > 0) {
  UnrolledLoopEntryWeight = OrigLoopEntryWeight;
  uint64_t UnrolledLoopHeaderWeight = UnrolledLoopAverageTripCount * UnrolledLoopEntryWeight; // Analogous to computing OrigLoopAverageTripCount from Header and Entry weights above.
  UnrolledLoopBackedgeWeight = UnrolledLoopHeaderWeight - UnrolledLoopEntryWeight;
}
```
leaving the -1 optimization to the compiler.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4031
+  uint64_t PETakenCount = 0;
+  uint64_t PEFallThroughCount = 0;
+  if (PEIterCount > 0) {
----------------
PETakenCount >> RemainderLoopBackedgeWeight
PEFallThroughCount >> RemainderLoopExitWeight and/or RemainderLoopEntryWeight


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67905/new/

https://reviews.llvm.org/D67905





More information about the llvm-commits mailing list