[PATCH] D67905: [LV] Vectorizer should adjust trip count in profile information

Mon Dec 9 01:55:49 PST 2019

ebrevnov marked 9 inline comments as done.
ebrevnov added a comment.

In D67905#1770563 <https://reviews.llvm.org/D67905#1770563>, @Ayal wrote:

> Still think it would be better to provide this as a standalone function in Transforms/Utils/LoopUtils, for potential benefit of loop unroll (and jam) passes in addition to LV. Having agreed to ignore foldTail and requiresScalarEpilog, there's nothing vectorization-specific to do here. There's still an issue though with the fact that LV may use the scalar loop for both the remaining TC%(VF*UF) iterations when running the vector loop, and for all TC iterations when runtime guards bypass the vector loop. In absence of information, each such guard could be assigned 0.5 probability, or one could be aggressively optimistic and hope vector loop is always reached. In any case this deserves a comment.
>
> Suggesting further variable name changes for the three Orig, Unrolled, and Remainder Loops, each having a LoopEntry==LoopExit edge weight, a Backedge weight, a HeaderBlock weight, and an AverageTripCount. The actual weights are recorded as TrueVal and FalseVal of the latch branches.
>
> Patch needs to be clang-format'ed.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3991
+  if (!OrigBackBranchI->extractProfMetadata(OrigTakenWeight,
+                                           OrigExitWeight))
+    return;
----------------
Ayal wrote:
> It seems clearer to call extractProfMetadata(TrueVal, FalseVal) and then set BackedgeTaken and LoopExit/Entry weights according to if (OrigLoopLatchBranch->getSuccessor(0) == OrigLoop->getHeader()), following LoopUtil's getLoopEstimatedTripCount(). Analogously for createBranchWeights(TrueVal, FalseVal).
> 
> In any case, better rename "IsTrueBackEdge*Loop".
Don't feel convinced. My point would be that extra variables and conditional reassignments make the code less readable. I think this is very subjective thing.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4025
+  if (VecAverageTripCount > 0)  {
+    VecTakenCount = (VecAverageTripCount - 1) * OrigExitWeight;
+    VecFallThrough = OrigExitWeight;
----------------
Ayal wrote:
> How about
> 
> ```
> if (UnrolledLoopAverageTripCount > 0) {
>   UnrolledLoopEntryWeight = OrigLoopEntryWeight;
>   uint64_t UnrolledLoopHeaderWeight = UnrolledLoopAverageTripCount * UnrolledLoopEntryWeight; // Analogous to computing OrigLoopAverageTripCount from Header and Entry weights above.
>   UnrolledLoopBackedgeWeight = UnrolledLoopHeaderWeight - UnrolledLoopEntryWeight;
> }
> ```
> leaving the -1 optimization to the compiler.
That will make computations less stable to overflow. Personally I feel the way it's written today has the same level of complexity for understanding. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67905/new/

https://reviews.llvm.org/D67905