[PATCH] D67905: [LV] Vectorizer should adjust trip count in profile information

Ayal Zaks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 18 14:00:39 PST 2019


Ayal added inline comments.


================
Comment at: llvm/include/llvm/Transforms/Utils/LoopUtils.h:361
+/// Update profile info for the \p OrigLoop and \p UnrolledLoop so that original
+/// number of iterations in the \p OrigLoop (TC) are distributed as follows.
+/// \p OrigLoop gets TC%UF iterations, while rest iterations are executed as
----------------
The fact that OrigLoop is both the original loop containing the original profile weights, and acts as the RemainderLoop dedicated to leftover iterations, should be clarified. Alternatively, this utility can receive three loops: OrigLoop, UnrolledLoop and RemainderLoop, leaving it to the caller to decide if to pass OrigLoop also as RemainderLoop.

Would probably be clearer to start with UnrolledLoop receiving weights that reflect TC/UF iterations, and then OrigLoop which receives weights that reflect the remaining TC%UF iterations.


================
Comment at: llvm/include/llvm/Transforms/Utils/LoopUtils.h:363
+/// \p OrigLoop gets TC%UF iterations, while rest iterations are executed as
+/// part of \p UrolledLoop. In addition, \p UrolledLoop executes blocks of \UF
+/// original iterations thus will do TC/UF iterations in total.
----------------
U[n]rolledLoop, several occurrences.

"\UF" >> "\p UF"


================
Comment at: llvm/lib/Transforms/Utils/LoopUtils.cpp:1040
+  uint64_t OrigBackedgeTakenWeight = 0;
+  uint64_t OrigLoopEntryWeight = 0;
+  auto *OrigLoopLatchBranch = OrigLoop->getLoopLatch()->getTerminator();
----------------
Worth commenting that OrigLoopEntryWeight also holds OrigLoopExitWeight, which is more clearly the weight associated with the (exit direction of the) latch branch.


================
Comment at: llvm/lib/Transforms/Utils/LoopUtils.cpp:1048
+  MDBuilder MDB(OrigLoopLatchBranch->getContext());
+  auto *UnrolledBBI = UnrolledLoop->getLoopLatch()->getTerminator();
+  bool IsTrueBackEdgeOrigLoop =
----------------
UnrolledBBI >> Unrolled[Loop]LatchBranch, as in OrigLoopLatchBranch.

As the names end up overflowing lines, can use Orig, Unrolled and Remainder to stand for OrigLoop, UnrolledLoop and RemainderLoop; i.e., taking "Loop" out.


================
Comment at: llvm/lib/Transforms/Utils/LoopUtils.cpp:1051
+      OrigLoop->contains(*succ_begin(OrigLoop->getLoopLatch()));
+  bool IsTrueBackEdgeVecLoop =
+      UnrolledLoop->contains(*succ_begin(UnrolledLoop->getLoopLatch()));
----------------
VecLoop >> UnrolledLoop


================
Comment at: llvm/lib/Transforms/Utils/LoopUtils.cpp:1061
+  // Calculate number of iterations in the original scalar loop.
+  const uint64_t OrigHeaderBlockWeight =
+      OrigBackedgeTakenWeight + OrigLoopEntryWeight;
----------------
Can drop the 'const', for consistency; these temporaries are obviously const's.


================
Comment at: llvm/lib/Transforms/Utils/LoopUtils.cpp:1064
+  const uint64_t OrigAverageTripCount =
+      OrigHeaderBlockWeight / OrigLoopEntryWeight;
+  // Calculate number of iterations in unrolled loop.
----------------
Note that this is rounding down. Can add half of the denominator to the nominator before dividing in order to round more accurately; this is what getLoopEstimatedTripCount() does, but it seems to be off by 1 as it computes BackEdgeTakenWeight / LoopEntryWeight rounded to nearest, instead of HeaderBlockWeight / LoopEntryWeight rounded to nearest...

Simply call OrigAverageTripCount = `getLoopEstimatedTripCount(OrigLoop)`?

Perhaps having a `setLoopEstimatedTripCount(Loop, EstimatedTripCount, EstimatedEntryWeight)` would help fold the identical treatment of UnrolledLoop and RemainderLoop into one function, which also takes care of figuring out the True/False vs. Backedge/Exit directions?


================
Comment at: llvm/lib/Transforms/Utils/LoopUtils.cpp:1066
+  // Calculate number of iterations in unrolled loop.
+  uint64_t UrollAverageTripCount = OrigAverageTripCount / UF;
+  // Calculate number of iterations for remainder loop.
----------------
U[n]rolledAverageTripCount


================
Comment at: llvm/lib/Transforms/Utils/LoopUtils.cpp:1076
+        (UrollAverageTripCount - 1) * OrigLoopEntryWeight;
+    UnrolledLoopEntryWeight = OrigLoopEntryWeight;
+  }
----------------
Seems slightly more logical to first set UnrolledLoopEntryWeight, and then using it set UnrolledLoopBackedgeWeight.


================
Comment at: llvm/lib/Transforms/Utils/LoopUtils.cpp:1085
+        (RemainderAverageTripCount - 1) * OrigLoopEntryWeight;
+    RemainderLoopBackedgeWeight = OrigLoopEntryWeight;
+  }
----------------
ditto


================
Comment at: llvm/lib/Transforms/Utils/LoopUtils.cpp:1098
+    std::swap(RemainderLoopEntryWeight, RemainderLoopBackedgeWeight);
+  // Set new profile metadata.
+  OrigLoopLatchBranch->setMetadata(
----------------
(This actually replaces the old profile metadata with the new one.)


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:118
 #include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/MDBuilder.h"
 #include "llvm/IR/Metadata.h"
----------------
Is this include still needed here?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3456
   cse(LoopVectorBody);
+
+  // For cases like foldTailByMasking() and requiresScalarEpiloque() we may
----------------
Comment below should start with a short sentence explaining that profile weights associated with the original loop are now distributed among the vector and scalar loops.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67905/new/

https://reviews.llvm.org/D67905





More information about the llvm-commits mailing list