[PATCH] D79485: [BPI] Improve static heuristics for "cold" paths.

Evgeniy via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 22 03:07:43 PST 2020


ebrevnov added inline comments.


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:116
+/// Weight to an 'unreachable' block.
+static const uint32_t UNREACHABLE_WEIGHT = ZERO_WEIGHT;
+
----------------
yrouban wrote:
> static_assert(UNREACHABLE_WEIGHT< VERY_LOW_WEIGHT)?
Why? Currently this assert holds, but there is nothing preventing us from increasing UNREACHABLE_WEIGHT later if the cost model needs it.
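For reference, the suggested assertion would look like the sketch below (VERY_LOW_WEIGHT's value here is assumed for illustration, not taken from the patch):

```cpp
#include <cstdint>

static const uint32_t ZERO_WEIGHT = 0;
static const uint32_t UNREACHABLE_WEIGHT = ZERO_WEIGHT;
static const uint32_t VERY_LOW_WEIGHT = 1; // assumed value, for illustration

// The compile-time check proposed above. It holds today, but it would have
// to be dropped if the cost model ever raises UNREACHABLE_WEIGHT.
static_assert(UNREACHABLE_WEIGHT < VERY_LOW_WEIGHT,
              "unreachable paths must stay the coldest");
```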


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:131
+/// COLD_WEIGHT have meaning relatively to each other and DEFAULT_WEIGHT only.
+static const uint32_t DEFAULT_WEIGHT = 0xfffff;
 
----------------
yrouban wrote:
> I suggest that we introduce the UNKNOWN_WEIGHT for most of the DEFAULT_WEIGHT uses to better reflect its meaning and the way it is treated.
> DEFAULT_WEIGHT should be used only for UNKNOWN_WEIGHT at the very last step of BPI calculation (after propagation).
> 
Which uses are you talking about? I see only three uses, all in calcEstimatedHeuristics.


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:628
+  SmallVector<uint32_t, 4> Weights;
+  Optional<uint32_t> MaxWeight;
+  for (const BasicBlock *DstBB : Successors) {
----------------
yrouban wrote:
> I would start with VERY_LOW_WEIGHT. That is, if there are no loop exits then the loop is infinite, i.e. it can be entered at most once.
This method does nothing special for loop exits. It just iterates over all edges and returns the maximum weight. If there are no edges, it signals None.
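In other words, the logic is roughly the following sketch (using std::optional in place of llvm::Optional; the plain weight vector is a simplified stand-in for iterating over the successor edges):

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Returns the maximum over all edge weights, or std::nullopt when the
// block has no outgoing edges at all.
std::optional<uint32_t>
getMaxEstimatedEdgeWeight(const std::vector<uint32_t> &EdgeWeights) {
  std::optional<uint32_t> MaxWeight;
  for (uint32_t W : EdgeWeights)
    if (!MaxWeight || W > *MaxWeight)
      MaxWeight = W;
  return MaxWeight; // std::nullopt signals "no successors".
}
```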


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:690
+
+  for (const auto *DTNode = DTStartNode; DTNode != nullptr;
+       DTNode = DTNode->getIDom()) {
----------------
yrouban wrote:
> It seems we have to run a similar loop for the PostDom tree.
I didn't get that; could you elaborate?


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:702
+    // Don't propagate weight to blocks belonging to different loops.
+    if (!isLoopEnteringExitingEdge(Edge)) {
+      if (!updateEstimatedBlockWeight(DomLoopBB, BBWeight, WorkList,
----------------
yrouban wrote:
> 1. if Edge is a loop exiting edge then we can set weight of the loop of DomBB as it lays on the same dom-postdom line with BB. Then we do not need to add the loop to the worklist but have to add its entering blocks to BlockWorkList.
> 2. Avoid checking for loop exiting condition twice:
> ```
>    if (isLoopExitingEdge(Edge)) // Check before isLoopEntering() as it might be also true.
>      LoopWorkList.push_back(DomLoopBB);
>    if (isLoopEnteringEdge(Edge))
>      ; // Nothing to do. Why?
>    else if (!updateEstimatedBlockWeight(...))
>      break;
> ```
> Please comment that we do not update block weights for blocks that are not in the same loop with BB. Instead we update their loop weights.
I framed the code this way to explicitly emphasize the loop and non-loop cases. The compiler is clever enough to do the suggested optimization behind the scenes.


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:703
+    if (!isLoopEnteringExitingEdge(Edge)) {
+      if (!updateEstimatedBlockWeight(DomLoopBB, BBWeight, WorkList,
+                                      LoopWorkList))
----------------
yrouban wrote:
> could it result in a deep recursion? may be a worklist would be better?
Recursion won't happen: the first thing updateEstimatedBlockWeight does is check whether we have already processed this block.
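The termination guarantee comes down to plain map-insert semantics, roughly (a simplified model with integer block IDs, not the actual LLVM code):

```cpp
#include <cstdint>
#include <map>

// Simplified model: integer block IDs instead of BasicBlock pointers.
using BlockId = int;
std::map<BlockId, uint32_t> EstimatedBlockWeight;

// Returns false (and changes nothing) if the block already has a weight,
// so repeated propagation through the same block stops immediately.
bool updateEstimatedBlockWeight(BlockId BB, uint32_t BBWeight) {
  bool Inserted = EstimatedBlockWeight.insert({BB, BBWeight}).second;
  if (!Inserted)
    return false; // Already processed; the first weight set wins.
  // ...queue predecessors / enclosing loops for further propagation...
  return true;
}
```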


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:758
+    if (!BBWeight)
+      for (const auto *Pred : predecessors(BB))
+        if (Pred)
----------------
yrouban wrote:
> for the //unwind// property we do not need to iterate over all predecessors.
> if a block has one unwind predecessor then all its predecessors must be unwind (that is because the block must start with a landing pad). I would suggest to commit this and check only one of the predecessors.
Then it will always stop on the first predecessor, right? Can an unwind block have more than one predecessor at all?


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:759
+      for (const auto *Pred : predecessors(BB))
+        if (Pred)
+          if (const auto *II = dyn_cast<InvokeInst>(Pred->getTerminator()))
----------------
yrouban wrote:
> not needed?
I'm not sure what the exact scenario was, but I did come across a case where Pred was null.


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:787
+  // successor/exit having estimated weight. Try to propagate weight to such
+  // blocks/loops from successors/exits.
+  while (!BlockWorkList.empty() || !LoopWorkList.empty()) {
----------------
yrouban wrote:
> .. BlockWorkList ...
> LoopWorkList has a priority over BlockWorklist. Why?
There should be no difference which one is processed first. Something has to go first :-)


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:793
+
+      if (EstimatedLoopWeight.count(LoopBB.getLoopData()))
+        continue;
----------------
yrouban wrote:
> why should we avoid calculating the loop weight? what if we find another weight for this loop?
In general, a weight is assigned to a block/loop once it has its final value and can't/shouldn't be changed. However, there are cases when a block/loop inherently has several (possibly "contradicting") weights. For example, an "unwind" block may also contain a "cold" call. In that case the first weight set is favored and all subsequent weights are ignored.


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:643
+
+bool BranchProbabilityInfo::updateEstimatedBlockWeight(
+    LoopBlock &LoopBB, uint32_t BBWeight,
----------------
yrouban wrote:
> ebrevnov wrote:
> > ebrevnov wrote:
> > > davidxl wrote:
> > > > document the method and params.
> > > sure
> > > sure
> > 
> > In fact, there is documentation in the header file. In the source file I can put more details on the behavior in case of a rewrite attempt, as you requested below.
> I would remove this doc from the definition (to not deviate from the doc at the declaration) and put the note inside the body.
Documentation at the declaration and at the definition provides different levels of detail. They repeat each other a bit but differ in many ways.


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:651
+      EstimatedBlockWeight.insert({BB, BBWeight});
+  if (IsInserted)
+    for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
----------------
yrouban wrote:
> ebrevnov wrote:
> > davidxl wrote:
> > > explain here. Why the early estimated weight takes precedence?
> > will add a comment.
> please explain (what if previous weight is not equal to the new, may be assert(PrevWeight <= BBWeight)?)
We can't assert that, since a single block might have several "incompatible" weights. For example, an "unwind" block may contain a "cold" call. In that case we favor the first weight encountered and rely on a proper evaluation order.


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:126
+/// with attribute 'cold'.
+static const uint32_t COLD_WEIGHT = 0xffff;
+
----------------
yrouban wrote:
> ebrevnov wrote:
> > davidxl wrote:
> > > ebrevnov wrote:
> > > > davidxl wrote:
> > > > > why is it set to 0xffff?
> > > > Short answer: to preserve the same branch probabilities as before.
> > > > The old implementation assigned CC_NONTAKEN_WEIGHT to the "normal" path and CC_TAKEN_WEIGHT to the "cold" path. Absolute values are irrelevant; the ratio CC_NONTAKEN_WEIGHT/CC_TAKEN_WEIGHT = 16 defines the relative probability of the "normal" and "cold" branches. The new implementation uses a pre-selected weight, DEFAULT_WEIGHT, for all "normal" paths. Thus the relative ratio is calculated as DEFAULT_WEIGHT/COLD_WEIGHT = 0xfffff/0xffff = 16.
> > > Why not define a value for DEFAULT_WEIGHT, and define COLD_WEIGHT to be 1/16 of the DEFAULT weight? It is more readable that way.
> > This is just how weights were defined before (using direct numbers instead of ratios). Please note that all other cases (like ZH_TAKEN_WEIGHT, ZH_NONTAKEN_WEIGHT) don't use ratios either. Another theoretical reason could be the ability to represent ratios with a non-zero remainder.
> 1. group the new related *_WEIGHT into one enum (to distinguish them from the other *_WEIGH constants)
> 2. explicitly describe their values (e.g. why COLD_WEIGHT is X times as high as LOWEST_NONZERO_WEIGHT)
> 3. avoid using hex notation unless it makes sense (e.g. important for bitwise operations)
> Something like the following:
>   enum {
>     ZERO_WEIGHT = 0,
>     UNREACHABLE_WEIGHT = ZERO_WEIGHT,
>     LOWEST_NONZERO_WEIGHT = 1,
>     NORETURN_WEIGHT = LOWEST_NONZERO_WEIGHT,
>     UNWIND_WEIGHT = LOWEST_NONZERO_WEIGHT,
>     COLD_WEIGHT = 65535 * LOWEST_NONZERO_WEIGHT,
>     DEFAULT_WEIGHT = 64 * COLD_WEIGHT
>   };
1. Done
2. These are just heuristic values; you can't give a precise description for them.
3. Done.
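To illustrate the arithmetic behind the chosen values: for a two-way branch whose edges carry DEFAULT_WEIGHT and COLD_WEIGHT, the probabilities come out roughly as below (a sketch mirroring the weight-to-probability math, not the actual BranchProbability API):

```cpp
#include <cstdint>

enum : uint32_t {
  COLD_WEIGHT = 0xffff,     // 65535
  DEFAULT_WEIGHT = 0xfffff, // 1048575, ~16x COLD_WEIGHT
};

// Probability (in percent, rounded down) of taking an edge of weight W
// when the weights of all outgoing edges sum to Total.
uint64_t edgeProbabilityPercent(uint64_t W, uint64_t Total) {
  return W * 100 / Total;
}
```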


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79485/new/

https://reviews.llvm.org/D79485


