[PATCH] D79485: [BPI] Improve static heuristics for "cold" paths.

Evgeniy via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 11 23:30:41 PDT 2020


ebrevnov added a comment.

In D79485#2208913 <https://reviews.llvm.org/D79485#2208913>, @davidxl wrote:

> A lot of test changes can probably be extracted out as NFC (to make the string test more robust) -- this will reduce the size of the patch.

Unfortunately, in most cases I don't see how this can be done without losing current coverage. There are a few cases like 'successors: %bb.3(0x80000000), %bb.4(0x00000000)' where we could remove the exact numbers and just check the block names, but I think we'd better keep exact probabilities in all other cases. That actually helps to catch unintended changes.
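
For example, a loosened check for the case above might wildcard the probabilities and keep only the block names (a hypothetical rewrite, not a line from the patch):

  ; CHECK: successors: %bb.3({{0x[0-9a-f]+}}), %bb.4({{0x[0-9a-f]+}})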

> Do you have performance numbers related to the change?

I measured performance on a bunch of Java-related benchmarks we have in-house, including SPECJVM2008, SPECJbb2015, DaCapo 9.12, and others. I don't see any noticeable impact on performance.



================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:126
+/// with attribute 'cold'.
+static const uint32_t COLD_WEIGHT = 0xffff;
+
----------------
davidxl wrote:
> ebrevnov wrote:
> > davidxl wrote:
> > > why is it set to 0xffff?
> > The short answer is: to preserve the same branch probability as before.
> > The old implementation assigned CC_NONTAKEN_WEIGHT to the "normal" path and CC_TAKEN_WEIGHT to the "cold" path. The absolute values are irrelevant; the ratio CC_NONTAKEN_WEIGHT/CC_TAKEN_WEIGHT = 16 defines the relative probability of the "normal" and "cold" branches. The new implementation uses a pre-selected weight, DEFAULT_WEIGHT, for all "normal" paths. Thus the relative ratio is calculated as DEFAULT_WEIGHT/COLD_WEIGHT = 0xfffff/0xffff ≈ 16.
> Why not define a value for DEFAULT_WEIGHT, and define COLD_WEIGHT to be 1/16 of the DEFAULT weight? It is more readable that way.
This is just the way weights were defined before (using direct numbers instead of a ratio). Please note that the other cases (like ZH_TAKEN_WEIGHT, ZH_NONTAKEN_WEIGHT) don't use a ratio either. Another theoretical reason could be the ability to represent ratios with a non-zero remainder.
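
To make the arithmetic concrete, here is a standalone sketch using the two constants quoted above (illustrative only; the real definitions live in BranchProbabilityInfo.cpp):

  #include <cstdint>
  #include <cstdio>

  // Values quoted in the discussion above.
  static const uint32_t DEFAULT_WEIGHT = 0xfffff;
  static const uint32_t COLD_WEIGHT = 0xffff;

  int main() {
    // For a two-way branch, each successor's probability is its weight
    // divided by the sum of both weights; only the ratio matters.
    double Total = double(DEFAULT_WEIGHT) + double(COLD_WEIGHT);
    printf("normal: %.4f cold: %.4f ratio: %.2f\n",
           DEFAULT_WEIGHT / Total, COLD_WEIGHT / Total,
           double(DEFAULT_WEIGHT) / double(COLD_WEIGHT));
    // Prints approximately: normal: 0.9412 cold: 0.0588 ratio: 16.00
    return 0;
  }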


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:643
+
+bool BranchProbabilityInfo::updateEstimatedBlockWeight(
+    LoopBlock &LoopBB, uint32_t BBWeight,
----------------
davidxl wrote:
> document the method and params.
sure


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:651
+      EstimatedBlockWeight.insert({BB, BBWeight});
+  if (IsInserted)
+    for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
----------------
davidxl wrote:
> explain here. Why the early estimated weight takes precedence?
will add a comment.
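
For reference, a minimal sketch of the insert-based update, assuming the usual DenseMap semantics (names are illustrative, not the patch's final signature):

  #include "llvm/ADT/DenseMap.h"
  #include "llvm/IR/BasicBlock.h"
  #include <cstdint>

  // DenseMap::insert() leaves an existing entry untouched, so the earliest
  // estimate recorded for a block wins; the returned bool tells the caller
  // whether to keep propagating the weight to the block's predecessors.
  static bool tryUpdateWeight(
      llvm::DenseMap<const llvm::BasicBlock *, uint32_t> &EstimatedBlockWeight,
      const llvm::BasicBlock *BB, uint32_t BBWeight) {
    return EstimatedBlockWeight.insert({BB, BBWeight}).second;
  }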


================
Comment at: llvm/lib/Analysis/BranchProbabilityInfo.cpp:731
+  // By doing RPO we make sure that all predecessors already have weights
+// calculated before visiting their successors.
+  ReversePostOrderTraversal<const Function *> RPOT(&F);
----------------
davidxl wrote:
> why doing RPO walk? the algorithm does not seem to depend on this order.
You are right, it should not affect correctness, but it could cause the algorithm to do more iterations. The 'while' loop following this one propagates weights from successors to predecessors. If any of the successors has not yet been evaluated, it will trigger an additional iteration once it gets evaluated. That means it's preferable to evaluate successors before predecessors.
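
A minimal sketch of the traversal in question, assuming an LLVM tree (seedBlockWeight is a hypothetical stand-in for the per-block estimation done in the patch):

  #include "llvm/ADT/PostOrderIterator.h"
  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Function.h"

  // Hypothetical per-block seeding step; the real patch estimates a weight
  // here from hints such as the 'cold' attribute or unreachable terminators.
  static void seedBlockWeight(const llvm::BasicBlock *BB) { (void)BB; }

  // Walk the CFG in reverse post-order. The order affects only how many
  // extra iterations the follow-up successor-to-predecessor propagation
  // loop needs, not correctness.
  static void seedWeightsInRPO(const llvm::Function &F) {
    llvm::ReversePostOrderTraversal<const llvm::Function *> RPOT(&F);
    for (const llvm::BasicBlock *BB : RPOT)
      seedBlockWeight(BB);
  }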



================
Comment at: llvm/test/Analysis/BlockFrequencyInfo/redundant_edges.ll:12
 
-; CHECK-NEXT: loop: float = 32.0
+; CHECK-NEXT: loop: float = 16.5
 loop:
----------------
davidxl wrote:
> why is the new estimation better?
This is a tough question. Let's take an extreme case where a loop has an infinite number of exits. In that case the probability of taking the back branch should go to zero and the loop frequency should be 1.0. That means the more exits we have, the less probable it is to go to the next iteration.

The existing algorithm completely ignores the number of exits and always assumes the loop makes 32 iterations. The new algorithm assumes that the back branch weight is 32 times higher than the default exiting weight. Thus, in the case of one loop exit, both algorithms give the following probability for the back branch:

edge loop -> loop probability is 0x7c000000 / 0x80000000 = 96.88% [HOT edge]

But because in this case there are two exiting edges, the probability of the back branch is calculated as 32*DW/(32*DW + 2*DW) = 32/34 ≈ 0.94:

edge loop -> loop probability is 0x783e0f84 / 0x80000000 = 93.94%

I think this is a pretty reasonable probability that matches the general high-level picture much better.
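
As a sanity check on that arithmetic, a small standalone sketch using the approximate 32x ratio above (the exact fixed-point numbers BPI prints differ slightly, e.g. 93.94% vs. ~94.12% for two exits):

  #include <cstdio>

  int main() {
    // The back edge gets roughly 32x the default exiting weight DW, so with
    // N exiting edges the back-edge probability is 32*DW / (32*DW + N*DW).
    const double BackWeightRatio = 32.0;
    for (int NumExits = 1; NumExits <= 4; ++NumExits) {
      double P = BackWeightRatio / (BackWeightRatio + NumExits);
      printf("exits=%d back-edge probability=%.4f\n", NumExits, P);
    }
    // As NumExits grows, P falls toward zero, matching the extreme case of
    // infinitely many exits described above.
    return 0;
  }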

There is one caveat though. Please note that in this example we have two exiting branches from the latch. If we had a side exit from another block, even the new algorithm wouldn't take it into account. I think we should put a TODO on that and follow up later. What do you think?



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79485/new/

https://reviews.llvm.org/D79485


