[PATCH] D103289: A post-processing for BFI inference

Sat Jun 5 11:34:03 PDT 2021

davidxl added inline comments.

================
Comment at: llvm/include/llvm/Analysis/BlockFrequencyInfoImpl.h:1489
+    }
+    if (OneMinusSelfProb != Scaled64::getOne())
+      NewFreq /= OneMinusSelfProb;
----------------
spupyrev wrote:
> davidxl wrote:
> > spupyrev wrote:
> > > davidxl wrote:
> > > > spupyrev wrote:
> > > > > davidxl wrote:
> > > > > > spupyrev wrote:
> > > > > > > davidxl wrote:
> > > > > > > > Does it apply to other backedges too?
> > > > > > > not sure I fully understand the question, but we need an adjustment only for self-edges; blocks without self-edges don't need any post-processing
> > > > > > > 
> > > > > > > I added a short comment before the loop
> > > > > > > `NewFreq /= OneMinusSelfProb` looks like multiplying the block freq (one iteration loop) by the average trip count -- that is why I asked if this applies to other backedges.
> > > > > Here is the relevant math:
> > > > > 
> > > > > we want to find a new frequency for block I, `Freq[I]`, such that it is equal to `\sum Freq[J] * Prob[J][I]`, where the sum is taken over all (incoming) jumps (J -> I). These are "ideal" frequencies that BFI is trying to compute.
> > > > > 
> > > > > > Clearly, if the I-th block has no self-edges, then we simply assign `Freq[I]:=\sum Freq[J] * Prob[J][I]` (that is, no adjustment). However, if there are self-edges, we need to assign `Freq[I]:=(\sum Freq[J] * Prob[J][I]) / (1 - Prob[I][I])` (the adjustment in the code).
> > > > I wonder why the special treatment is needed in the first place.
> > > > 
> > > > Suppose we have 
> > > > 
> > > >  ```
> > > > >   BB1 (init freq = 50)
> > > > >    |
> > > > >    V <------------------+
> > > > >   BB2 (init freq = 0)   |
> > > > >   /  \ 90%              |
> > > > >  / 10% \________________|
> > > > > V
> > > > ```
> > > > 
> > > > With iterative fixup, BB2's frequency will converge to 500, which is the right value without any special handling.
> > > Excellent example!
> > > 
> > > > The correct inference here is `Freq[BB1] = 50, Freq[BB2] = 500`, which is found after 5 iterations using the diff. If we remove the self-edge adjustment, we don't get the right result: it converges to `Freq[BB1] = 50, Freq[BB2] = 50` after ~100 iterations. (Observe that we do modify the frequency of the entry block; it is not fixed.)
> > > 
> > > In general, I do not have a proof that the Markov chain always converges to the desired stationary point, if we incorrectly update frequencies (e.g., w/o the self-edge adjustment) -- I suspect it does not.
> > By entry frequency, do you mean BB1's frequency? BB1 won't be active after the first iteration right?
> Yes I meant BB1's frequency.
> 
> > Notice that in order to create a valid Markov chain, we need to add jumps from all exits to the entry. In this case, from BB2 to BB1. So BB1 will be active on later iterations.
Can you verify whether it still works without the adjustment? For example, split BB2 into two BBs in the small example.
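
For reference, the BB1/BB2 example from this thread can be reproduced with a small standalone sketch. This is not the code from the patch: the function name `infer_freqs`, the in-place sweep order, the per-sweep normalization, and the uniform starting point are all assumptions made for illustration, and the number of sweeps it needs says nothing about the iteration counts quoted above. It also assumes that "without the adjustment" means skipping the self-edge contribution and not dividing by `1 - Prob[I][I]`, and that `Prob[I][I] < 1` for every block.

```python
def infer_freqs(prob, entry, entry_freq, adjust_self_edges=True, sweeps=50):
    """Iterate Freq[I] := sum_{J != I} Freq[J] * Prob[J][I] / (1 - Prob[I][I]).

    prob[j][i] is the probability of the jump J -> I. Hypothetical helper,
    not the BlockFrequencyInfoImpl implementation.
    """
    n = len(prob)
    freq = [1.0 / n] * n  # arbitrary positive starting point (assumption)
    for _ in range(sweeps):
        for i in range(n):
            # Mass flowing into block i from every *other* block.
            incoming = sum(freq[j] * prob[j][i] for j in range(n) if j != i)
            if adjust_self_edges:
                # Self-edge adjustment; requires prob[i][i] < 1.
                incoming /= 1.0 - prob[i][i]
            freq[i] = incoming  # updated in place, so later blocks see it
        total = sum(freq)
        freq = [f / total for f in freq]  # keep values from under/overflowing
    # Rescale so that the entry block has the requested frequency.
    scale = entry_freq / freq[entry]
    return [f * scale for f in freq]


# BB1 -> BB2 (100%), BB2 -> BB2 (90%), BB2 -> exit (10%).
# The exit is redirected to the entry (BB2 -> BB1, 10%) to close the chain.
PROB = [
    [0.0, 1.0],  # jumps out of BB1
    [0.1, 0.9],  # jumps out of BB2
]

adjusted = infer_freqs(PROB, entry=0, entry_freq=50.0)
unadjusted = infer_freqs(PROB, entry=0, entry_freq=50.0, adjust_self_edges=False)
print("with adjustment:   ", [round(f, 6) for f in adjusted])
print("without adjustment:", [round(f, 6) for f in unadjusted])
```

Under these assumptions the adjusted run settles at roughly `Freq[BB1] = 50, Freq[BB2] = 500`, while the variant that drops the self-edge without dividing settles at roughly `50, 50`. The same helper can be pointed at a chain where BB2 is split into two blocks by extending `PROB` to three rows.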

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103289/new/

https://reviews.llvm.org/D103289