[PATCH] D103289: A post-processing for BFI inference

Sat Jun 5 12:11:35 PDT 2021

spupyrev added inline comments.

================
Comment at: llvm/include/llvm/Analysis/BlockFrequencyInfoImpl.h:1489
+    }
+    if (OneMinusSelfProb != Scaled64::getOne())
+      NewFreq /= OneMinusSelfProb;
----------------
davidxl wrote:
> spupyrev wrote:
> > davidxl wrote:
> > > spupyrev wrote:
> > > > davidxl wrote:
> > > > > spupyrev wrote:
> > > > > > davidxl wrote:
> > > > > > > spupyrev wrote:
> > > > > > > > davidxl wrote:
> > > > > > > > > Does it apply to other backedges too?
> > > > > > > > not sure I fully understand the question, but we need an adjustment only for self-edges; blocks without self-edges don't need any post-processing
> > > > > > > > 
> > > > > > > > I added a short comment before the loop
> > > > > > > NewFreq /= OneMinusSelfProb looks like multiply the block freq (one iteration loop) with the average trip count -- that is why I asked if this applies to other backedges.
> > > > > > Here is the relevant math:
> > > > > > 
> > > > > > we want to find a new frequency for block I, `Freq[I]`, such that it is equal to `\sum Freq[J] * Prob[J][I]`, where the sum is taken over all (incoming) jumps (J -> I). These are "ideal" frequencies that BFI is trying to compute.
> > > > > > 
> > > > > > Clearly if I-th block has no self-edges, then we simply assign `Freq[I]:=\sum Freq[J] * Prob[J][I]` (that is, no adjustment). However, if there are self_edges, we need to assign `Freq[I]:=(\sum Freq[J] * Prob[J][I]) / (1 - Prob[I][I])` (the adjustment in the code)
> > > > > I wonder why the special treatment is needed in the first place.
> > > > > 
> > > > > Suppose we have 
> > > > > 
> > > > >  ```
> > > > >  BB1  (init freq = 50)
> > > > >      |
> > > > >      V  <-----------------
> > > > >     BB2 (int freq = 0)   |
> > > > >     /      \ 90%              |
> > > > >    / 10%\____________|
> > > > > <
> > > > > 
> > > > > ```
> > > > > 
> > > > > With iterative fixup, BB2's frequency will converge to 500, which is the right value without any special handling.
> > > > Excellent example!
> > > > 
> > > > The correct inference here is `Freq[BB1] = 50, Freq[BB2] = 500`, which is found after 5 iterations using the diff. If we remove the self-edge adjustment, we don't get the right result: it converges to `Freq[BB1] = 50, Freq[BB2] = 50` after ~100 iterations. (Observe that we do modify the frequency of the entry block, it is not fixed)
> > > > 
> > > > In general, I do not have a proof that the Markov chain always converges to the desired stationary point, if we incorrectly update frequencies (e.g., w/o the self-edge adjustment) -- I suspect it does not.
> > > By entry frequency, do you mean BB1's frequency? BB1 won't be active after the first iteration right?
> > Yes I meant BB1's frequency.
> > 
> > Notice that in order to create a valid Markov chain, we need to add jumps from all exists to the entry. In this case, from BB2 to BB1. So BB1 will be active on later iterations
> Can you verify if it still works without the adjustment: in the small example, split BB2 into two BBs.
I've commented above:

> If we remove the self-edge adjustment, we don't get the right result: it converges to Freq[BB1] = 50, Freq[BB2] = 50 after ~100 iterations. 
> In general, I do not have a proof that the Markov chain always converges to the desired stationary point, if we incorrectly update frequencies (e.g., w/o the self-edge adjustment) -- I suspect it does not.

What is the concern/question here? In my mind, this is not a "fix/hack" but the correct way of applying iterative inference.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103289/new/

https://reviews.llvm.org/D103289