[PATCH] D103289: A post-processing for BFI inference

Wenlei He via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Jun 6 00:32:24 PDT 2021


wenlei added inline comments.


================
Comment at: llvm/include/llvm/Analysis/BlockFrequencyInfoImpl.h:1489
+    }
+    if (OneMinusSelfProb != Scaled64::getOne())
+      NewFreq /= OneMinusSelfProb;
----------------
davidxl wrote:
> spupyrev wrote:
> > davidxl wrote:
> > > spupyrev wrote:
> > > > davidxl wrote:
> > > > > spupyrev wrote:
> > > > > > davidxl wrote:
> > > > > > > spupyrev wrote:
> > > > > > > > davidxl wrote:
> > > > > > > > > spupyrev wrote:
> > > > > > > > > > davidxl wrote:
> > > > > > > > > > > Does it apply to other backedges too?
> > > > > > > > > > not sure I fully understand the question, but we need an adjustment only for self-edges; blocks without self-edges don't need any post-processing
> > > > > > > > > > 
> > > > > > > > > > I added a short comment before the loop
> > > > > > > > > NewFreq /= OneMinusSelfProb looks like multiplying the block freq (one-iteration loop) by the average trip count -- that is why I asked if this applies to other backedges.
> > > > > > > > Here is the relevant math:
> > > > > > > > 
> > > > > > > > we want to find a new frequency for block I, `Freq[I]`, such that it is equal to `\sum Freq[J] * Prob[J][I]`, where the sum is taken over all (incoming) jumps (J -> I). These are "ideal" frequencies that BFI is trying to compute.
> > > > > > > > 
> > > > > > > > Clearly, if the I-th block has no self-edges, then we simply assign `Freq[I] := \sum Freq[J] * Prob[J][I]` (that is, no adjustment). However, if there are self-edges, we need to assign `Freq[I] := (\sum_{J != I} Freq[J] * Prob[J][I]) / (1 - Prob[I][I])` (the adjustment in the code).
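> > > > > > > > 
> > > > > > > > To spell out the algebra (a quick sketch, not in the patch): start from the balance equation and pull the self-edge term out of the sum:
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > Freq[I] = \sum_{J} Freq[J] \cdot Prob[J][I]
> > > > > > > >         = \sum_{J \neq I} Freq[J] \cdot Prob[J][I] + Freq[I] \cdot Prob[I][I]
> > > > > > > > 
> > > > > > > > \Rightarrow \quad Freq[I] = \frac{\sum_{J \neq I} Freq[J] \cdot Prob[J][I]}{1 - Prob[I][I]}
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > which is exactly the `NewFreq /= OneMinusSelfProb` step above.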
> > > > > > > I wonder why the special treatment is needed in the first place.
> > > > > > > 
> > > > > > > Suppose we have 
> > > > > > > 
> > > > > > >  ```
> > > > > > >  BB1 (init freq = 50)
> > > > > > >    |
> > > > > > >    v
> > > > > > >  BB2 (init freq = 0) <--+
> > > > > > >    |   \                |
> > > > > > >    |    \ 90%           |
> > > > > > >    |     +--------------+
> > > > > > >    | 10%
> > > > > > >    v
> > > > > > > ```
> > > > > > > 
> > > > > > > With iterative fixup, BB2's frequency will converge to 500, which is the right value without any special handling.
> > > > > > Excellent example!
> > > > > > 
> > > > > > The correct inference here is `Freq[BB1] = 50, Freq[BB2] = 500`, which is found after 5 iterations using the diff. If we remove the self-edge adjustment, we don't get the right result: it converges to `Freq[BB1] = 50, Freq[BB2] = 50` after ~100 iterations. (Observe that we do modify the frequency of the entry block; it is not fixed.)
> > > > > > 
> > > > > > In general, I do not have a proof that the Markov chain always converges to the desired stationary point if we update frequencies incorrectly (e.g., w/o the self-edge adjustment) -- I suspect it does not.
> > > > > By entry frequency, do you mean BB1's frequency? BB1 won't be active after the first iteration right?
> > > > Yes I meant BB1's frequency.
> > > > 
> > > > Notice that in order to create a valid Markov chain, we need to add jumps from all exits to the entry. In this case, from BB2 to BB1. So BB1 will be active on later iterations.
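> > > > 
> > > > A standalone toy of this exact example (a sketch, not the patch code -- the real implementation's ordering and active-set handling differ, so iteration counts won't match exactly):
> > > > 
> > > > ```
> > > > // Power iteration on the example above: BB1 -> BB2 (prob 1),
> > > > // BB2 -> BB2 (0.9), BB2 -> exit (0.1), plus the synthetic
> > > > // exit -> entry jump that makes the chain valid.
> > > > #include <cmath>
> > > > #include <cstdio>
> > > > 
> > > > int main() {
> > > >   double FreqBB1 = 50.0, FreqBB2 = 0.0; // initial frequencies
> > > >   const double SelfProb = 0.9;          // BB2 -> BB2
> > > >   const double ExitProb = 0.1;          // BB2 -> exit -> BB1
> > > >   for (int It = 1; It <= 100; ++It) {
> > > >     // BB2: sum over non-self predecessors, then apply the
> > > >     // self-edge adjustment (divide by 1 - SelfProb).
> > > >     double NewBB2 = FreqBB1 * 1.0 / (1.0 - SelfProb);
> > > >     // BB1: fed only by the synthetic exit -> entry jump.
> > > >     double NewBB1 = NewBB2 * ExitProb;
> > > >     double Delta =
> > > >         std::abs(NewBB1 - FreqBB1) + std::abs(NewBB2 - FreqBB2);
> > > >     FreqBB1 = NewBB1;
> > > >     FreqBB2 = NewBB2;
> > > >     if (Delta < 1e-12) {
> > > >       std::printf("converged after %d iterations: BB1=%g BB2=%g\n",
> > > >                   It, FreqBB1, FreqBB2); // BB1=50 BB2=500
> > > >       break;
> > > >     }
> > > >   }
> > > >   return 0;
> > > > }
> > > > ```
> > > > 
> > > > If you drop the `/ (1.0 - SelfProb)` adjustment (and instead fold the self-edge contribution into the sum), this toy no longer reaches `Freq[BB2] = 500`.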
> > > Can you verify if it still works without the adjustment: in the small example, split BB2 into two BBs.
> > I've commented above:
> > 
> > > If we remove the self-edge adjustment, we don't get the right result: it converges to Freq[BB1] = 50, Freq[BB2] = 50 after ~100 iterations. 
> > > In general, I do not have a proof that the Markov chain always converges to the desired stationary point, if we incorrectly update frequencies (e.g., w/o the self-edge adjustment) -- I suspect it does not.
> > 
> > What is the concern/question here? In my mind, this is not a "fix/hack" but the correct way of applying iterative inference.
> There are no major concerns and the patch is almost good to go in. Just want to make sure the algo works for all cases.
Does the self probability map to the damping factor in the original PageRank?
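
For reference, in the original PageRank the damping factor `d` enters as a uniform teleportation term applied to every node:

```
PR(i) = \frac{1 - d}{N} + d \sum_{j \to i} \frac{PR(j)}{L(j)}
```

where `N` is the number of pages and `L(j)` is the out-degree of page `j`.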


================
Comment at: llvm/include/llvm/Analysis/BlockFrequencyInfoImpl.h:1385
+  std::vector<const BlockT *> HotBlocks;
+  findHotBlocks(HotBlocks);
+  if (HotBlocks.empty())
----------------
nit: it looks like this is just finding reachable/live blocks instead of hot blocks, hence the naming could be misleading. 
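
To illustrate the naming point, here is a hypothetical sketch (names and types made up, not from the patch) of what the helper appears to compute -- plain forward reachability from the entry over positive-probability edges:

```
// Hypothetical sketch: forward reachability from the entry block,
// following only edges with a positive branch probability.
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

struct Block; // opaque basic-block type, for illustration only
using Edge = std::pair<const Block *, double>; // (successor, probability)
using SuccMap = std::unordered_map<const Block *, std::vector<Edge>>;

std::vector<const Block *> findReachableBlocks(const Block *Entry,
                                               const SuccMap &Succs) {
  std::vector<const Block *> Reachable;
  std::unordered_set<const Block *> Visited{Entry};
  std::queue<const Block *> Worklist;
  Worklist.push(Entry);
  while (!Worklist.empty()) {
    const Block *B = Worklist.front();
    Worklist.pop();
    Reachable.push_back(B);
    auto It = Succs.find(B);
    if (It == Succs.end())
      continue;
    for (const auto &[Succ, Prob] : It->second)
      if (Prob > 0.0 && Visited.insert(Succ).second) // skip cold edges
        Worklist.push(Succ);
  }
  return Reachable;
}
```

A name like `findReachableBlocks` would describe that behavior more directly.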


================
Comment at: llvm/include/llvm/Analysis/BlockFrequencyInfoImpl.h:1388
+  // Extract initial frequencies for the blocks
+  DenseMap<const BlockT *, size_t> BlockIndex;
+  auto Freq = std::vector<Scaled64>(HotBlocks.size());
----------------
spupyrev wrote:
> davidxl wrote:
> > why is this map needed (which adds a layer of indirection)?
> The map is used to index successors/predecessors of "hot" blocks, see line 1603.
> 
> As an optimization, we don't process all the blocks in a function but only those that can be reached from the entry via branches with a positive probability. These are `HotBlocks` in the code. Typically, the number of HotBlocks is 2x-5x smaller than the total number of blocks in the function. In order to find an index of a block within the list, we either need to do a linear scan over HotBlocks, or have such an extra map.
I think we could avoid the index and the extra map if the ProbMatrix (and other data structures) used block pointers instead of indices as block identifiers, and still remove cold blocks in the processing - i.e., replacing the various vectors with `map<BasicBlock*, ..>`. That may be slightly more readable, but using an index as the identifier is closer to the underlying math. Either way is fine to me.
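
A sketch of the two layouts under discussion (hypothetical names, not the patch's):

```
// Option A (as in the patch): dense indices into parallel vectors, plus
// one map that translates a block pointer into its index in HotBlocks.
#include "llvm/ADT/DenseMap.h"
#include <vector>

template <class BlockT> struct InferenceStorage {
  std::vector<const BlockT *> HotBlocks;             // index -> block
  llvm::DenseMap<const BlockT *, size_t> BlockIndex; // block -> index
  std::vector<double> Freq; // Freq[BlockIndex[B]] is B's frequency

  void init(std::vector<const BlockT *> Blocks) {
    HotBlocks = std::move(Blocks);
    Freq.assign(HotBlocks.size(), 0.0);
    for (size_t I = 0; I < HotBlocks.size(); ++I)
      BlockIndex[HotBlocks[I]] = I; // O(1) pointer -> index lookups
  }
};

// Option B (the alternative above): drop BlockIndex and key the storage
// by pointer directly, e.g. llvm::DenseMap<const BlockT *, double> Freq;
```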


================
Comment at: llvm/lib/Analysis/BlockFrequencyInfoImpl.cpp:60
+cl::opt<double> IterativeBFIThreshold(
+    "iterative-bfi-threshold", cl::init(1e-12), cl::Hidden,
+    cl::desc("Iterative inference: delta convergence threshold"));
----------------
perhaps `iterative-bfi-precision` or something along those lines would be more reflective of what it does?

It'd be helpful to mention somewhere in the comment or description the trade-off between precision and run time (iterations needed to converge).
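
Roughly what the knob controls, as a hypothetical sketch (not the patch code; the self-edge adjustment from the other thread is omitted for brevity):

```
// Keep refining until the largest per-block change drops below the
// threshold: a smaller threshold buys precision at the cost of more
// iterations (run time).
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

using Jump = std::pair<size_t, double>; // (predecessor index, probability)

void iterateToFixedPoint(std::vector<double> &Freq,
                         const std::vector<std::vector<Jump>> &Preds,
                         double Threshold) {
  double Delta;
  do {
    Delta = 0.0;
    for (size_t I = 0; I < Freq.size(); ++I) {
      double NewFreq = 0.0;
      for (const auto &[J, Prob] : Preds[I]) // sum over incoming jumps
        NewFreq += Freq[J] * Prob;
      Delta = std::max(Delta, std::abs(NewFreq - Freq[I]));
      Freq[I] = NewFreq;
    }
  } while (Delta > Threshold); // smaller threshold => more iterations
}
```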


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103289/new/

https://reviews.llvm.org/D103289


