[PATCH] D103289: A post-processing for BFI inference

Thu Jun 10 09:47:30 PDT 2021

hoy accepted this revision.
hoy added a comment.
This revision is now accepted and ready to land.

LGTM. Please wait to see if @davidxl and @wenlei has additional comments.

================
Comment at: llvm/include/llvm/Analysis/BlockFrequencyInfoImpl.h:1495
+    if (Change > Threshold) {
+      ActiveSet.push(I);
+      IsActive[I] = true;
----------------
spupyrev wrote:
> hoy wrote:
> > Wondering if it makes sense to not set `I` active. When `I` gets an noticeable update on its counts, its successors should be reprocessed thus they should be set active. But not sure `I` itself should be reprocessed.
> This is a very good question, thanks! 
> 
> I had exactly the same feeling and tried to modify this part as suggested. Unfortunately, it does result in (significantly) slower convergence in some instances, while not providing noticeable benefits. I don't have a rigorous explanation (the alg is a heuristic anyway), but here is my intuition:
> 
> We update frequencies of blocks in some order, which is dictated by `ActiveSet` (currently that's simply a queue). This order does affect the speed of convergence: For example, we want to prioritize updates of frequencies of blocks that are a part of a hot loop. If at an iteration we modify frequency of `I`, then there is a higher chance that block `I` will need to be updated later. Thus, we explicitly add it to the queue so that it's updated again as soon as possible.
> 
> There are likely alternative strategies here, e.g., having some priority-based queues and/or smarter strategy for deciding when `I` needs to be updated. I played quite a bit with various versions but couldn't get significant wins over the default (simplest) strategy. So let's keep this question as a future work.
Thanks for the explanation. Looks like the processing order matters here but hard to track the exact order. Sounds good to keep the current implementation.

================
Comment at: llvm/include/llvm/Analysis/BlockFrequencyInfoImpl.h:1595
+      // Ignore parallel edges between BB and SI blocks
+      if (UniqueSuccs.find(SI) != UniqueSuccs.end())
+        continue;
----------------
spupyrev wrote:
> hoy wrote:
> > Should the probability of parallel edges be accumulated?
> In my tests, I see parallel edges are always coming with exactly the same probability, and their sum might exceed 1.0. I guess that's an assumption/invariant used in BPI. 
You're right. `getEdgeProbability` returns the sum of all raw edge probabilities from Src to Dst.

```
/// Get the raw edge probability calculated for the block pair. **This returns the
/// sum of all raw edge probabilities from Src to Dst.**
BranchProbability
BranchProbabilityInfo::getEdgeProbability(const BasicBlock *Src,
                                          const BasicBlock *Dst) const {
  if (!Probs.count(std::make_pair(Src, 0)))
    return BranchProbability(llvm::count(successors(Src), Dst), succ_size(Src));

  auto Prob = BranchProbability::getZero();
  for (const_succ_iterator I = succ_begin(Src), E = succ_end(Src); I != E; ++I)
    if (*I == Dst)
      Prob += Probs.find(std::make_pair(Src, I.getSuccessorIndex()))->second;

  return Prob;
}
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103289/new/

https://reviews.llvm.org/D103289