[llvm-commits] Static Profile Patch

Mon Sep 21 17:05:44 PDT 2009

On Sep 20, 2009, at 8:35 AM, Andrei Alvares wrote:

> The passes should override the Pass::releaseMemory virtual function.
> Some currently have a Clear member function; they should implement
> releaseMemory and have it call Clear, or perhaps Clear should be
> renamed releaseMemory, depending on how it's used.
>
>
> Done. But I would like to know more about this functionality.  
> Consider the BranchPreditionPass class. I need to maintain edge  
> predictions for subsequent passes (like BlockEdgeFrequencyPass that  
> will consume it). If the pass manager calls the  
> BranchPredictionPass::releaseMemory, should I free those edge  
> predictions too? The attached patch is freeing it when releaseMemory  
> is called.

The pass manager won't call releaseMemory until all passes that
depend on the pass have been run, so releaseMemory should release
everything.

>
> BranchHeuristicsInfo::MatchCallHeuristic and some of the others
> don't test whether the terminator is a BranchInst. It looks like
> there's nothing preventing them from analyzing a block with an
> InvokeInst as its terminator. Is that intended by the heuristics?
>
> All the functions that process heuristics (Match{Some Heuristic} 
> Heuristic) have a private visibility. The MatchHeuristic is a  
> wrapper for those functions. There is a comment in the  
> MatchHeuristic header claiming that it expects a BasicBlock with  
> exactly two successors. The heuristics are only capable to handle  
> blocks with two successors.
>
> I haven't add a check inside MatchHeuristic because I'm already  
> ensuring this condition before it is called (on  
> BranchPredictionPass). I was trying to avoid redundant checks. Do  
> you believe that it should have the test anyway (inside  
> BranchHeuristic)?

No, don't insert redundant checks. An assertion might be useful
though.

>
> BranchHeuristicsInfo::postDominates calls
> PostDominatorTree::properlyDominates. Since the distinction between
> domination and proper domination is often both subtle and critical
> for algorithm correctness, please either rename this function or add
> comments or both.
>
>
> The algorithm requires post domination rather than properly post  
> dominates. Take the call heuristic for example. If a block contains  
> a call and a loop to itself, the heuristic should match. Properly  
> post domination miss this and other cases.
>
> DominatorTree have "dominates" and "properlyDominates" functions.  
> How come PostDominatorTree implemented only properlyDominates?  
> That's way I felt the need to created my own.

I believe this is just because no one has needed it. Feel free
to add it. It should be straight-forward; see DominatorTree's
dominates function for example. The main work is done by the
DominatorTreeBase class.

>
> BranchPredictionInfo::FindBackEdges does a DominatorTree query for
> every edge in the function. It would be possible to get a close
> approximation of this using LoopInfo. That would miss unnatural
> loops and spaghetti code though; is that the reason for doing the
> full traversal? Please add a comment about this.
>
>
> That's an interesting question. At first I thought that it would be  
> possible, but after I've implemented the back edges considering only  
> LoopInfo, the profiler shown a very low hit rate (20% worst in  
> average). I've maintained the back edge calculation with domination  
> tree.

That's surprising. It would be worth documenting this somewhere,
perhaps in a comment.

>
> BranchPredictionInfo::FindExitEdges seems to be redundant with
> Loop::getExitEdges.
>
>
> Thanks for the tip. Unfortunately, I've tried to use the  
> LoopInfo::getExitEdge in the branch prediction pass, but it is  
> crashing. I can't tell for sure the root cause of the problem, but I  
> believe that this function required that the loops are in simplified  
> form, which is not what happens in some situations.

Hmm. A comment about this would be appropriate as well.

> In its current form, BranchPredictionPass won't be usable in LLVM's
> default optimization pipeline because it requires post-dominator tree
> construction. None of LLVM's current default optimization passes
> require post-dominator trees, so constructing it just for this pass
> will likely be too slow. It could possibly be justified for -O3,
> if there are sufficient benefits to oughtweigh the costs. There are
> many LLVM users that don't use the standard -O2 or -O3 passes
> also, of course.
>
>
> Unfortunately, four of the nine implemented heuristics require post  
> dominate tree information. The profiler would be very imprecise if  
> those heuristics were lacking. Maybe when LLVM have more profile- 
> guided passes, the benefits will out weight its costs.

Ok.

Dan