[llvm-commits] Static Profile Patch
Dan Gohman
gohman at apple.com
Mon Sep 21 17:05:44 PDT 2009
On Sep 20, 2009, at 8:35 AM, Andrei Alvares wrote:
> The passes should override the Pass::releaseMemory virtual function.
> Some currently have a Clear member function; they should implement
> releaseMemory and have it call Clear, or perhaps Clear should be
> renamed releaseMemory, depending on how it's used.
>
>
> Done. But I would like to know more about this functionality.
> Consider the BranchPreditionPass class. I need to maintain edge
> predictions for subsequent passes (like BlockEdgeFrequencyPass that
> will consume it). If the pass manager calls the
> BranchPredictionPass::releaseMemory, should I free those edge
> predictions too? The attached patch is freeing it when releaseMemory
> is called.
The pass manager won't call releaseMemory until all passes that
depend on the pass have been run, so releaseMemory should release
everything.
>
> BranchHeuristicsInfo::MatchCallHeuristic and some of the others
> don't test whether the terminator is a BranchInst. It looks like
> there's nothing preventing them from analyzing a block with an
> InvokeInst as its terminator. Is that intended by the heuristics?
>
> All the functions that process heuristics (Match{Some Heuristic}
> Heuristic) have a private visibility. The MatchHeuristic is a
> wrapper for those functions. There is a comment in the
> MatchHeuristic header claiming that it expects a BasicBlock with
> exactly two successors. The heuristics are only capable to handle
> blocks with two successors.
>
> I haven't add a check inside MatchHeuristic because I'm already
> ensuring this condition before it is called (on
> BranchPredictionPass). I was trying to avoid redundant checks. Do
> you believe that it should have the test anyway (inside
> BranchHeuristic)?
No, don't insert redundant checks. An assertion might be useful
though.
>
> BranchHeuristicsInfo::postDominates calls
> PostDominatorTree::properlyDominates. Since the distinction between
> domination and proper domination is often both subtle and critical
> for algorithm correctness, please either rename this function or add
> comments or both.
>
>
> The algorithm requires post domination rather than properly post
> dominates. Take the call heuristic for example. If a block contains
> a call and a loop to itself, the heuristic should match. Properly
> post domination miss this and other cases.
>
> DominatorTree have "dominates" and "properlyDominates" functions.
> How come PostDominatorTree implemented only properlyDominates?
> That's way I felt the need to created my own.
I believe this is just because no one has needed it. Feel free
to add it. It should be straight-forward; see DominatorTree's
dominates function for example. The main work is done by the
DominatorTreeBase class.
>
> BranchPredictionInfo::FindBackEdges does a DominatorTree query for
> every edge in the function. It would be possible to get a close
> approximation of this using LoopInfo. That would miss unnatural
> loops and spaghetti code though; is that the reason for doing the
> full traversal? Please add a comment about this.
>
>
> That's an interesting question. At first I thought that it would be
> possible, but after I've implemented the back edges considering only
> LoopInfo, the profiler shown a very low hit rate (20% worst in
> average). I've maintained the back edge calculation with domination
> tree.
That's surprising. It would be worth documenting this somewhere,
perhaps in a comment.
>
> BranchPredictionInfo::FindExitEdges seems to be redundant with
> Loop::getExitEdges.
>
>
> Thanks for the tip. Unfortunately, I've tried to use the
> LoopInfo::getExitEdge in the branch prediction pass, but it is
> crashing. I can't tell for sure the root cause of the problem, but I
> believe that this function required that the loops are in simplified
> form, which is not what happens in some situations.
Hmm. A comment about this would be appropriate as well.
> In its current form, BranchPredictionPass won't be usable in LLVM's
> default optimization pipeline because it requires post-dominator tree
> construction. None of LLVM's current default optimization passes
> require post-dominator trees, so constructing it just for this pass
> will likely be too slow. It could possibly be justified for -O3,
> if there are sufficient benefits to oughtweigh the costs. There are
> many LLVM users that don't use the standard -O2 or -O3 passes
> also, of course.
>
>
> Unfortunately, four of the nine implemented heuristics require post
> dominate tree information. The profiler would be very imprecise if
> those heuristics were lacking. Maybe when LLVM have more profile-
> guided passes, the benefits will out weight its costs.
Ok.
Dan
More information about the llvm-commits
mailing list