[llvm-commits] Static Profile Patch

Sun Sep 20 08:35:15 PDT 2009

Hi Dan,
  Thanks for your valuable comments. Here it follows what I've done and not.
Sorry for taking so long to reply, I was testing your recommendations.
  Attachment: stprof-19.09.09.patch

On Mon, Sep 14, 2009 at 12:54 PM, Dan Gohman <gohman at apple.com> wrote:

> Hi Andrei,
>
> Here are some review comments for this patch.  I think it would be
> fine to check into the tree at this point, so that future changes can
> be reviewed incrementally instead of with monolithic patches.
>
>
I agree. It would be great.

> BranchPredictionPass has
>    double getEdgeProbability(Edge &edge) const;
> as a public member function, but Edge is a private typedef, so one of
> these has the wrong visibility.
>
>
Fixed. The typedef for edge have a public visibility for
BranchPredictionPass and BlockEdgeFrequencyPass now.

> The passes should override the Pass::releaseMemory virtual function.
> Some currently have a Clear member function; they should implement
> releaseMemory and have it call Clear, or perhaps Clear should be
> renamed releaseMemory, depending on how it's used.
>
>
Done. But I would like to know more about this functionality. Consider the
BranchPreditionPass class. I need to maintain edge predictions for
subsequent passes (like BlockEdgeFrequencyPass that will consume it). If the
pass manager calls the BranchPredictionPass::releaseMemory, should I free
those edge predictions too? The attached patch is freeing it when
releaseMemory is called.

> lib/Analysis/BranchPredictionPass.cpp has
> #include "llvm/Transforms/Utils/BasicBlockUtils.h"
> which is a red flag because lib/Analysis code is not allowed to
> depend on lib/Transforms code. However, this #include appears to be
> unneeded, so it should be removed.
>
>
I tried llvm::FindFunctionBackedges (BasicBlockUtils), but since I was
already building the DominatorTree I decided to take advantage of it and
calculate back edges using domination. Sorry for leaving this unused include
file.

 BranchHeuristicsInfo::MatchCallHeuristic and some of the others
> don't test whether the terminator is a BranchInst. It looks like
> there's nothing preventing them from analyzing a block with an
> InvokeInst as its terminator. Is that intended by the heuristics?
>

All the functions that process heuristics (Match{Some Heuristic}Heuristic)
have a private visibility. The MatchHeuristic is a wrapper for those
functions. There is a comment in the MatchHeuristic header claiming that it
expects a BasicBlock with exactly two successors. The heuristics are only
capable to handle blocks with two successors.

I haven't add a check inside MatchHeuristic because I'm already ensuring
this condition before it is called (on BranchPredictionPass). I was trying
to avoid redundant checks. Do you believe that it should have the test
anyway (inside BranchHeuristic)?

> BranchHeuristicsInfo::postDominates calls
> PostDominatorTree::properlyDominates. Since the distinction between
> domination and proper domination is often both subtle and critical
> for algorithm correctness, please either rename this function or add
> comments or both.
>
>
The algorithm requires post domination rather than properly post dominates.
Take the call heuristic for example. If a block contains a call and a loop
to itself, the heuristic should match. Properly post domination miss this
and other cases.

DominatorTree have "dominates" and "properlyDominates" functions. How come
PostDominatorTree implemented only properlyDominates? That's way I felt the
need to created my own.

 BranchPredictionInfo::FindBackEdges does a DominatorTree query for
> every edge in the function. It would be possible to get a close
> approximation of this using LoopInfo. That would miss unnatural
> loops and spaghetti code though; is that the reason for doing the
> full traversal? Please add a comment about this.
>
>
That's an interesting question. At first I thought that it would be
possible, but after I've implemented the back edges considering only
LoopInfo, the profiler shown a very low hit rate (20% worst in average).
I've maintained the back edge calculation with domination tree.

> BranchPredictionInfo::FindExitEdges seems to be redundant with
> Loop::getExitEdges.
>
>
Thanks for the tip. Unfortunately, I've tried to use the
LoopInfo::getExitEdge in the branch prediction pass, but it is crashing. I
can't tell for sure the root cause of the problem, but I believe that this
function required that the loops are in simplified form, which is not what
happens in some situations.

Could BranchPredictionInfo::FindStores and FindCalls be combined
> to avoid doing so many function traversals?
>

I thought about this at first, but those functions have different semantics.
Put them together would require the treatment of a lot of "if"s cases, thus
making the code less elegant and without the proper benefits. If you insist
I can merge them.

>
> In its current form, BranchPredictionPass won't be usable in LLVM's
> default optimization pipeline because it requires post-dominator tree
> construction. None of LLVM's current default optimization passes
> require post-dominator trees, so constructing it just for this pass
> will likely be too slow. It could possibly be justified for -O3,
> if there are sufficient benefits to oughtweigh the costs. There are
> many LLVM users that don't use the standard -O2 or -O3 passes
> also, of course.
>
>
Unfortunately, four of the nine implemented heuristics require post dominate
tree information. The profiler would be very imprecise if those heuristics
were lacking. Maybe when LLVM have more profile-guided passes, the benefits
will out weight its costs.

  Thanks,
    Andrei

> Dan
>
>
> On Sep 11, 2009, at 6:46 PM, Andrei Alvares wrote:
>
>  Hello everyone,
>>  Since this patch hasn't been applied yet, I'm taking the liberty to send
>> a more updated version. I made several changes in the branch predictor
>> (BranchPredictionPass) and in the intra procedural static profiler
>> (BlockEdgeFrequencyPass).
>>  Attachment: stprof-11.09.09.patch.
>>
>>  While I was debbuging the code, I've noticed that I was maintaining data
>> across multiple calls of runOnFunction on both passes, which was indeed an
>> undesirable behavior. After cleaning up, both passes shown improved results.
>> In fact, the branch predictor produced very close related prediction as
>> Ball's predictor. I ran the branch predictor on the SPEC 2000 (int and
>> float). I'm attaching "heuristics.txt" which compared results separated by
>> heuristic for both predictors. While most heuristics predicted more
>> accurately, the call and opcode heuristics shown worst results. This means
>> that is still space for improvements. Also, I've found a bug in the way the
>> predictor threats a branch that have some backedge sucessors, but not all. I
>> expected that the other branches were always exit edges, but there are cases
>> that this situation is not true.
>>
>>  Moreover, while calculating blocks and edge frequencies it is possible to
>> verify if it is calculating correct frequency information. Since the entry
>> block frequency is always one, is expected that the exit's total frequency
>> is also one. So, all we need to do is check the sum of all predecessors
>> edges of exit's basic blocks to match one. However, I've found two cases of
>> miscalculation by the pass: (1) when the control flow graph is not
>> reducible; (2) when seems to be a loop that does not terminates (has no exit
>> blocks). Although this pass can calculate frequencies for those situations,
>> it might not be accurate. Nevertheless, seems like the pass is doing what is
>> supposed to.
>>
>>  The global static profiler (inter procedural) is not as accurate as Wu's
>> paper yet. But after those fixes, it has improved quite significantly. I'm
>> attaching the results of the profiler comparing the correct prediction rate
>> of the top most executed blocks, edges, and function call invocations
>> (ranging from 10% to 50%). Attachments: block.txt, edge.txt and call.txt.
>>
>>  Thanks,
>>    Andrei
>>
>> On Wed, Sep 2, 2009 at 3:37 PM, Andrei Alvares <logytech at gmail.com>
>> wrote:
>> Hello everyone,
>>  Here it follows the static profiler implementation, developed as
>> part of my google summer of code project. It performs branch
>> predictions in compilation time, i.e., assign probabilities to branch
>> outcomes using a set of predefined heuristics. Also, it calculates
>> intra and interprocedural profiling by staticly estimate basic blocks
>> and edges frequencies (local and global) and function call invocations
>> frequencies.
>>  Attachment: stprof-02.02.09.patch
>>
>>  I've run the static profiler on some of the SPECint 2000 programs
>> (those that I was able to compile and run).  It has not yet achieved
>> the accuracy found in Wu's paper, but I believe it can still be
>> improved.
>>  Best regards,
>>   Andrei
>>
>> Youfeng Wu and James R. Larus. Static branch frequency and program
>> profile analysis. In MICRO 27: Proceedings of the 27th annual
>> international symposium on Microarchitecture. IEEE, 1994.
>>
>>
>> <heuristics.txt><blocks.txt><edge.txt><call.txt><stprof-11.09.09.patch>_______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20090920/f460a313/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stprof-19.09.09.patch
Type: text/x-patch
Size: 99270 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20090920/f460a313/attachment.bin>