Hi Dan,<br>  Thanks for your valuable comments. Here it follows what I've done and not. Sorry for taking so long to reply, I was testing your recommendations.<br>  Attachment: stprof-19.09.09.patch<br><br><div class="gmail_quote">


On Mon, Sep 14, 2009 at 12:54 PM, Dan Gohman <span dir="ltr"><<a href="mailto:gohman@apple.com" target="_blank">gohman@apple.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hi Andrei,<br>

<br>

Here are some review comments for this patch.  I think it would be<br>

fine to check into the tree at this point, so that future changes can<br>

be reviewed incrementally instead of with monolithic patches.<br>

<br></blockquote><div><br>I agree. It would be great.<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


BranchPredictionPass has<br>

    double getEdgeProbability(Edge &edge) const;<br>

as a public member function, but Edge is a private typedef, so one of<br>

these has the wrong visibility.<br>

<br></blockquote><div><br>Fixed. The typedef for edge have a public visibility for BranchPredictionPass and BlockEdgeFrequencyPass now.<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


The passes should override the Pass::releaseMemory virtual function.<br>

Some currently have a Clear member function; they should implement<br>

releaseMemory and have it call Clear, or perhaps Clear should be<br>

renamed releaseMemory, depending on how it's used.<br>

<br></blockquote><div><br>Done. But I would like to know more about this functionality. Consider the BranchPreditionPass class. I need to maintain edge predictions for subsequent passes (like BlockEdgeFrequencyPass that will consume it). If the pass manager calls the BranchPredictionPass::releaseMemory, should I free those edge predictions too? The attached patch is freeing it when releaseMemory is called.<br>


 <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


lib/Analysis/BranchPredictionPass.cpp has<br>

#include "llvm/Transforms/Utils/BasicBlockUtils.h"<br>

which is a red flag because lib/Analysis code is not allowed to<br>

depend on lib/Transforms code. However, this #include appears to be<br>

unneeded, so it should be removed.<br>

<br></blockquote><div><br>I tried llvm::FindFunctionBackedges (BasicBlockUtils), but since I was already building the DominatorTree I decided to take advantage of it and calculate back edges using domination. Sorry for leaving this unused include file.<br>


<br></div>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

BranchHeuristicsInfo::MatchCallHeuristic and some of the others<br>

don't test whether the terminator is a BranchInst. It looks like<br>

there's nothing preventing them from analyzing a block with an<br>

InvokeInst as its terminator. Is that intended by the heuristics?<br></blockquote><div><br>All the functions that process heuristics (Match{Some Heuristic}Heuristic)

have a private visibility. The MatchHeuristic is a wrapper for those

functions. There is a comment in the MatchHeuristic header claiming that it expects a BasicBlock with exactly two successors. The heuristics are only capable to handle blocks with two successors.<br><br>I haven't add a check inside MatchHeuristic because I'm already ensuring this condition before it is called (on BranchPredictionPass). I was trying to avoid redundant checks. Do you believe that it should have the test anyway (inside BranchHeuristic)?<br>


 </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

BranchHeuristicsInfo::postDominates calls<br>

PostDominatorTree::properlyDominates. Since the distinction between<br>

domination and proper domination is often both subtle and critical<br>

for algorithm correctness, please either rename this function or add<br>

comments or both.<br>

<br></blockquote><div><br>The algorithm requires post domination rather than properly post dominates. Take the call heuristic for example. If a block contains a call and a loop to itself, the heuristic should match. Properly post domination miss this and other cases.<br>


<br>DominatorTree have "dominates" and "properlyDominates" functions. How come PostDominatorTree implemented only properlyDominates? That's way I felt the need to created my own.<br><br></div>


<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


BranchPredictionInfo::FindBackEdges does a DominatorTree query for<br>

every edge in the function. It would be possible to get a close<br>

approximation of this using LoopInfo. That would miss unnatural<br>

loops and spaghetti code though; is that the reason for doing the<br>

full traversal? Please add a comment about this.<br>

<br></blockquote><div><br>That's an interesting question. At first I thought that it would be possible, but after I've implemented the back edges considering only LoopInfo, the profiler shown a very low hit rate (20% worst in average). I've maintained the back edge calculation with domination tree.<br>


 <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


BranchPredictionInfo::FindExitEdges seems to be redundant with<br>

Loop::getExitEdges.<br>

<br></blockquote><div><br>Thanks for the tip. Unfortunately, I've tried to use the LoopInfo::getExitEdge in the branch prediction pass, but it is crashing. I can't tell for sure the root cause of the problem, but I believe that this function required that the loops are in simplified form, which is not what happens in some situations.<br>


<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Could BranchPredictionInfo::FindStores and FindCalls be combined<br>

to avoid doing so many function traversals?<br></blockquote><div><br>I thought about this at first, but those functions have different semantics. Put them together would require the treatment of a lot of "if"s cases, thus making the code less elegant and without the proper benefits. If you insist I can merge them.<br>


 <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

In its current form, BranchPredictionPass won't be usable in LLVM's<br>

default optimization pipeline because it requires post-dominator tree<br>

construction. None of LLVM's current default optimization passes<br>

require post-dominator trees, so constructing it just for this pass<br>

will likely be too slow. It could possibly be justified for -O3,<br>

if there are sufficient benefits to oughtweigh the costs. There are<br>

many LLVM users that don't use the standard -O2 or -O3 passes<br>

also, of course.<br>

<br></blockquote><div><br>Unfortunately, four of the nine implemented heuristics require post dominate tree information. The profiler would be very imprecise if those heuristics were lacking. Maybe when LLVM have more profile-guided passes, the benefits will out weight its costs.<br>


<br>  Thanks,<br>    Andrei<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Dan<div><div></div><div><br>

<br>

On Sep 11, 2009, at 6:46 PM, Andrei Alvares wrote:<br>

<br>

</div></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div></div><div>

Hello everyone,<br>

  Since this patch hasn't been applied yet, I'm taking the liberty to send a more updated version. I made several changes in the branch predictor (BranchPredictionPass) and in the intra procedural static profiler (BlockEdgeFrequencyPass).<br>


  Attachment: stprof-11.09.09.patch.<br>

<br>

  While I was debbuging the code, I've noticed that I was maintaining data across multiple calls of runOnFunction on both passes, which was indeed an undesirable behavior. After cleaning up, both passes shown improved results. In fact, the branch predictor produced very close related prediction as Ball's predictor. I ran the branch predictor on the SPEC 2000 (int and float). I'm attaching "heuristics.txt" which compared results separated by heuristic for both predictors. While most heuristics predicted more accurately, the call and opcode heuristics shown worst results. This means that is still space for improvements. Also, I've found a bug in the way the predictor threats a branch that have some backedge sucessors, but not all. I expected that the other branches were always exit edges, but there are cases that this situation is not true.<br>


<br>

  Moreover, while calculating blocks and edge frequencies it is possible to verify if it is calculating correct frequency information. Since the entry block frequency is always one, is expected that the exit's total frequency is also one. So, all we need to do is check the sum of all predecessors edges of exit's basic blocks to match one. However, I've found two cases of miscalculation by the pass: (1) when the control flow graph is not reducible; (2) when seems to be a loop that does not terminates (has no exit blocks). Although this pass can calculate frequencies for those situations, it might not be accurate. Nevertheless, seems like the pass is doing what is supposed to.<br>


<br>

  The global static profiler (inter procedural) is not as accurate as Wu's paper yet. But after those fixes, it has improved quite significantly. I'm attaching the results of the profiler comparing the correct prediction rate of the top most executed blocks, edges, and function call invocations (ranging from 10% to 50%). Attachments: block.txt, edge.txt and call.txt.<br>


<br>

  Thanks,<br>

    Andrei<br>

<br>

On Wed, Sep 2, 2009 at 3:37 PM, Andrei Alvares <<a href="mailto:logytech@gmail.com" target="_blank">logytech@gmail.com</a>> wrote:<br>

Hello everyone,<br>

 Here it follows the static profiler implementation, developed as<br>

part of my google summer of code project. It performs branch<br>

predictions in compilation time, i.e., assign probabilities to branch<br>

outcomes using a set of predefined heuristics. Also, it calculates<br>

intra and interprocedural profiling by staticly estimate basic blocks<br>

and edges frequencies (local and global) and function call invocations<br>

frequencies.<br>

 Attachment: stprof-02.02.09.patch<br>

<br>

 I've run the static profiler on some of the SPECint 2000 programs<br>

(those that I was able to compile and run).  It has not yet achieved<br>

the accuracy found in Wu's paper, but I believe it can still be<br>

improved.<br>

 Best regards,<br>

   Andrei<br>

<br>

Youfeng Wu and James R. Larus. Static branch frequency and program<br>

profile analysis. In MICRO 27: Proceedings of the 27th annual<br>

international symposium on Microarchitecture. IEEE, 1994.<br>

<br></div></div>

<heuristics.txt><blocks.txt><edge.txt><call.txt><stprof-11.09.09.patch>_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

</blockquote>

<br>

</blockquote></div><br>