Hello everyone,<br>  Since this patch hasn't been applied yet, I'm taking the liberty of sending an updated version. I made several changes to the branch predictor (BranchPredictionPass) and to the intraprocedural static profiler (BlockEdgeFrequencyPass).<br>

  Attachment: stprof-11.09.09.patch. <br><br>  While debugging the code, I noticed that both passes were keeping data across multiple calls to runOnFunction, which was undesirable. After cleaning this up, both passes showed improved results. In fact, the branch predictor now produces predictions very close to those of Ball's predictor. I ran the branch predictor on SPEC 2000 (int and float). I'm attaching "heuristics.txt", which compares the results of both predictors, broken down by heuristic. While most heuristics predicted more accurately, the call and opcode heuristics showed worse results, so there is still room for improvement. Also, I found a bug in the way the predictor treats a branch in which some, but not all, successors are backedges. I expected the remaining successors to always be exit edges, but there are cases where this is not true.<br>
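To make the backedge case concrete, here is a minimal sketch in Python (not the patch's C++ code). It assumes the CFG is given as a dictionary from block name to successor list, treats an edge as a backedge when its target is still on the DFS stack, and reports every block that has both backedge and non-backedge successors. The function names and the example graph are hypothetical.

```python
def find_back_edges(cfg, entry):
    """Return the (src, dst) edges whose target is still on the DFS stack."""
    back_edges = set()
    state = {}  # block -> "active" while on the DFS stack, "done" afterwards

    def visit(block):
        state[block] = "active"
        for succ in cfg.get(block, []):
            if state.get(succ) == "active":
                back_edges.add((block, succ))
            elif succ not in state:
                visit(succ)
        state[block] = "done"

    visit(entry)
    return back_edges


def mixed_backedge_branches(cfg, entry):
    """Map each block with some, but not all, backedge successors
    to the list of its non-backedge successors."""
    back_edges = find_back_edges(cfg, entry)
    mixed = {}
    for block, succs in cfg.items():
        back = [s for s in succs if (block, s) in back_edges]
        other = [s for s in succs if (block, s) not in back_edges]
        if back and other:
            mixed[block] = other
    return mixed


# Hypothetical loop with two latches: "body" and "tail" both jump back to "header".
cfg = {
    "entry": ["header"],
    "header": ["body"],
    "body": ["header", "tail"],
    "tail": ["header", "exit"],
    "exit": [],
}
mixed = mixed_backedge_branches(cfg, "entry")
```

In this example, the non-backedge successor of "tail" leaves the loop, but the non-backedge successor of "body" is "tail", which is still inside the loop. That is the kind of branch for which the "all other successors are exit edges" assumption fails.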

<br>  Moreover, while calculating block and edge frequencies, it is possible to verify whether the computed frequency information is correct. Since the entry block frequency is always one, the total frequency of the exit blocks is expected to be one as well. So all we need to do is check that the frequencies of all edges entering the exit basic blocks sum to one. However, I found two cases in which the pass miscalculates: (1) when the control flow graph is not reducible; (2) when there seems to be a loop that does not terminate (it has no exit blocks). Although the pass can calculate frequencies in those situations, the results might not be accurate. Otherwise, the pass seems to be doing what it is supposed to do.<br>
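The check described above can be sketched as follows. This is an illustrative Python version, not the code in the patch: it assumes edge frequencies are stored in a dictionary keyed by (source, destination), that exit blocks are those without successors, and that the entry block is not itself an exit block. The names and the tolerance value are hypothetical.

```python
def exit_frequency_sum(edge_freq, successors):
    """Sum the frequencies of all edges that enter an exit block."""
    exits = {block for block, succs in successors.items() if not succs}
    return sum(freq for (src, dst), freq in edge_freq.items() if dst in exits)


def frequencies_consistent(edge_freq, successors, tol=1e-6):
    """True if the flow reaching the exits matches the entry frequency of 1."""
    return abs(exit_frequency_sum(edge_freq, successors) - 1.0) <= tol


# Hypothetical loop whose backedge is taken with probability 0.9, so the
# loop block runs 1 / (1 - 0.9) = 10 times per function entry.
successors = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
edge_freq = {
    ("entry", "loop"): 1.0,
    ("loop", "loop"): 9.0,
    ("loop", "exit"): 1.0,
}
ok = frequencies_consistent(edge_freq, successors)
```

A tolerance is used because the frequencies are floating-point values. A function with no exit block at all, as in case (2), yields a sum of zero and therefore fails the check.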

<br>  The global (interprocedural) static profiler is not yet as accurate as the one in Wu's paper, but it has improved significantly after those fixes. I'm attaching the results of the profiler, which compare the correct prediction rate for the most frequently executed blocks, edges, and function call invocations (ranging from 10% to 50%). Attachments: block.txt, edge.txt and call.txt.<br>
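One plausible way to compute such a rate is sketched below in Python. It assumes the 10% to 50% range refers to the fraction of most frequently executed items, and it scores the static estimate by how many of the truly hottest items it also ranks in its own top fraction. This is an assumption for illustration and may differ from the exact metric used for the attached files or in Wu's paper. The function names and sample data are hypothetical.

```python
def top_fraction(freqs, fraction):
    """Return the names of the top `fraction` of items, ranked by frequency."""
    count = max(1, int(len(freqs) * fraction))
    # Break ties by name so that the ranking is deterministic.
    ranked = sorted(freqs, key=lambda name: (-freqs[name], name))
    return set(ranked[:count])


def hit_rate(estimated, measured, fraction):
    """Share of the measured top items that the estimate also places in its top."""
    actual = top_fraction(measured, fraction)
    predicted = top_fraction(estimated, fraction)
    return len(actual & predicted) / len(actual)


# Hypothetical data: measured execution counts vs. statically estimated frequencies.
measured = {"a": 100, "b": 50, "c": 10, "d": 1}
estimated = {"a": 8.0, "b": 1.0, "c": 4.0, "d": 0.5}

rate_top_quarter = hit_rate(estimated, measured, 0.25)
rate_top_half = hit_rate(estimated, measured, 0.50)
```

The same function applies to blocks, edges, or call sites, as long as both dictionaries use the same keys.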

<br>  Thanks,<br>    Andrei<br><br><div class="gmail_quote">On Wed, Sep 2, 2009 at 3:37 PM, Andrei Alvares <span dir="ltr"><<a href="mailto:logytech@gmail.com">logytech@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Hello everyone,<br>

  Here is the static profiler implementation, developed as<br>

part of my Google Summer of Code project. It performs branch<br>

prediction at compile time, i.e., it assigns probabilities to branch<br>

outcomes using a set of predefined heuristics. It also computes<br>

intra- and interprocedural profiles by statically estimating basic block<br>

and edge frequencies (local and global) and function call<br>

frequencies.<br>

  Attachment: stprof-02.02.09.patch<br>

<br>

  I've run the static profiler on some of the SPECint 2000 programs<br>

(those that I was able to compile and run).  It has not yet achieved<br>

the accuracy found in Wu's paper, but I believe it can still be<br>

improved.<br>

  Best regards,<br>

    Andrei<br>

<br>

Youfeng Wu and James R. Larus. Static branch frequency and program<br>

profile analysis. In MICRO 27: Proceedings of the 27th annual<br>

international symposium on Microarchitecture. IEEE, 1994.<br>

</blockquote></div><br>