Hi folks, I am going to submit a project to Google Summer of Code, the Project is below to anyone it might interest. Any critics and suggestions are welcome.<br>If anyone is interested mentoring this project please contact me.<br>


<br><br><br>Building and Executing Traces on LLVM<br><br>Daniel Nicacio<br>IC-UNICAMP, Brazil<br>April 5, 2011<br><br>1 Objective<br>The objective of this project is to augment LLVM with dynamic proling ca-<br>pabilities. Interpreted code will be instrumented so that LLVM will be able to<br>


dynamically nd, build and execute traces of more frequently executed basic<br>blocks. This way, future projects can focus on implementing dynamic optimiza-<br>tions on those traces.<br><br>2 Motivation<br>Just In Time (JIT) compilers and Dynamic Binary Translators (DBT) can pro-<br>


vide information collected at runtime. The static compiler does not have this<br>kind of information, so using it, new optimization opportunities comes to sur-<br>face. In order to have a e cient optimization system, it is essential to know<br>


where to apply optimizations, that is, know how to obtain critic regions of code<br>which really sets out gain possibilities. A sequence of instructions, including<br>branches but not including loops, that is executed for some input data is called<br>


a trace [5]. Dynamic systems can select traces which are heavily executed as<br>good spots to make optimizations. Optimizing traces is one promising way to<br>overcome the overhead of translating instructions to the new ISA, permitting<br>


the new code to run faster than the original one.<br>In the last years, dynamic optimizations has become an interesting subject<br>due to its many advantages, being the research topic of some works [1, 3, 4, 8].<br>All these works must rely on good trace detection techniques.<br>


At the present moment LLVM does not build and execute traces using run-<br>time information. We believe that a light-weight dynamic proling will be a<br>valuable tool to improve LLVM performance.<br><br>3 Methodology<br>


In this project we will implement the Most Recent Executed Tail (MRET) [2]<br>technique to nd and build traces. On the rst stage we will nd basic blocks<br>that are targets of back edges (potential loop headers) and then instrument those<br>


basic blocks to count how many times it is executed. Therefore, if it reaches<br>a threshold a function call (previously added to the basic block) is executed to<br>start recording the nal trace. Finally the recording process ends when it goes<br>


back to the beginning of the trace or reaches another trace.<br>Every time program execution reaches an address that corresponds to the<br>beggining of a trace it jumps to the trace instead of the original basic block.<br>


Traces are stored together in memory; this fact alone already brings benets to<br>the system, because the program execution will likely remains inside this region<br>of memory, taking advantage of the locality principle when fetching instructions.<br>


<br>4 Timeline<br>This project is expected to last three months, divided as follows:<br>first month - instrument basic blocks to count how many times it is<br>executed and add a function call to trace recording.<br>second month - Record traces and change original code so it can jump<br>


to traces instead of the original basic blocks.<br>third month - Measure performance of llvm when executing traces and<br>compare it to the original implementation. Make some tuning if necessary.<br><br>5 Biography<br>I am a PhD student at the University of Campinas (UNICAMP), Brazil. My<br>


advisor is Guido Araujo, who has major experience in compilers and computer<br>architecture. During my research I worked with dynamic binary translators,<br>developing trace detection techniques and dynamic optimizations to be applied<br>


on those traces. In this project I found out that building traces does not add<br>signicant overhead and that code reallocation brings performance gains to the<br>system.<br>I also worked with a dynamic binary translator that used a direct mapping<br>


technique to run PowerPC code on X86 architectures, this project also granted<br>me experience on dynamic environments and code instrumentation.<br>More recently, I have been working with Software Transactional Memory<br>


(STM), building a light-weight user-level transaction scheduler. We achieved<br>great results in this project, pushing the boundaries of STM performance. Ba-<br>sicaly, we implemented a heuristic to predict transaction con<br>


icts, if we nd<br>that a transaction is likely going to abort, we switch contexts at user-level to<br>start executing another transaction. One major advantage of our system is that<br>we have complete control of which transactions are currently running and which<br>


transaction is going to replace a doomed one.<br>My research has yielded two papers already published [6, 7] and I also have<br>two papers under review at the present moment.<br>I hope this project gets accepted because I believe that it will be useful for<br>


the LLVM community. It would be a wonderful experience to work alongside<br>LLVM developers and to learn some practical aspects of coding for a robust<br>compiler such as LLVM.<br><br>References<br>[1] V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: A transparent Dynamic<br>


Optimization System. SIGPLAN PLDI, pages 1{12, June 2000.<br>[2] Vasanth Bala, Evelyn Duesterwald, and Sanjeev Banerjia. Dynamo: a trans-<br>parent dynamic optimization system. SIGPLAN Not., 35(5):1{12, 2000.<br>[3] Kemal Ebcio glu and Erik R. Altman. Daisy: dynamic compilation for 100<br>


In ISCA '97: Proceedings of the 24th annual international symposium on<br>Computer architecture, pages 26{37, New York, NY, USA, 1997. ACM.<br>[4] J. Lu, H. Chen, P.-C. Yew, , and W.-C. Hsu. Design and implementation of a<br>


lightweight dynamic optimization system. The Journal of Instruction-Level<br>Parallelism, 6, 2004.<br>[5] Steven S. Muchnick. Advanced Compiler Design and Implementation. Mor-<br>gan Kaufmann Publishers, 340 Pine Street , Sixth Floor, San Francisco ,<br>


CA 94104-3205 , USA, 1997.<br>[6] Daniel Niccio and Guido Arajo. Reducing false aborts in stm systems. In<br>Ching-Hsien Hsu, Laurence Yang, Jong Park, and Sang-Soo Yeo, editors,<br>Algorithms and Architectures for Parallel Processing, volume 6081 of Lecture<br>


Notes in Computer Science, pages 499{510. Springer Berlin / Heidelberg,<br>2010.<br>[7] Maxwell Souza, Daniel Nic acio, and Guido Ara ujo. ISAMAP: Instruction<br>Mapping Driven by Dynamic Binary Translation. In Mauricio Breternitz,<br>


Robert Cohn, Erik Altman, and Youfeng Wu, editors, AMAS-BT - 3rd<br>Workshop on Architectural and Microarchitectural Support for Binary Trans-<br>lation, Saint Malo France, 2010.<br>[8] D. Ung and C. Cifuentes. Optimising hot paths in a dynamic binary trans-<br>


lator. In Workshop on Binary Translation, October 2000.<br>