<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 12 (filtered medium)"><style><!--

/* Font Definitions */

@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0cm;

        margin-bottom:.0001pt;

        font-size:11.0pt;

        font-family:"Calibri","sans-serif";}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:purple;

        text-decoration:underline;}

span.EmailStyle17

        {mso-style-type:personal-compose;

        font-family:"Calibri","sans-serif";

        color:windowtext;}

.MsoChpDefault

        {mso-style-type:export-only;}

@page WordSection1

        {size:612.0pt 792.0pt;

        margin:72.0pt 72.0pt 72.0pt 72.0pt;}

div.WordSection1

        {page:WordSection1;}

--></style><!--[if gte mso 9]><xml>

<o:shapedefaults v:ext="edit" spidmax="1026" />

</xml><![endif]--><!--[if gte mso 9]><xml>

<o:shapelayout v:ext="edit">

<o:idmap v:ext="edit" data="1" />

</o:shapelayout></xml><![endif]--></head><body lang=EN-GB link=blue vlink=purple><div class=WordSection1><p class=MsoNormal>Hi,<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>This is a summary of what was discussed at the Performance Tracking and<o:p></o:p></p><p class=MsoNormal>Benchmarking Infrastructure BoF session last week at the LLVM dev meeting.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>It also contains a proposal for a few next steps to improve the<o:p></o:p></p><p class=MsoNormal>setup and use of buildbots to track performance changes in code generated by<o:p></o:p></p><p class=MsoNormal>LLVM.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>The buildbots currently are very valuable in detecting correctness regressions,<o:p></o:p></p><p class=MsoNormal>and getting the community to quickly rectify those regressions. However,<o:p></o:p></p><p class=MsoNormal>performance regressions are rarely noticed, and it seems that, as a community, we don't<o:p></o:p></p><p class=MsoNormal>really keep track of them well.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>The goal for the BoF was to try and find a number of actions that could take us<o:p></o:p></p><p class=MsoNormal>closer to the point where, as a community, we would at least notice some of the<o:p></o:p></p><p class=MsoNormal>performance regressions and take action to fix them.  Given that<o:p></o:p></p><p class=MsoNormal>this has already been discussed quite a few times at previous BoF sessions at<o:p></o:p></p><p class=MsoNormal>multiple developer meetings, we thought we should aim for a small, incremental,<o:p></o:p></p><p class=MsoNormal>but sure improvement over the current status. 
Ideally, we should initially aim<o:p></o:p></p><p class=MsoNormal>for getting to the point where at least some of the performance regressions are<o:p></o:p></p><p class=MsoNormal>detected and acted upon.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>We already have a central database that stores benchmarking numbers, produced<o:p></o:p></p><p class=MsoNormal>for 2 boards, see<o:p></o:p></p><p class=MsoNormal>http://llvm.org/perf/db_default/v4/nts/recent_activity#machines.  However, it<o:p></o:p></p><p class=MsoNormal>seems no one monitors the produced results, nor is it easy to derive from those<o:p></o:p></p><p class=MsoNormal>numbers whether a particular patch really introduced a significant regression.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>At the BoF, we identified the following issues blocking us from being able to<o:p></o:p></p><p class=MsoNormal>detect significant regressions more easily:<o:p></o:p></p><p class=MsoNormal>* A lot of the Execution Time and Compile Time results are very noisy, because<o:p></o:p></p><p class=MsoNormal>  the individual programs don't run long enough and don't take long enough to<o:p></o:p></p><p class=MsoNormal>  compile (e.g. less than 0.1 seconds).<o:p></o:p></p><p class=MsoNormal>  * The proposed actions to improve the execution time measurements are, for the programs<o:p></o:p></p><p class=MsoNormal>    under the Benchmarks sub-directories in the test-suite, to:<o:p></o:p></p><p class=MsoNormal>    a) increase the run time of the benchmark so it runs long enough to avoid<o:p></o:p></p><p class=MsoNormal>       noisy results. "Long enough" probably means roughly 10 seconds. We'd<o:p></o:p></p><p class=MsoNormal>       probably need a number of different settings, or a parameter that can<o:p></o:p></p><p class=MsoNormal>       be set per program, so that the running time on individual boards can<o:p></o:p></p><p class=MsoNormal>       be tuned. E.g. 
on a faster board, more iterations would be run than on<o:p></o:p></p><p class=MsoNormal>       a slower board.<o:p></o:p></p><p class=MsoNormal>    b) evaluate whether most of the running time of the benchmark is spent running<o:p></o:p></p><p class=MsoNormal>       compiled code or something else, e.g. file IO. Programs dominated by<o:p></o:p></p><p class=MsoNormal>       file IO shouldn't be used to track performance changes over time.<o:p></o:p></p><p class=MsoNormal>       The proposal to resolve this is to create a way to run the test suite in<o:p></o:p></p><p class=MsoNormal>       'benchmark mode', which includes only a subset of the test suite useful<o:p></o:p></p><p class=MsoNormal>       for benchmarking.  Hal Finkel volunteered to start this work.<o:p></o:p></p><p class=MsoNormal>  * The identified action to improve the compile time measurements is to just<o:p></o:p></p><p class=MsoNormal>    add up the compilation time for all benchmarks and track that total, instead<o:p></o:p></p><p class=MsoNormal>    of the compile times of the individual benchmarks.<o:p></o:p></p><p class=MsoNormal>    It seems this could be implemented by simply changing or adding a view<o:p></o:p></p><p class=MsoNormal>    in the web interface, showing the trend of the total compilation time for all<o:p></o:p></p><p class=MsoNormal>    benchmarks over time, rather than trend graphs for individual programs.<o:p></o:p></p><p class=MsoNormal>  * Furthermore, on each individual board, the noise introduced by the board<o:p></o:p></p><p class=MsoNormal>    itself should be minimized. Each board should have a maintainer, who ensures<o:p></o:p></p><p class=MsoNormal>    the board doesn't produce a significant level of noise.<o:p></o:p></p><p class=MsoNormal>    If the board starts producing a high level of noise, and the maintainer<o:p></o:p></p><p class=MsoNormal>    doesn't fix it quickly, the performance numbers coming from the board will<o:p></o:p></p><p class=MsoNormal>    be ignored. 
It's not clear what the best way would be to mark a board as<o:p></o:p></p><p class=MsoNormal>    being ignored.<o:p></o:p></p><p class=MsoNormal>    The suggestion was made that board maintainers could get a script to run<o:p></o:p></p><p class=MsoNormal>    before each benchmarking run, to check whether the board seems in a<o:p></o:p></p><p class=MsoNormal>    reasonable state, e.g. by checking that the load on the board is near zero; that "dd"<o:p></o:p></p><p class=MsoNormal>    executes as fast as expected; .... It's expected that the checks in the<o:p></o:p></p><p class=MsoNormal>    script might be somewhat dependent on the operating system the board<o:p></o:p></p><p class=MsoNormal>    runs.<o:p></o:p></p><p class=MsoNormal>  * To reduce the noise levels further, it would be nice if the execution time<o:p></o:p></p><p class=MsoNormal>    of individual benchmarks could be averaged over a number (e.g. 5) of<o:p></o:p></p><p class=MsoNormal>    consecutive runs. That way, each individual benchmark run remains<o:p></o:p></p><p class=MsoNormal>    relatively fast, since each program only has to run once; at the same<o:p></o:p></p><p class=MsoNormal>    time, the averaging should reduce some of the insignificant noise in the<o:p></o:p></p><p class=MsoNormal>    individual runs.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>I'd appreciate any feedback on the above proposals. We're also looking for more<o:p></o:p></p><p class=MsoNormal>volunteers to implement the above improvements, so if you're interested in<o:p></o:p></p><p class=MsoNormal>working on some of the above, please let us know.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Thanks,<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Kristof<o:p></o:p></p><p class=MsoNormal>    <o:p></o:p></p></div></body></html>