<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 12 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-GB link=blue vlink=purple><div class=WordSection1><p class=MsoNormal>Hi,<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>This is a summary of what was discussed at the Performance Tracking and Benchmarking Infrastructure BoF session last week at the LLVM dev meeting.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>It also contains a proposal for a few next steps to improve the setup and use of buildbots to track performance changes in the code generated by LLVM.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>The buildbots are currently very valuable in detecting correctness regressions, and in getting the community to rectify those regressions quickly. However, performance regressions are hardly noticed, and it seems that, as a community, we don't keep track of them well.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>The goal for the BoF was to try to find a number of actions that could take us closer to the point where, as a community, we would at least notice some of the performance regressions and take action to fix them. Given that this has already been discussed quite a few times at BoF sessions at previous developer meetings, we thought we should aim for a small, incremental, but certain improvement over the current status. 
Ideally, we should initially aim for getting to the point where at least some of the performance regressions are detected and acted upon.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>We already have a central database that stores benchmarking numbers, produced for two boards; see http://llvm.org/perf/db_default/v4/nts/recent_activity#machines. However, it seems no one monitors the produced results, nor is it easy to derive from those numbers whether a particular patch really introduced a significant regression.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>At the BoF, we identified the following issues blocking us from detecting significant regressions more easily:<o:p></o:p></p><p class=MsoNormal>* A lot of the Execution Time and Compile Time results are very noisy, because the individual programs don't run for long enough and don't take long enough to compile (e.g. less than 0.1 seconds).<o:p></o:p></p><p class=MsoNormal> * The proposed actions to improve the execution time measurements are, for the programs under the Benchmarks sub-directories in the test-suite, to:<o:p></o:p></p><p class=MsoNormal> a) increase the run time of each benchmark so that it runs long enough to avoid noisy results. "Long enough" probably means roughly 10 seconds. We'd probably need a number of different settings, or a parameter that can be set per program, so that the running time on individual boards can be tuned. E.g. 
on a faster board, more iterations would be run than on a slower board.<o:p></o:p></p><p class=MsoNormal> b) evaluate whether the main part of the running time is spent in the compiled code or in something else, e.g. file IO. Programs dominated by file IO shouldn't be used to track performance changes over time.<o:p></o:p></p><p class=MsoNormal> The proposal to resolve this is to create a way to run the test suite in 'benchmark mode', which includes only the subset of the test suite that is useful for benchmarking. Hal Finkel volunteered to start this work.<o:p></o:p></p><p class=MsoNormal> * The identified action to improve the compile time measurements is simply to add up the compilation times of all benchmarks and measure that, instead of the compile times of the individual benchmarks.<o:p></o:p></p><p class=MsoNormal> It seems this could be implemented by changing or adding a view in the web interface, showing the trend of the total compilation time of all benchmarks over time, rather than trend graphs for individual programs.<o:p></o:p></p><p class=MsoNormal> * Furthermore, on each individual board, the noise introduced by the board itself should be minimized. Each board should have a maintainer, who ensures the board doesn't produce a significant level of noise.<o:p></o:p></p><p class=MsoNormal> If the board starts producing a high level of noise, and the maintainer doesn't fix it quickly, the performance numbers coming from the board will be ignored. 
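<o:p></o:p></p><p class=MsoNormal>As a purely illustrative sketch of what an automated board health check might look like (the function names and thresholds below are invented for illustration, not an agreed-upon implementation, and a Unix-like board is assumed):<o:p></o:p></p>

```python
# Hypothetical pre-run board health check. Thresholds are made up for
# illustration and would need tuning per board and operating system.
import os
import time

def load_is_near_zero(threshold=0.2):
    """Check that the 1-minute load average is (near) zero."""
    one_minute_load = os.getloadavg()[0]
    return one_minute_load < threshold

def io_is_fast_enough(path="ddcheck.bin", size_mb=64, min_mb_per_s=20.0):
    """Time a simple sequential write, much like timing a 'dd' run."""
    chunk = b"\0" * (1024 * 1024)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.time() - start
    os.remove(path)
    return (size_mb / elapsed) >= min_mb_per_s

def board_seems_ok():
    """Benchmark results would only be trusted when this holds."""
    return load_is_near_zero() and io_is_fast_enough()
```

<p class=MsoNormal>A wrapper around the benchmarking run could then skip the run, or mark its results as ignored, whenever board_seems_ok() returns False.<o:p></o:p></p><p class=MsoNormal>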
It's not clear what the best way would be to mark a board as being ignored.<o:p></o:p></p><p class=MsoNormal> The suggestion was made that board maintainers could get a script to run before each benchmarking run, to check whether the board is in a reasonable state, e.g. by checking that the load on the board is near zero; that "dd" executes as fast as expected; .... It's expected that the checks in the script might be somewhat dependent on the operating system the board runs on.<o:p></o:p></p><p class=MsoNormal> * To reduce the noise levels further, it would be nice if the execution time of individual benchmarks could be averaged over a number (e.g. 5) of consecutive runs. That way, each individual benchmarking run remains relatively fast, since each program is run just once, while at the same time the averaging should reduce some of the insignificant noise in the individual runs.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>I'd appreciate any feedback on the above proposals. We're also looking for more volunteers to implement these improvements, so if you're interested in working on any of the above, please let us know.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Thanks,<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Kristof<o:p></o:p></p></div></body></html>