<div dir="ltr"><div><div><div>Great summary, Kristof!<br><br></div>I do not know how frequently new benchmarks are added, but each addition would disrupt the total compile time measurement. On the other hand, we just want to see a (hopefully negative) slope and can ignore the steps caused by new benchmarks being added.<br>
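<br>A minimal sketch of that idea, in Python and under assumptions: sum the per-benchmark compile times for each run, and flag the runs where the set of benchmarks changed, so that a step caused by a new benchmark can be told apart from a real change in compile time. The data layout (one dict per run, mapping benchmark name to compile time in seconds) and the function name are hypothetical; this is not LNT's actual schema or API.<br>
<pre>
```python
def total_compile_times(runs):
    """Return a (total_seconds, benchmark_set_changed) pair for each run.

    `runs` is a hypothetical list of dicts, oldest run first, each mapping
    a benchmark name to its compile time in seconds.
    """
    results = []
    previous_names = None
    for run in runs:
        names = set(run)
        # A step in the total is expected whenever the benchmark set differs
        # from the previous run, so mark it instead of treating it as a trend.
        changed = previous_names is not None and names != previous_names
        results.append((sum(run.values()), changed))
        previous_names = names
    return results


runs = [
    {"a": 1.0, "b": 2.0},
    {"a": 1.0, "b": 1.5},
    {"a": 1.0, "b": 1.5, "c": 4.0},  # benchmark "c" is added in this run
]
for total, changed in total_compile_times(runs):
    print(total, "(benchmark set changed)" if changed else "")
```
</pre>
A trend graph could then draw the flagged runs differently, or restart the trend line at them.<br>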

<br></div>Cheers,<br>--<br></div>Arnaud<br><div><div><br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Nov 13, 2013 at 2:14 PM, Kristof Beyls <span dir="ltr">&lt;<a href="mailto:kristof.beyls@arm.com" target="_blank">kristof.beyls@arm.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div link="blue" vlink="purple" lang="EN-GB"><div><p class="MsoNormal">Hi,<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">This is a summary of what was discussed at the Performance Tracking and<u></u><u></u></p><p class="MsoNormal">Benchmarking Infrastructure BoF session last week at the LLVM dev meeting.<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">At the same time it contains a proposal on a few next steps to improve the<u></u><u></u></p><p class="MsoNormal">setup and use of buildbots to track performance changes in code generated by<u></u><u></u></p>

<p class="MsoNormal">LLVM.<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">The buildbots currently are very valuable in detecting correctness regressions,<u></u><u></u></p><p class="MsoNormal">

and getting the community to quickly rectify those regressions. However,<u></u><u></u></p><p class="MsoNormal">performance regressions are hardly noticed, and it seems that, as a community, we don't<u></u><u></u></p><p class="MsoNormal">

really keep track of those well.<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">The goal for the BoF was to try and find a number of actions that could take us<u></u><u></u></p><p class="MsoNormal">

closer to the point where as a community, we would at least notice some of the<u></u><u></u></p><p class="MsoNormal">performance regressions and take action to fix the regressions.  Given that<u></u><u></u></p><p class="MsoNormal">

this has been discussed already quite a few times at previous BoF sessions at<u></u><u></u></p><p class="MsoNormal">multiple developer meetings, we thought we should aim for a small, incremental,<u></u><u></u></p><p class="MsoNormal">

but sure improvement over the current status. Ideally, we should initially aim<u></u><u></u></p><p class="MsoNormal">for getting to the point where at least some of the performance regressions are<u></u><u></u></p><p class="MsoNormal">

detected and acted upon.<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">We already have a central database that stores benchmarking numbers, produced<u></u><u></u></p><p class="MsoNormal">for 2 boards, see<u></u><u></u></p>

<p class="MsoNormal"><a href="http://llvm.org/perf/db_default/v4/nts/recent_activity#machines" target="_blank">http://llvm.org/perf/db_default/v4/nts/recent_activity#machines</a>.  However, it<u></u><u></u></p><p class="MsoNormal">

seems no-one monitors the produced results, nor is it easy to derive from those<u></u><u></u></p><p class="MsoNormal">numbers whether a particular patch really introduced a significant regression.<u></u><u></u></p><p class="MsoNormal">

<u></u> <u></u></p><p class="MsoNormal">At the BoF, we identified the following issues blocking us from being able to<u></u><u></u></p><p class="MsoNormal">detect significant regressions more easily:<u></u><u></u></p><p class="MsoNormal">

* A lot of the Execution Time and Compile Time results are very noisy, because<u></u><u></u></p><p class="MsoNormal">  the individual programs don't run long enough and don't take long enough to<u></u><u></u></p><p class="MsoNormal">

  compile (e.g. less than 0.1 seconds).<u></u><u></u></p><p class="MsoNormal">  * The proposed actions to improve the execution time measurements are, for the programs<u></u><u></u></p><p class="MsoNormal">    under the Benchmarks sub-directories in the test-suite, to:<u></u><u></u></p>

<p class="MsoNormal">    a) increase the run time of the benchmark so it runs long enough to avoid<u></u><u></u></p><p class="MsoNormal">       noisy results. "Long enough" probably means roughly 10 seconds. We'd<u></u><u></u></p>

<p class="MsoNormal">       probably need a number of different settings, or a parameter that can<u></u><u></u></p><p class="MsoNormal">       be set per program, so that the running time on individual boards can<u></u><u></u></p>

<p class="MsoNormal">       be tuned. E.g. on a faster board, more iterations would be run than on<u></u><u></u></p><p class="MsoNormal">       a slower board.<u></u><u></u></p><p class="MsoNormal">    b) evaluate if the main running time of the benchmark is caused by running<u></u><u></u></p>

<p class="MsoNormal">       code compiled or by something else, e.g. file IO. Programs dominated by<u></u><u></u></p><p class="MsoNormal">       file IO shouldn't be used to track performance changes over time.<u></u><u></u></p>

<p class="MsoNormal">       The proposal to resolve this is to create a way to run the test suite in<u></u><u></u></p><p class="MsoNormal">       'benchmark mode', which includes only a subset of the test suite useful<u></u><u></u></p>

<p class="MsoNormal">       for benchmarking.  Hal Finkel volunteered to start this work.<u></u><u></u></p><p class="MsoNormal">  * The identified action to improve the compile time measurements is to just<u></u><u></u></p>

<p class="MsoNormal">    add up the compilation time for all benchmarks and measure that, instead<u></u><u></u></p><p class="MsoNormal">    of the compile times of the individual benchmarks.<u></u><u></u></p><p class="MsoNormal">

    It seems this could be implemented by simply changing or adding a view<u></u><u></u></p><p class="MsoNormal">    in the web interface, showing the trend of the compilation time for all<u></u><u></u></p><p class="MsoNormal">

    benchmarks over time, rather than trend graphs for individual programs.<u></u><u></u></p><p class="MsoNormal">  * Furthermore, on each individual board, the noise introduced by the board<u></u><u></u></p><p class="MsoNormal">

    itself should be minimized. Each board should have a maintainer, who ensures<u></u><u></u></p><p class="MsoNormal">    the board doesn't produce a significant level of noise.<u></u><u></u></p><p class="MsoNormal">

    If the board starts producing a high level of noise, and the maintainer<u></u><u></u></p><p class="MsoNormal">    doesn't fix it quickly, the performance numbers coming from the board will<u></u><u></u></p><p class="MsoNormal">

    be ignored. It's not clear what the best way would be to mark a board as<u></u><u></u></p><p class="MsoNormal">    being ignored.<u></u><u></u></p><p class="MsoNormal">    The suggestion was made that board maintainers could get a script to run<u></u><u></u></p>

<p class="MsoNormal">    before each benchmarking run, to check whether the board seems to be in a<u></u><u></u></p><p class="MsoNormal">    reasonable state, e.g. by checking that the load on the board is near zero; that "dd"<u></u><u></u></p>

<p class="MsoNormal">    executes as fast as expected; .... It's expected that the checks in the<u></u><u></u></p><p class="MsoNormal">    script might be somewhat dependent on the operating system the board<u></u><u></u></p>

<p class="MsoNormal">    runs.<u></u><u></u></p><p class="MsoNormal">  * To reduce the noise levels further, it would be nice if the execution time<u></u><u></u></p><p class="MsoNormal">    of individual benchmarks could be averaged out over a number (e.g. 5) of<u></u><u></u></p>

<p class="MsoNormal">    consecutive runs. That way, each individual benchmark run remains<u></u><u></u></p><p class="MsoNormal">    relatively fast, by having to run each program just once; while at the same<u></u><u></u></p>

<p class="MsoNormal">    time the averaging should reduce some of the insignificant noise in the<u></u><u></u></p><p class="MsoNormal">    individual runs.<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">

I'd appreciate any feedback on the above proposals. We're also looking for more<u></u><u></u></p><p class="MsoNormal">volunteers to implement the above improvements; so if you're interested in<u></u><u></u></p>

<p class="MsoNormal">working on some of the above, please let us know.<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">Thanks,<span class="HOEnZb"><font color="#888888"><u></u><u></u></font></span></p>

<span class="HOEnZb"><font color="#888888"><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">Kristof<u></u><u></u></p><p class="MsoNormal">    <u></u><u></u></p></font></span></div></div><br>_______________________________________________<br>



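<br>As a rough illustration of the averaging proposed in the summary above: each benchmarking run still executes every program only once, and the reported number for a run is the mean of that run and the runs just before it. The sketch below is Python under assumptions; the function name and the plain list of execution times are hypothetical, and the window of 5 is just the example value from the proposal.<br>
<pre>
```python
def trailing_mean(times, window=5):
    """Average each execution time with up to window-1 preceding runs.

    `times` is a hypothetical list of execution times (seconds) for one
    benchmark, oldest run first. Early entries use however many runs exist.
    """
    averaged = []
    for i in range(len(times)):
        start = max(0, i - window + 1)
        recent = times[start:i + 1]
        averaged.append(sum(recent) / len(recent))
    return averaged


# A single slow run of 15 s moves the averaged value by only 1 s.
print(trailing_mean([10.0, 10.0, 10.0, 10.0, 10.0, 15.0]))
```
</pre>
The trade-off is that a real regression also shows up more gradually, reaching its full size only after as many runs as the window is long.<br>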
<br></blockquote></div><br></div>
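<div>The pre-run board check suggested in the summary could be sketched roughly as follows, in Python and under assumptions. It checks that the 1-minute load average is near zero and times a fixed-size write to a temporary file as a rough stand-in for the "dd" check. All function names and thresholds are hypothetical placeholders to be tuned per board, and <code>os.getloadavg()</code> is only available on Unix-like systems, so other operating systems would need different checks.<br>
<pre>
```python
import os
import tempfile
import time


def load_is_low(load_1min, max_load=0.1):
    """True if the 1-minute load average is close enough to zero."""
    return max_load >= load_1min


def timed_write_seconds(num_bytes=16 * 1024 * 1024):
    """Time writing num_bytes to a temporary file and syncing it to disk."""
    data = b"\0" * num_bytes
    start = time.perf_counter()
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    return time.perf_counter() - start


def board_seems_ok(max_load=0.1, max_write_seconds=2.0):
    """Run both checks; the thresholds are placeholders, not recommendations."""
    load_1min = os.getloadavg()[0]  # Unix only
    if not load_is_low(load_1min, max_load):
        return False
    return max_write_seconds >= timed_write_seconds()


if __name__ == "__main__":
    print("board ok" if board_seems_ok() else "board looks noisy, skip this run")
```
</pre>
A buildbot could run such a script before each benchmarking run and skip the run, or mark its results, when the check fails.</div>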