[LLVMdev] Proposal: Improvements to Performance Tracking Infrastructure.
Arnaud Allard de Grandmaison
arnaud.adegm at gmail.com
Wed Nov 13 05:42:41 PST 2013
Great summary, Kristof!

I do not know how frequently new benchmarks are added, but each addition would
disrupt the aggregated compile time measurement. On the other hand, we just
want to see a (hopefully negative) slope and ignore the steps caused by new
benchmarks being added.
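
One way to sidestep those steps, for example, would be to compare consecutive
runs only over the benchmarks present in both of them. A rough Python sketch
(the dict-of-compile-times input format is just an assumption here, not LNT's
actual data model):

def compile_time_trend(runs):
    """Given a list of {benchmark_name: compile_seconds} dicts, one per run
    (oldest first), return the relative compile-time change between each pair
    of consecutive runs, computed only over the benchmarks that appear in
    both runs, so a newly added benchmark does not show up as a step."""
    changes = []
    for prev, curr in zip(runs, runs[1:]):
        common = set(prev) & set(curr)
        if not common:
            changes.append(None)  # nothing comparable between these two runs
            continue
        prev_total = sum(prev[b] for b in common)
        curr_total = sum(curr[b] for b in common)
        changes.append(curr_total / prev_total - 1.0)  # +0.02 == 2% slower
    return changes

Chaining those per-step ratios gives the slope we care about, while a
benchmark that is added or removed only affects the comparisons it actually
takes part in.
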
Cheers,
--
Arnaud
On Wed, Nov 13, 2013 at 2:14 PM, Kristof Beyls <kristof.beyls at arm.com> wrote:
> Hi,
>
> This is a summary of what was discussed at the Performance Tracking and
> Benchmarking Infrastructure BoF session last week at the LLVM dev meeting.
>
> At the same time it contains a proposal on a few next steps to improve the
> setup and use of buildbots to track performance changes in code generated
> by LLVM.
>
> The buildbots currently are very valuable in detecting correctness
> regressions, and in getting the community to quickly rectify those
> regressions. However, performance regressions are hardly noticed, and it
> seems that, as a community, we don't really keep track of them well.
>
> The goal for the BoF was to try and find a number of actions that could
> take us closer to the point where, as a community, we would at least notice
> some of the performance regressions and take action to fix them. Given that
> this has already been discussed quite a few times at previous BoF sessions
> at multiple developer meetings, we thought we should aim for a small,
> incremental, but sure improvement over the current status. Ideally, we
> should initially aim to get to the point where at least some of the
> performance regressions are detected and acted upon.
>
> We already have a central database that stores benchmarking numbers,
> produced for two boards; see
> http://llvm.org/perf/db_default/v4/nts/recent_activity#machines.
> However, it seems no one monitors the produced results, nor is it easy to
> derive from those numbers whether a particular patch really introduced a
> significant regression.
>
> At the BoF, we identified the following issues blocking us from being able
> to detect significant regressions more easily:
>
> * A lot of the Execution Time and Compile Time results are very noisy,
>   because the individual programs don't run long enough and don't take long
>   enough to compile (e.g. less than 0.1 seconds).
>
> * The proposed actions to improve the execution time measurements are, for
>   the programs under the Benchmarks sub-directories in the test-suite, to:
>   a) increase the run time of the benchmark so it runs long enough to avoid
>      noisy results. "Long enough" probably means roughly 10 seconds. We'd
>      probably need a number of different settings, or a parameter that can
>      be set per program, so that the running time on individual boards can
>      be tuned. E.g. on a faster board, more iterations would be run than on
>      a slower board.
>   b) evaluate whether the main running time of the benchmark is spent in
>      the compiled code or in something else, e.g. file IO. Programs
>      dominated by file IO shouldn't be used to track performance changes
>      over time. The proposal to resolve this is to create a way to run the
>      test suite in 'benchmark mode', which includes only a subset of the
>      test suite useful for benchmarking. Hal Finkel volunteered to start
>      this work.
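
A minimal sketch of what the per-board run-time calibration from a) and the
IO check from b) could look like, assuming a hypothetical --iterations flag
on each benchmark binary (the flag name and the thresholds are made up for
illustration):

import resource, subprocess, time

def run_benchmark(cmd, iterations):
    """Run the benchmark once; return (wall_seconds, cpu_seconds) of the
    child process."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.time()
    subprocess.check_call(cmd + ["--iterations", str(iterations)])
    wall = time.time() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu

def calibrate_iterations(cmd, target_seconds=10.0):
    """Keep doubling the iteration count until the benchmark runs for about
    target_seconds on this particular board."""
    iterations = 1
    while run_benchmark(cmd, iterations)[0] < target_seconds:
        iterations *= 2
    return iterations

def io_dominated(cmd, iterations, threshold=0.8):
    """Flag a program whose CPU time is well below its wall time, i.e. one
    that mostly waits on file IO instead of executing compiled code."""
    wall, cpu = run_benchmark(cmd, iterations)
    return cpu < threshold * wall

The real knob would of course have to be whatever parameter the test-suite
ends up exposing per program; the sketch only shows that both the calibration
and the IO check are cheap to automate.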
>
> * The identified action to improve the compile time measurements is to just
>   add up the compilation time for all benchmarks and measure that, instead
>   of the compile times of the individual benchmarks.
>   It seems this could be implemented by simply changing or adding a view in
>   the web interface, showing the trend of the compilation time for all
>   benchmarks over time, rather than trend graphs for individual programs.
>
> * Furthermore, on each individual board, the noise introduced by the board
>   itself should be minimized. Each board should have a maintainer, who
>   ensures the board doesn't produce a significant level of noise.
>   If the board starts producing a high level of noise and the maintainer
>   doesn't fix it quickly, the performance numbers coming from the board
>   will be ignored. It's not clear what the best way would be to mark a
>   board as being ignored.
>   The suggestion was made that board maintainers could get a script to run
>   before each benchmarking run, to check whether the board seems to be in a
>   reasonable state, e.g. by checking that the load on the board is near
>   zero, that "dd" executes as fast as expected, .... It's expected that the
>   checks in the script might be somewhat dependent on the operating system
>   the board runs.
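
As a starting point, such a pre-run check could look roughly like the
following on a Linux board (the thresholds are arbitrary placeholders and
would need tuning per board and per operating system):

import os, subprocess, sys, time

def load_is_low(threshold=0.2):
    """The 1-minute load average should be near zero before benchmarking."""
    return os.getloadavg()[0] < threshold

def dd_is_fast_enough(min_mb_per_s=50.0, megabytes=64):
    """Time a simple dd run and compare it against the throughput the board
    is known to achieve when it is healthy."""
    start = time.time()
    subprocess.check_call(
        ["dd", "if=/dev/zero", "of=/tmp/ddcheck", "bs=1M",
         "count=%d" % megabytes],
        stderr=subprocess.DEVNULL)
    elapsed = time.time() - start
    os.remove("/tmp/ddcheck")
    return megabytes / elapsed >= min_mb_per_s

if __name__ == "__main__":
    if not (load_is_low() and dd_is_fast_enough()):
        sys.exit("board not in a reasonable state; skipping benchmark run")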
>
> * To reduce the noise levels further, it would be nice if the execution
>   time of individual benchmarks could be averaged over a number (e.g. 5) of
>   consecutive runs. That way, each individual benchmark run remains
>   relatively fast, since each program only has to be run once, while at the
>   same time the averaging should filter out some of the insignificant noise
>   in the individual runs.
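
On the reporting side this could be as simple as a trailing mean over the
last few samples of each benchmark (sketch only; the plain list of samples is
an assumption about how the data would be pulled out of the database):

def trailing_mean(samples, window=5):
    """For each run, return the mean of that run's execution time and the
    (up to) window-1 runs before it, smoothing run-to-run noise while still
    needing only one execution of each program per benchmark run."""
    smoothed = []
    for i in range(len(samples)):
        recent = samples[max(0, i - window + 1):i + 1]
        smoothed.append(sum(recent) / len(recent))
    return smoothed

# e.g. trailing_mean([1.00, 1.02, 0.98, 1.21, 1.19]) ->
#      [1.0, 1.01, 1.0, 1.0525, 1.08]

A change then only stands out once it persists across a few consecutive runs,
which is exactly the trade-off described above.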
>
> I'd appreciate any feedback on the above proposals. We're also looking for
> more volunteers to implement the above improvements; so if you're
> interested in working on some of the above, please let us know.
>
> Thanks,
>
> Kristof
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>