[LLVMdev] New -O3 Performance tester - Use hardware to get reliable numbers

Tue Jan 7 17:48:23 PST 2014

On Tue, Jan 7, 2014 at 11:06 AM, Tobias Grosser <tobias at grosser.es> wrote:

> Hi,
>
> I would like to announce a new set of LNT -O3 performance testers.
>
> In a discussion titled "Question about results reliability in LNT
> infrastructure", Anton suggested that one way to get statistically reliable
> test results from the LNT infrastructure is to use a larger sample size
> (5-10) as well as a more robust statistical test (Wilcoxon/Mann-Whitney).
> Another requirement to make the performance results we get from our testers
> useful is to have a per-commit performance run.
>
> I would like to announce that I set up 4 identical machines* that publicly
> report LNT results for 'clang -O3' at:
>
> http://llvm.org/perf/db_default/v4/nts/machine/34
>
> On average, each run currently covers a group of 3-5 commits. As most
> commits obviously do not impact performance, this seems to be enough to
> track down performance regressions/changes easily.
>

If possible, I think it would be a good idea to filter out commits that
don't affect code generation. This would allow machine resources to be
better used.

Is there some way we can easily filter commits based on whether they affect
code generation or not? Would it be reliable enough to check if the commit
touches any of our integration tests?

As a rough estimate:

sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' | wc -l
706
sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' ./test | wc -l
317

So only 317 of the 706 commits (about 45%) touched ./test. If that is a
reasonable proxy, filtering like this could effectively double our
performance testing coverage.
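A minimal sketch of the filter being proposed, assuming the heuristic above (a commit may change code generation only if it also touches test/). It uses the real `git log --name-only --pretty=format:%H` output, which lists each commit hash followed by the paths it touched, relative to the repository root. The helper names (`read_git_log`, `parse_log`, `touches_tests`, `select_for_benchmarking`) are hypothetical and not part of LNT.

```python
import re
import subprocess
from typing import Iterable, List, Tuple

# A full 40-character commit hash on a line by itself starts a new commit.
SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def read_git_log(since: str = "1 month ago", repo: str = ".") -> str:
    """Run git log, printing each commit hash followed by the files it touched."""
    result = subprocess.run(
        ["git", "log", f"--since={since}", "--name-only", "--pretty=format:%H"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_log(text: str) -> List[Tuple[str, List[str]]]:
    """Split the log into (sha, paths) pairs; blank separator lines are ignored."""
    commits: List[Tuple[str, List[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if SHA_RE.match(line):
            commits.append((line, []))
        elif commits:
            commits[-1][1].append(line)
    return commits


def touches_tests(paths: Iterable[str], test_dir: str = "test/") -> bool:
    """Heuristic under discussion: a commit that changes codegen also touches test/."""
    return any(p.startswith(test_dir) for p in paths)


def select_for_benchmarking(commits: List[Tuple[str, List[str]]]) -> List[str]:
    """Hashes of the commits worth a performance run under this heuristic."""
    return [sha for sha, paths in commits if touches_tests(paths)]
```

Run from the top of an llvm checkout, `select_for_benchmarking(parse_log(read_git_log()))` would return the subset of last month's commits to benchmark. The heuristic errs on the side of skipping: a codegen change that lands without a test update would be missed, so it is only as reliable as the project's habit of adding tests.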

-- Sean Silva

>
> The results reported so far seem to provide sufficient information to
> catch performance changes. Specifically, when the aggregation function is
> set to median, most runs show no performance impact:
>
> e.g.: http://llvm.org/perf/db_default/v4/nts/19939?num_comparison_runs=10&test_filter=&test_min_value_filter=&aggregation_fn=median&compare_to=19934&submit=Update
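To illustrate why the median aggregation suppresses most spurious changes, here is a small sketch comparing two runs of one test with both aggregations. The sample values are made up for illustration, and `relative_change` is a hypothetical helper, not LNT's implementation. With a single noisy sample in the new run, the median reports roughly a 1% change while the mean reports about 12%.

```python
from statistics import mean, median
from typing import Callable, Sequence

Aggregate = Callable[[Sequence[float]], float]


def relative_change(baseline: Sequence[float], current: Sequence[float],
                    aggregate: Aggregate = median) -> float:
    """Relative change of the aggregated run time, e.g. 0.05 means 5% slower."""
    base = aggregate(baseline)
    return (aggregate(current) - base) / base


# Hypothetical run times (seconds) for one test: five samples per run, where
# the new run contains a single noisy sample (1.60).
baseline = [1.00, 1.00, 1.01, 0.99, 1.00]
current = [1.00, 1.01, 0.99, 1.02, 1.60]

print(f"median: {relative_change(baseline, current, median):+.1%}")
print(f"mean:   {relative_change(baseline, current, mean):+.1%}")
```

The median only moves if at least half of the samples move, so one outlier per run cannot by itself produce a reported regression.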
>
> We still have a couple of runs that report performance differences, but
> looking at the performance graphs of the changed test cases makes it very
> clear that those are false positives caused by test-case noise.
>
> Here is the point of this mail: I am currently not sure when I will find
> time to improve the LNT infrastructure to take advantage of the data
> provided. So if someone else would like to have a look and, e.g., add
> the Wilcoxon/Mann-Whitney test, this would be highly appreciated.
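As a starting point for whoever picks this up, here is a minimal sketch of a Mann-Whitney U test comparing the samples of one test from two runs. It computes an exact two-sided p-value by enumerating every split of the pooled ranks (ties get midranks) rather than using the normal approximation, which only makes sense for the small sample sizes (5-10 per run) mentioned above. All names are hypothetical and this is not LNT code; SciPy's `scipy.stats.mannwhitneyu` is the off-the-shelf alternative.

```python
from itertools import combinations
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of the pooled samples; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """U of sample a: pairs where a's value exceeds b's, ties counting 1/2."""
    ranks = midranks(list(a) + list(b))
    return sum(ranks[: len(a)]) - len(a) * (len(a) + 1) / 2


def exact_p_value(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided p-value from the exact permutation distribution of U.

    Every way of splitting the pooled ranks into groups of len(a) and len(b)
    is enumerated, so this is only meant for small samples.
    """
    n1 = len(a)
    ranks = midranks(list(a) + list(b))
    centre = n1 * len(b) / 2
    observed = abs(mann_whitney_u(a, b) - centre)
    offset = n1 * (n1 + 1) / 2
    extreme = total = 0
    for idx in combinations(range(len(ranks)), n1):
        u = sum(ranks[i] for i in idx) - offset
        total += 1
        if abs(u - centre) >= observed - 1e-9:  # tolerance for float midranks
            extreme += 1
    return extreme / total
```

A change would then be reported only if `exact_p_value(old_samples, new_samples)` falls below a chosen threshold. Note that with 5 samples per run the smallest p-value this test can produce is 2/252 (about 0.008), and with 4 samples it is 2/70 (about 0.029), which is one concrete reason the larger sample sizes matter.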
>
> I also have a couple more machines. Hence, once the LNT infrastructure is
> in place, we can use them to increase the reliability of the results even
> further.
>
> Cheers,
> Tobias
>
> * They also have sufficiently close performance characteristics when
> running the LNT tests for the same version.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>