[LLVMdev] New -O3 Performance tester - Use hardware to get reliable numbers

Tue Jan 7 10:06:51 PST 2014

Hi,

I would like to announce a new set of LNT -O3 performance testers.

In a discussion titled "Question about results reliability in LNT 
infrustructure" Anton suggested that one way to get statistically 
reliable test results from the LNT infrastructure is to use a larger 
sample size (5-10) as well as a more robust statistical test 
(Wilcoxon/Mann-Whitney). Another requirement to make the performance 
results we get from our testers useful is to have a per-commit 
performance run.

To that end, I have set up 4 identical machines* that publicly report 
LNT results for 'clang -O3' at:

http://llvm.org/perf/db_default/v4/nts/machine/34

On average, each run currently covers a group of 3-5 commits. As most 
commits obviously do not impact performance, this seems to be 
fine-grained enough to track down performance regressions/changes 
easily.

The results that have been reported so far seem to provide sufficient 
information to catch performance changes. Specifically, when setting the 
aggregation function to median, most runs are shown to not impact 
performance:

e.g.:
http://llvm.org/perf/db_default/v4/nts/19939?num_comparison_runs=10&test_filter=&test_min_value_filter=&aggregation_fn=median&compare_to=19934&submit=Update
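
To illustrate why the median is a useful aggregation function for noisy 
samples, here is a minimal Python sketch. It is not LNT's actual 
implementation: the function name relative_change and the timings are 
made up for illustration, and the only assumption is that each side of 
the comparison provides several execution-time samples for a test.

```python
from statistics import mean, median


def relative_change(previous_runs, current_runs, aggregate=median):
    """Relative change of the aggregated execution time between two sample sets.

    Hypothetical helper for illustration only, not part of LNT.
    """
    base = aggregate(previous_runs)
    new = aggregate(current_runs)
    return (new - base) / base


# Made-up timings (seconds). The previous samples contain one noisy outlier.
previous = [2.00, 2.01, 1.99, 2.00, 2.60]
current = [2.01, 2.00, 2.00, 1.99, 2.00]

median_change = relative_change(previous, current, aggregate=median)
mean_change = relative_change(previous, current, aggregate=mean)

print(f"median: {median_change:+.1%}")
print(f"mean:   {mean_change:+.1%}")
```

With these numbers the median reports no change, whereas the mean is 
pulled by the single outlier and reports a speedup of more than 5% that 
never happened.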

We still have a couple of runs that report performance differences, but 
looking at the performance graphs of the affected test cases makes it 
very clear that those are false positives caused by test case noise.

Now to the point of this mail: I am currently not sure when I will find 
time to improve the LNT infrastructure to take advantage of the data 
provided. So if someone else would like to have a look and, e.g., add 
the Wilcoxon/Mann-Whitney test, this would be highly appreciated.
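
For anyone who wants to pick this up, here is a rough sketch of what 
such a check could look like. It is a standalone, pure-Python exact 
two-sided Mann-Whitney test, not code from LNT, and the function names 
u_statistic and mann_whitney_exact are hypothetical. The p-value is 
computed by enumerating every way of relabeling the pooled samples, 
which is only practical for small sample sizes such as the 5-10 samples 
discussed above.

```python
from itertools import combinations


def u_statistic(a, b):
    """Number of (x, y) pairs with x from a, y from b and x > y; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def mann_whitney_exact(a, b):
    """Exact two-sided p-value for the Mann-Whitney U test.

    Enumerates every split of the pooled samples into groups of the
    original sizes and counts the splits whose U statistic is at least
    as far from its expected value as the observed one.
    """
    pooled = list(a) + list(b)
    n_a = len(a)
    center = n_a * len(b) / 2.0  # expected U if both groups behave the same
    observed = abs(u_statistic(a, b) - center)
    indices = range(len(pooled))
    extreme = total = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if abs(u_statistic(group_a, group_b) - center) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Made-up execution times (seconds) for one test before and after a commit.
before = [1.00, 1.01, 1.02, 1.03, 1.04]
after = [1.10, 1.11, 1.12, 1.13, 1.14]
p_value = mann_whitney_exact(before, after)
print(f"p = {p_value:.4f}")
```

With 5 samples per side there are only 252 possible splits, so the 
enumeration is instant. With 10 samples per side it grows to 184756 
splits, which is still feasible but noticeably slower in pure Python. A 
production version would more likely rely on an existing statistics 
library than on this enumeration.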

I also have a couple more machines. Hence, if the LNT infrastructure 
is in place, we can use them to increase the reliability of the results 
even further.

Cheers,
Tobias

* They also have sufficiently close performance characteristics when 
running LNT tests for the same version.