[LLVMdev] Why is the default LNT aggregation function min instead of mean
tobias at grosser.es
Thu Jan 16 17:32:41 PST 2014
On 01/17/2014 02:17 AM, David Blaikie wrote:
> Right - you usually won't see a normal distribution in the noise of test
> results. You'll see results clustered around the lower bound with a long
> tail of slower and slower results. Depending on how many samples you do it
> might be appropriate to take the mean of the best 3, for example - but the
> general approach of taking the fastest N does have some basis in any case.
> Not necessarily the right answer, the only right answer, etc.
Interesting. In fact, I had the very same thoughts at the beginning.
However, when looking at my test results, the common pattern looks like
this: the run-time of a test case is very consistently one of several
fixed values. The distribution of these times is very consistent and
seems, in fact, to resemble a normal distribution (more in the center,
less at the borders).
The explanation I have here is that the machine itself is in fact not
very noisy. Instead, changes in the execution context (e.g., due to
allocation of memory at a different location) influence the
performance. If we, by luck, get a run where all 'choices' happened to
be optimal, we get the minimal run time. However, with several
independent factors, it is more likely that we get a non-optimal
configuration that yields a value in the middle. Consequently, the
minimum seems to be a non-optimal choice here.
I understand that there may be some 'real' noise values, but as the
median does not seem to be affected much by extreme values, I have the
feeling it should be reasonably robust to such noise.
Have you seen examples where the median value gives a wrong impression?
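To make the trade-off concrete, here is a minimal sketch (not LNT code) of the noise model described above: run times cluster around a few fixed "configuration" levels, with small jitter and occasional large positive spikes. The level values, spike probability, and jitter magnitude are all assumptions for illustration; it just compares how min, mean, and median aggregate the same samples.

```python
import random
import statistics

random.seed(0)  # deterministic for the example

def sample_runtime():
    # Hypothetical model: a few fixed levels from context effects
    # (e.g., memory placement), plus small jitter, plus rare large
    # positive outliers from unrelated system activity.
    base = random.choice([1.00, 1.03, 1.05])   # assumed levels
    if random.random() < 0.05:                 # rare noise spike
        return base + random.uniform(0.5, 1.0)
    return base + random.gauss(0, 0.005)       # small jitter

runs = [sample_runtime() for _ in range(20)]

# min reports the lucky best-case configuration; mean is pulled up
# by the rare spikes; median sits at the typical configuration.
print(f"min    = {min(runs):.3f}")
print(f"mean   = {statistics.mean(runs):.3f}")
print(f"median = {statistics.median(runs):.3f}")
```

Under this model the min tracks the rare all-optimal run, while the median stays at the most common configuration and ignores the spikes, which is the robustness argument above.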
> On Thu, Jan 16, 2014 at 5:05 PM, Chris Matthews <chris.matthews at apple.com>wrote:
>> I think the idea with min is that it would be the ideal fastest run. The
>> other runs were 'slowed' by system noise or something else.
>> On Jan 16, 2014, at 5:03 PM, Tobias Grosser <tobias at grosser.es> wrote:
>>> I am currently investigating how to ensure that LNT only shows relevant
>> performance regressions for the -O3 performance tests I am running.
>>> One question that came up here is why the default aggregate function for
>> LNT is 'min' instead of 'mean'. This looks a little surprising from the
>> statistical point, but also from looking at my test results picking 'min'
>> seems to be an inferior choice.
>>> For all test runs I have looked at, picking mean largely reduces the
>> run-over-run changes reported due to noise.
>>> See this run e.g:
>>> If we use the median, we get just one change reported:
>>> If you use min, we get eight reports, one claiming an over 100%
>>> performance reduction for a case that is really just pure noise. I am
>> planning to look into using better statistical methods. However, as a
>> start, could we switch the default to 'mean'?
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu