[LLVMdev] Why is the default LNT aggregation function min instead of mean
tobias at grosser.es
Thu Jan 16 17:32:41 PST 2014
On 01/17/2014 02:17 AM, David Blaikie wrote:
> Right - you usually won't see a normal distribution in the noise of test
> results. You'll see results clustered around the lower bound with a long
> tail of slower and slower results. Depending on how many samples you do it
> might be appropriate to take the mean of the best 3, for example - but the
> general approach of taking the fastest N does have some basis in any case.
> Not necessarily the right answer, the only right answer, etc.
Interesting. In fact, I had the very same thoughts at the beginning.
However, when looking at my test results, the common pattern looks like
this: the run-time of a test case is very consistently one of several
fixed values. The distribution of these times is very consistent and
seems, in fact, to resemble a normal distribution (more in the center,
less at the borders).
The explanation I have here is that the machine itself is in fact not
very noisy. Instead, changes in the execution context (e.g., due to
allocation of memory at a different location) influence the
performance. If we, by luck, get a run where all 'choices' happened to
be optimal, we get the minimal run time. However, with several
independent factors, it is more likely that we get a non-optimal
configuration that yields a value in the middle. Consequently, the
minimum seems to be a non-optimal choice here.
I understand that there may be some 'real' noise values, but as the
median does not seem to be affected much by extreme values, I have the
feeling it should be reasonably robust to such noise.
Have you seen examples where the median value gives a wrong impression?
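To make the trade-off concrete, here is a minimal sketch (not LNT code) of the noise model described above: run times cluster around a few fixed "configuration" levels, with small jitter and occasional large positive spikes. The level values, spike probability, and jitter magnitude are all assumptions for illustration; it just compares how min, mean, and median aggregate the same samples.

```python
import random
import statistics

random.seed(0)  # deterministic for the example

def sample_runtime():
    # Hypothetical model: a few fixed levels from context effects
    # (e.g., memory placement), plus small jitter, plus rare large
    # positive outliers from unrelated system activity.
    base = random.choice([1.00, 1.03, 1.05])   # assumed levels
    if random.random() < 0.05:                 # rare noise spike
        return base + random.uniform(0.5, 1.0)
    return base + random.gauss(0, 0.005)       # small jitter

runs = [sample_runtime() for _ in range(20)]

# min reports the lucky best-case configuration; mean is pulled up
# by the rare spikes; median sits at the typical configuration.
print(f"min    = {min(runs):.3f}")
print(f"mean   = {statistics.mean(runs):.3f}")
print(f"median = {statistics.median(runs):.3f}")
```

Under this model the min tracks the rare all-optimal run, while the median stays at the most common configuration and ignores the spikes, which is the robustness argument above.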
> On Thu, Jan 16, 2014 at 5:05 PM, Chris Matthews <chris.matthews at apple.com>wrote:
>> I think the idea with min is that it would be the ideal fastest run. The
>> other runs were 'slowed' by system noise or something else.
>> On Jan 16, 2014, at 5:03 PM, Tobias Grosser <tobias at grosser.es> wrote:
>>> I am currently investigating how to ensure that LNT only shows relevant
>> performance regressions for the -O3 performance tests I am running.
>>> One question that came up here is why the default aggregate function for
>> LNT is 'min' instead of 'mean'. This looks a little surprising from the
>> statistical point, but also from looking at my test results picking 'min'
>> seems to be an inferior choice.
>>> For all test runs I have looked at, picking mean largely reduces the
>> run-over-run changes reported due to noise.
>>> See this run e.g:
>>> If we use the median, we get just one change reported:
>>> If you use min, we get eight reports, one claiming an over 100%
>>> performance reduction for a case that is really just pure noise. I am
>> planning to look into using better statistical methods. However, as a
>> start, could we switch the default to 'mean'?
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu