[LLVMdev] [LNT] Question about results reliability in LNT infrastructure
Tobias Grosser
tobias at grosser.es
Sun Jun 30 09:19:41 PDT 2013
On 06/30/2013 02:14 AM, Anton Korobeynikov wrote:
> Hi Tobi,
>
> First of all, all this is http://llvm.org/bugs/show_bug.cgi?id=1367 :)
>
>> The statistical test ministat is performing seems simple and pretty
>> standard. Is there any reason we could not do something similar? Or are we
>> doing it already and it just does not work as expected?
> The main problem with such sort of tests is that we cannot trust them, unless:
> 1. The data has the normal distribution
> 2. The sample size is large (say, > 50)
>
> Here we have only 3 points and, no, I won't trust the ministat's
> t-test and normal-approximation based confidence bounds. They are *too
> short* (= the real confidence level is not 99.5% but actually 40-50%,
> for example).
Hi Anton,
I trust your knowledge about statistics, but am wondering why ministat
(and its t-test) is promoted as a statistically sane tool for
benchmarking results. Is the use of the t-test for benchmark results a
bad idea in general? Would ministat be a better tool if it implemented
the Wilcoxon/Mann-Whitney test?
> I'd ask for:
>
> 1. Increasing sample size to at least 5-10
> 2. Do the Wilcoxon/Mann-Whitney test
Reading about the Wilcoxon/Mann-Whitney, it seems to be a more robust
test that frees us from the normal-approximation assumption. As its
implementation also does not look overly complicated, it may be a good
choice.
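
To give an idea of how small such an implementation can be, here is a minimal sketch of an exact two-sided Wilcoxon/Mann-Whitney test in plain Python. It is not taken from LNT or ministat; the function names (midranks, mann_whitney_exact) are made up for illustration, and it assumes sample sizes small enough that enumerating all rank assignments is cheap (roughly up to 10 runs per side).

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks for values, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_exact(a, b):
    """Return (U statistic of sample a, exact two-sided p-value).

    The p-value is the fraction of all ways to pick len(a) of the pooled
    ranks whose U is at least as far from its mean as the observed U.
    """
    n, m = len(a), len(b)
    ranks = midranks(list(a) + list(b))
    offset = n * (n + 1) / 2.0
    mean_u = n * m / 2.0
    u_a = sum(ranks[:n]) - offset
    observed = abs(u_a - mean_u)

    total = 0
    extreme = 0
    for combo in combinations(range(n + m), n):
        u = sum(ranks[i] for i in combo) - offset
        total += 1
        if abs(u - mean_u) >= observed - 1e-9:  # tolerance for float sums
            extreme += 1
    return u_a, extreme / total


if __name__ == "__main__":
    # Hypothetical run times (seconds) of one benchmark before/after a change.
    before = [1.00, 1.02, 0.99, 1.01, 1.03]
    after = [1.10, 1.12, 1.09, 1.11, 1.13]
    print(mann_whitney_exact(before, after))
```

With three runs per side, even two completely separated samples only reach p = 2/20 = 0.1 in this test, which matches the request above for at least 5-10 samples.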
Regarding the number of samples, I think the most important point is
that we get some measurement of confidence by which we can sort our
results and make it visible in the UI. For different use cases we can
adapt the number of samples based on the required confidence and the
amount of noise/lost regressions we can accept. This may also be a great
use for the adaptive sampling that Chris suggested.
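
As a rough illustration of how the required confidence constrains the number of samples, the sketch below computes the smallest two-sided p-value an exact rank-sum test can report at all for a given number of runs, and from that the minimum number of runs per side for a given significance level. This is only a lower bound under the assumption of no ties and equal sample counts; it is not the adaptive sampling scheme itself, and the function names are hypothetical.

```python
from math import comb


def min_two_sided_p(n, m):
    """Smallest exact two-sided p-value of a rank-sum test (no ties).

    Only the two fully separated orderings are most extreme, out of
    comb(n + m, n) equally likely rank assignments.
    """
    return 2.0 / comb(n + m, n)


def min_samples_per_side(alpha, limit=50):
    """Smallest n such that n runs per side can reach p < alpha at all."""
    for n in range(1, limit + 1):
        if min_two_sided_p(n, n) < alpha:
            return n
    return None  # not reachable within `limit` runs per side


if __name__ == "__main__":
    for alpha in (0.05, 0.01, 0.005):
        print(alpha, min_samples_per_side(alpha))
```

Under these assumptions, a 5% level needs at least 4 runs per side, a 1% level at least 5, and a 0.5% level at least 6. Noisy benchmarks will need more, which is where sampling adaptively would come in.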
Is there anything stopping us from implementing such a test and exposing
its results in the UI?
Cheers,
Tobi