[PATCH] [LNT] Use Mann-Whitney U test to identify changes

Fri May 2 02:16:33 PDT 2014

Hi Tobias,

On 02/05/14 08:38, Tobias Grosser wrote:
> On 01/05/2014 23:27, Yi Kong wrote:
>> This patch adds Mann-Whitney U tests to identify changes, as suggested by Tobias and Anton. Users can configure the desired confidence level.
>
> Hi Yi Kong,
>
> thanks for this nice patch. I looked into it briefly by setting up an
> LNT server and adding a couple of the -O3 nightly test results to it. It
> seems that, at least with the default 0.5 confidence level, this does not
> reduce the noise at all. Just switching the aggregation function from
> minimum to mean helps a lot more here (any idea why?).

The median is far less affected by variance than the minimum. The minimum
may even be an outlier.
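
As a toy illustration (the timings below are made up, not taken from the
LNT data), a single unusually fast run drags the minimum down while the
median barely moves:

```python
import statistics

# Hypothetical execution times in seconds for one benchmark.
# The 7.0 in the first list is a deliberately planted outlier.
noisy = [10.0, 10.1, 10.2, 10.1, 7.0]
clean = [10.0, 10.1, 10.2, 10.1, 10.0]

def summarize(xs):
    """Return the two aggregates being compared."""
    return {"min": min(xs), "median": statistics.median(xs)}

noisy_summary = summarize(noisy)
clean_summary = summarize(clean)
print("noisy:", noisy_summary)
print("clean:", clean_summary)
```

Comparing these two runs by minimum would report a large change, while
comparing them by median would report none.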

> Did you play with
> the confidence level and get an idea of which level would be useful?
> My very brief experiments showed that a value of 0.999 or even 0.9999 is
> something that gets us below the noise level. I verified this by looking
> at subsequent runs where the commits themselves really were just
> documentation commits. Those commits should not show any noise. Even
> with those high confidence requirements, certain performance regressions
> such as r205965 can still be spotted. For me, this is already useful as
> we can really ask for extremely low noise answers,
> which will help to at least catch the very clear performance
> regressions. (Compared to today, where even those are hidden in the
> reporting noise)

I've been experimenting with the same dataset as you. It seems 0.9
eliminates some of the noise, but not enough. 0.999 produces very nice
results, but:
 >>> scipy.stats.mannwhitneyu([1,1,1,1,1],[2,2,2,2,1])
(2.5, 0.0099822266526080217)
That's a confidence of only 0.99, even though the two samples clearly
differ! Anything greater than 0.9 will cause too many false negatives.
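
A rough sketch of why small sample counts cap the reachable confidence.
It assumes an exact one-sided test with no ties, which is not what the
scipy call above computes (that value comes from a normal approximation),
so the numbers will not match exactly. The helper names are made up for
this example:

```python
from math import comb

def min_one_sided_p(n, m):
    """Smallest p-value an exact one-sided Mann-Whitney U test can return
    for sample sizes n and m (assuming no ties). Only one of the
    C(n+m, n) equally likely rank assignments is as extreme as two
    completely separated samples."""
    return 1 / comb(n + m, n)

def runs_needed(confidence):
    """Smallest equal number of runs per side for which completely
    separated samples give p < 1 - confidence."""
    alpha = 1 - confidence
    n = 1
    while min_one_sided_p(n, n) >= alpha:
        n += 1
    return n

print("floor for 5 vs 5 runs:", min_one_sided_p(5, 5))
for level in (0.9, 0.99, 0.999, 0.9999):
    print(level, "needs at least", runs_needed(level), "runs per side")
```

With 5 runs on each side the floor is 1/252, roughly 0.004, so under
this assumption a 0.999 level could never flag a change, no matter how
large the regression is.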

> I would like to play with this a little bit more. Do you think it is
> possible to print the p value in the Run-Over-Run Changes Details?

That shouldn't be too difficult. I could implement it if you want.
Alternatively, you can just print them to the console, which only takes
a one-line change.

> Also, it may make sense to investigate this on another machine. I use 5
> identical but separate machines. It may be interesting to see if runs
> on the same machine are more reliable and could get away with a lower
> confidence level. Did you do any experiments? Maybe with a higher run
> count, say 20?

For now I don't have another machine to test on; my desktop is far too
noisy. I've been trying to set up an ARM board.

> Cheers,
> Tobias
>

Regards,
Yi Kong
