[LNT] r207898 - Use Mann-Whitney U test to identify changes

Wed May 7 08:12:34 PDT 2014

----- Original Message -----
> From: "Yi Kong" <Yi.Kong at arm.com>
> To: "Anton Korobeynikov" <anton at korobeynikov.info>, "Chris Matthews" <chris.matthews at apple.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "LLVM Commits" <llvm-commits at cs.uiuc.edu>, "Renato Golin"
> <renato.golin at linaro.org>
> Sent: Wednesday, May 7, 2014 10:02:11 AM
> Subject: Re: [LNT] r207898 - Use Mann-Whitney U test to identify changes
> 
> I've updated the patch to perform the Mann-Whitney U test using a
> significance table. We still need SciPy when the sample size is
> large (> 20).

Why do you still require SciPy? That is a large package to pull in for one rarely-used function. Are there other parts of SciPy that you anticipate us using in the future? If not, then there are two options (which I think are both reasonable):

 1. Limit the number of repeat samples taken to 20 (running the test suite more than 20 times per revision seems unlikely in practice).
 2. Implement the normal approximation to the U-value calculation in the code. From the description on the Wikipedia page, the algorithm seems pretty simple.
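
For illustration, here is a rough sketch of what option 2 could look like in pure Python. It is not taken from the patch: the function name mann_whitney_u_normal is hypothetical, and the use of average ranks with the textbook tie-corrected variance, the absence of a continuity correction, and the two-sided p-value are all assumptions on my part.

```python
import math


def mann_whitney_u_normal(a, b):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Hypothetical helper (not part of LNT). Returns (U, p), where U is
    the smaller of the two U statistics. Intended for larger samples,
    where an exact significance table is not available.
    """
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2

    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b],
                    key=lambda t: t[0])

    # Assign 1-based ranks; tied values share the average of their ranks.
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of t tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # U statistic from the rank sum of the first sample.
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    u = min(u1, u2)

    # Mean and tie-corrected variance of U under the null hypothesis.
    mu = n1 * n2 / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        # Every value is identical, so there is nothing to distinguish.
        return u, 1.0

    z = (u - mu) / math.sqrt(var)
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p
```

A caller would then compare the returned p against a fixed threshold (0.05, say) and keep using the significance table for small samples.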

 -Hal

> 
> On Wed, 2014-05-07 at 09:12 +0100, Anton Korobeynikov wrote:
> > > If we are using a significance table, we can no longer calculate
> > > p-values, right? Is there any algorithm to calculate the p-value
> > > for all sample sizes?
> > We would just need to fix the threshold (say, 0.05 or 0.1 or 0.01).
> > Also, we can have like 2 or 3 tables for various thresholds. This
> > should be enough for all the practical purposes.
> > 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory