[LNT] r207898 - Use Mann-Whitney U test to identify changes

Wed May 7 09:16:25 PDT 2014

----- Original Message -----
> From: "Yi Kong" <Yi.Kong at arm.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "LLVM Commits" <llvm-commits at cs.uiuc.edu>, "Renato Golin" <renato.golin at linaro.org>, "Anton Korobeynikov"
> <anton at korobeynikov.info>, "Chris Matthews" <chris.matthews at apple.com>
> Sent: Wednesday, May 7, 2014 11:06:36 AM
> Subject: Re: [LNT] r207898 - Use Mann-Whitney U test to identify changes
> 
> Updated. If the sample size is greater than 20, Mann-Whitney U test
> won't be performed.

Sorry, I was not clear...

+        # Use Mann-Whitney U test to test null hypothesis that result is
+        # unchanged.
+        if len(self.samples) >= 4 and len(self.samples) <= 20 and\
+        len(self.prev_samples) >= 4 and len(self.prev_samples) <= 20:
+            same = stats.mannwhitneyu(self.samples, self.prev_samples, self.confidence_lv)
+            if same:
+                return UNCHANGED_PASS

This is going to be very confusing; in theory, this means that increasing the number of samples will give you better results until you hit 20, and then suddenly you'll get meaningless answers. I had meant that it should produce an *error* if you even try to run such a configuration.
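
Roughly what I have in mind, as a sketch only: the helper and constant names below are made up for illustration and are not existing LNT code, and the bounds of 4 and 20 are simply the ones from the patch above. Too few samples still skips the test quietly; too many is rejected loudly instead of being ignored.

```python
# Hypothetical names; the bounds mirror the check in the quoted patch.
MIN_TABLE_SAMPLES = 4    # below this the patch does not run the U test
MAX_TABLE_SAMPLES = 20   # assumed upper limit of the significance table


def check_sample_sizes(samples, prev_samples):
    """Raise ValueError if either sample set is too large for the table."""
    for name, data in (("samples", samples), ("prev_samples", prev_samples)):
        if len(data) > MAX_TABLE_SAMPLES:
            raise ValueError(
                "%s has %d entries; the Mann-Whitney table supports at most %d"
                % (name, len(data), MAX_TABLE_SAMPLES))


def can_use_mann_whitney(samples, prev_samples):
    """Return True if the U test applies; raise on unsupported sizes."""
    check_sample_sizes(samples, prev_samples)
    return (len(samples) >= MIN_TABLE_SAMPLES and
            len(prev_samples) >= MIN_TABLE_SAMPLES)
```

With something like this, a run configured for more than 20 samples fails right away rather than silently dropping back to the old comparison.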

 -Hal

> 
> On Wed, 2014-05-07 at 16:12 +0100, Hal Finkel wrote:
> > ----- Original Message -----
> > > From: "Yi Kong" <Yi.Kong at arm.com>
> > > To: "Anton Korobeynikov" <anton at korobeynikov.info>, "Chris
> > > Matthews" <chris.matthews at apple.com>
> > > Cc: "Hal Finkel" <hfinkel at anl.gov>, "LLVM Commits"
> > > <llvm-commits at cs.uiuc.edu>, "Renato Golin"
> > > <renato.golin at linaro.org>
> > > Sent: Wednesday, May 7, 2014 10:02:11 AM
> > > Subject: Re: [LNT] r207898 - Use Mann-Whitney U test to identify
> > > changes
> > > 
> > > I've updated the patch to perform Mann-Whitney U test using
> > > significance
> > > table. We still need SciPy when sample size is large (> 20).
> > 
> > Why do you still require SciPy? That is a large package to pull in
> > for one rarely-used function. Are there other parts of SciPy that
> > you anticipate us using in the future? If not, then there are two
> > options (which I think are both reasonable):
> > 
> >  1. Limit the number of repeat samples taken to 20 (running the
> >  test suite more than 20 times per revision seems unlikely in
> >  practice).
> >  2. Implement the normal approximation to the U-value calculation
> >  in the code. From the description on the Wikipedia page, the
> >  algorithm seems pretty simple.
> > 
> >  -Hal
> > 
> > > 
> > > On Wed, 2014-05-07 at 09:12 +0100, Anton Korobeynikov wrote:
> > > > > > If we are using significance table, we can no longer
> > > > > > calculate p values, right? Is there any algorithm to
> > > > > > calculate p value for all sample sizes?
> > > > > We would just need to fix the threshold (say, 0.05 or 0.1 or
> > > > > 0.01). Also, we can have like 2 or 3 tables for various
> > > > > thresholds. This should be enough for all the practical
> > > > > purposes.
> > > > 
> > > 
> > > 
> > > 
> > 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory