[www] r176209 - Add LNT statistics project

David Tweed david.tweed at arm.com
Fri Mar 1 06:31:05 PST 2013


Hi Renato,

It's an area where there's no obviously "right" answer. Other than
increasing your run-time, discarding good values isn't a problem;
keeping bad values is. And a vague justification for discarding a fixed
percentage (say 20%) is that arguably, on a machine you've made quiet
enough to consider doing benchmarking on, anything that happens more
than 20% of the time -- even if it's as much a "platform" issue as a
"direct code generation" issue -- is an intrinsic part of the code's
performance. My real reason for preferring the median in the case of a
one-sided noise distribution (noise can make you slower, it can't make
you faster) is that it has the maximum possible breakdown point
(http://en.wikipedia.org/wiki/Robust_statistics#Breakdown_point) -- it
tolerates up to half the samples being corrupted -- whereas the mean
has the lowest possible, since a single bad sample can drag it
arbitrarily far. So it's perfectly possible for the std dev around the
mean to fall below a threshold more slowly than a "trimmed std dev
around the median" estimator would, and even then to be arguably off.

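For concreteness, here's a minimal Python sketch of the kind of
estimator I mean (the function name, the 20% trim fraction and the
sample numbers are illustrative, not anything LNT currently does):

    import statistics

    def trimmed_stdev_around_median(samples, trim=0.2):
        """Std dev of the fastest (1 - trim) fraction of samples,
        measured around the median rather than the mean.  With
        one-sided noise (it can only slow you down), the slowest
        samples are the suspect ones, so we trim from the top only."""
        kept = sorted(samples)[:max(2, int(len(samples) * (1 - trim)))]
        med = statistics.median(samples)
        # deviations are measured from the median of *all* samples
        var = sum((x - med) ** 2 for x in kept) / (len(kept) - 1)
        return var ** 0.5

    runs = [10.1, 10.2, 10.0, 10.1, 14.7]     # one run hit by noise
    print(statistics.median(runs))            # 10.1, unmoved by the 14.7
    print(trimmed_stdev_around_median(runs))  # ~0.08; the 14.7 was trimmed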

But to be honest, I'm sure someone must have figured out the pros and
cons of various statistics for benchmarking; I just can't find anything
written about the simple cases. (Haskell has an interesting package
called Criterion that uses the bootstrap to detect when the estimated
variance is being heavily skewed by outliers, but it's so conservative
that it recommends more runs than are probably worth the cycles, in
terms of the confidence they give you in the result.)

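(For anyone curious, the general bootstrap idea is roughly the
following Python sketch -- resample the timings with replacement and
see how widely the std dev estimate itself varies; a wide interval
means a few outliers dominate it. This is just the textbook technique,
not what Criterion actually implements.)

    import random
    import statistics

    def bootstrap_stdev_interval(samples, resamples=1000, seed=0):
        """Percentile-bootstrap a 95% confidence interval for the
        std dev; the wider it is, the less the point estimate of
        the variance should be trusted."""
        rng = random.Random(seed)
        estimates = sorted(
            statistics.stdev(rng.choices(samples, k=len(samples)))
            for _ in range(resamples))
        return (estimates[int(0.025 * resamples)],
                estimates[int(0.975 * resamples)])

    runs = [10.1, 10.2, 10.0, 10.1, 14.7]
    print(bootstrap_stdev_interval(runs))  # wide: the 14.7 dominates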

If anyone does know of a good source, please let me know!

Cheers,

Dave

From: Renato Golin [mailto:renato.golin at linaro.org] 
Sent: 01 March 2013 12:15
To: David Tweed
Cc: David Blaikie; LLVM Commits
Subject: Re: [www] r176209 - Add LNT statistics project

On 1 March 2013 11:47, David Tweed <david.tweed at arm.com> wrote:

The trick, of course, is knowing when it's reasonable to discard a big value

Hi Dave,

It depends on the data distribution. I don't think there is a definite rule
that will apply to all cases.

What I've done in the past is to compute mean+stdev, discard anything
more than N stdevs from the mean (e.g. 4, then 3 if nothing falls
outside) and re-calculate mean+stdev on what's left.

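In code, that recipe is roughly this sketch (names are illustrative):

    import statistics

    def discard_outliers(samples, n_sigmas=(4, 3)):
        """Drop samples more than N stdevs from the mean and
        recompute mean+stdev on the rest.  Tries each N in turn,
        stopping as soon as one actually discards something."""
        mean, sd = statistics.mean(samples), statistics.stdev(samples)
        for n in n_sigmas:
            kept = [x for x in samples if abs(x - mean) <= n * sd]
            if 2 <= len(kept) < len(samples):
                return statistics.mean(kept), statistics.stdev(kept)
        return mean, sd  # nothing looked like an outlier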

A fixed percentage has little meaning if you don't know the
distribution. You could be discarding good values, which means you'd
have to re-run the test more times than needed just to get a good
stdev.

And calculating mean+stdev twice is normally much quicker than
re-running a test, so you can do it after every run until your stdev is
an acceptable fraction of your mean (or you've hit the maximum number
of runs).

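Combined with the discard step above, the driver loop might look like
this sketch (run_benchmark and the thresholds are placeholders):

    import statistics

    def run_until_stable(run_benchmark, rel_tol=0.05,
                         min_runs=3, max_runs=20):
        """Re-run until the (outlier-trimmed) stdev is an acceptable
        fraction of the mean, or we hit the run limit.
        run_benchmark() returns one timing."""
        samples = []
        while len(samples) < max_runs:
            samples.append(run_benchmark())
            if len(samples) >= min_runs:
                mean, sd = discard_outliers(samples)  # see sketch above
                if sd / mean < rel_tol:
                    break
        return samples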

cheers,

--renato