[LLVMdev] [LNT] Question about results reliability in LNT infrastructure
David.Tweed at arm.com
Mon Jul 1 02:35:00 PDT 2013
Just some general observations:
Firstly, just to note that when I talk about looking at what statisticians have developed, I'm not being snobbish. It's that pretty much any methodology will show up big effects; it's getting the best "power" on small effects, when you've got marginal sample sizes, that's tricky, and that is where a lot of people have already spent a long time thinking about these things.
On Jun 30, 2013 8:12 PM, "Anton Korobeynikov" <anton at korobeynikov.info<mailto:anton at korobeynikov.info>> wrote:
> > Getting 10 samples at different commits will give you similar accuracy if
> > behaviour doesn't change, and you can rely on 10-point blocks before and
> > after each change to have the same result.
> Right. But this way you will have a 10-commit delay. So you will need
> 3-4 additional test runs to pinpoint the offending commit in the worst
> case.
> > This is why I proposed something like moving averages.
> A moving average will "smooth" the result. So only really big changes
> will be caught by it.
Just to state the obvious: statistics is best able to detect small effects when there are fewer extraneous quantities you are trying to estimate precisely. So I don't quite see why an appropriately robust change-point estimator isn't what we'd like to use here. (Someone earlier in the thread suggested it wasn't, but I didn't follow why.) In such a case you can use the 2-3 results from each of several consecutive commits in the "before" region and the 2-3 results from each of several consecutive commits in the "after" region, which seems a reasonable fit for the experimental situation. (My objection to smoothing is just that it summarises the data before applying a statistical test for no good reason, not that tracking samples over a window is problematic.)
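For concreteness, here is a minimal sketch of the kind of change-point estimator I have in mind. Everything here (the window size, the Welch-style score, all function names) is my own illustration and not anything that exists in LNT:

```python
# Sketch of a change-point estimator over per-commit benchmark samples.
# All names and the window choice are illustrative, not LNT's API.
from statistics import mean, stdev

def change_point_score(series, k, window=3):
    """Welch-style t statistic comparing the `window` commits before
    position k against the `window` commits at/after k.
    `series` is a list of per-commit lists of timing samples."""
    before = [x for commit in series[max(0, k - window):k] for x in commit]
    after = [x for commit in series[k:k + window] for x in commit]
    if len(before) < 2 or len(after) < 2:
        return 0.0
    mb, ma = mean(before), mean(after)
    vb = stdev(before) ** 2 / len(before)
    va = stdev(after) ** 2 / len(after)
    denom = (vb + va) ** 0.5 or 1e-12  # guard against zero variance
    return abs(ma - mb) / denom

def most_likely_change_point(series, window=3):
    """Return the commit index with the largest before/after discrepancy."""
    scores = [change_point_score(series, k, window)
              for k in range(1, len(series))]
    return 1 + max(range(len(scores)), key=scores.__getitem__)
```

The point being that with 2-3 samples per commit pooled across a few consecutive commits on each side, you get usable group sizes without waiting for 10 runs of a single configuration.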
| Like any result in statistics, the result should be quoted together with a +/- figure derived from the statistical method used. Generally, low sample size means high +/-.
"Yes, but..." ☺ That's absolutely true, but even +/- figures can be overly optimistic or overly pessimistic depending on how well the actual distributions in practice match the assumptions about the distributions implicit in the statistical test. (As you can probably tell, I'm heavily Bayesian and regard statistics as a way of coherently assigning numbers to your beliefs and assumptions in the light of new data, so making assumptions (that will be re-examined as things progress) is fine; objective, assumption-free statistics doesn't really exist for me.)
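As a toy illustration of how a +/- figure shrinks with sample size, and of how it quietly bakes in assumptions: the helper below computes a textbook t-based half-width, which is only honest if the samples really are roughly normal and independent. The critical values are the standard two-sided 97.5% t quantiles (2.776 for 4 degrees of freedom, 2.093 for 19):

```python
# Illustrative normal-theory +/- figure; names are my own, not LNT's.
from statistics import mean, stdev

def plus_minus(samples, t_crit):
    """Half-width of a t-based confidence interval for the mean.
    Valid only under the (often false!) assumption that samples are
    roughly normal and independent."""
    n = len(samples)
    return t_crit * stdev(samples) / n ** 0.5
```

With 5 samples the interval is roughly three times as wide as with 20 comparable samples, which is the "low sample size means high +/-" point made above; the hidden catch is that a heavy-tailed timing distribution makes both figures misleadingly tight.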