<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 12 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-GB link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Hi Renato,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>It's an area where there's not an obvious "really right" answer. Other than that it will increase your run-time discarding good values isn't a problem; it's keeping bad values. And a vague justification for a fixed percentage (say 20%) discarded is that arguably, on a machine you've made quite enough to consider doing benchmarking, anything that happens more than 20% of the time -- even if it's as much a "platform" issue as a "direct code generation" issue -- is an intrinsic part of the code performance. My real reason for preferring using a median in the case of a one-sided noise distribution (noise can make you slower, it can't make you faster) is that it has the maximum possible breakdown point (http://en.wikipedia.org/wiki/Robust_statistics#Breakdown_point) whereas the mean has the lowest possible. So it's perfectly possible for the std dev around the average to fall below a threshold more slowly than a "trimmed-std-dev around median" based estimator, and even then be arguably off.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>But to be honest I'm sure someone must have figured out the pros and cons of various stats for benchmarking, I just can't find anything written about the simple cases. (Haskell has an interesting package called Criterion that uses the bootstrap to detect when estimated variance is being heavily skewed by outliers, but that's so conservative it recommends more tests than are probably worth the cycles in terms of the confidence they give you in the result.)<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>If anyone does know of a good source, please let me know!<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Cheers,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Dave<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm'><p class=MsoNormal><b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> Renato Golin [mailto:renato.golin@linaro.org] <br><b>Sent:</b> 01 March 2013 12:15<br><b>To:</b> David Tweed<br><b>Cc:</b> David Blaikie; LLVM Commits<br><b>Subject:</b> Re: [www] r176209 - Add LNT statistics project<o:p></o:p></span></p></div><p class=MsoNormal><o:p> </o:p></p><div><p class=MsoNormal>On 1 March 2013 11:47, David Tweed <<a href="mailto:david.tweed@arm.com" target="_blank">david.tweed@arm.com</a>> wrote:<o:p></o:p></p><div><div><blockquote style='border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm'><p class=MsoNormal>The trick, of course, is knowing when it's reasonable to discard a big value<o:p></o:p></p></blockquote><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>Hi Dave,<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>It depends on the data distribution. I don't think there is a definite rule that will apply to all cases.<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>What I've done in the past was to get mean+stdev, discard anything after N stdev (eg. 4, then 3 if none) and re-calculate mean+stdev.<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>Fix percentage has little meaning if you don't know the distribution. You could be discarding good values, which means you'll have to re-run the test more times than needed just to get a good stdev. <o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>And calculating ave+stdev twice is normally quicker than re-running a test, so you can do it after every test until your stdev is at an acceptable fraction of your mean (or you've hit the maximum number of runs).<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>cheers,<o:p></o:p></p></div><div><p class=MsoNormal>--renato<o:p></o:p></p></div></div></div></div></div></body></html>