<div dir="ltr">On 28 February 2013 17:05, David Blaikie <span dir="ltr"><<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im"><span style="color:rgb(34,34,34)">To be clear the intention is not to rewrite LNT - but the test-suite</span><br>

</div>

beneath it. It's a complex hodge-podge of shell, Make, C, awk, etc...<br>

difficult to maintain/add new features to. LNT was built with the<br>

intention that the test-suite execution could be rewritten beneath it.<br></blockquote><div><br></div><div style>Oh, that. Well, yes, it's a bit hacky, but I haven't delved deep enough to know much.</div><div><br>

</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im"><span style="color:rgb(34,34,34)">Hardly my forte, though I don't immediately see why changing the</span><br></div>

number of cycles would make regression analysis invalid.<br></blockquote><div><br></div><div style>Because I can only tell that a benchmark is regressing if its workload is fixed and the run time changes, or if it reports a clear output per time unit.</div>

<div style><br></div><div style>But output-per-time-unit benchmarks (like Linpack or Dhrystone) are a measure of raw output, not compiler performance. They're good for comparing two different architectures, but not so good for spotting regressions between revisions. All of them require some sort of fine tuning and heuristics to determine start and stop steps.</div>

<div style><br></div><div style>For instance, Linpack keeps trying bigger matrices until one run takes more than 10s, which means that from run to run you can have, say, 4 or 5 cycles. It also automatically selects the initial run's size based on some heuristics, so if something changes in the platform (space available, memory, etc.), the heuristics could pick a different initial run, and you wouldn't be able to compare any run after that with the runs before the change.</div>
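<div style><br></div><div style>To make the cycle-count problem concrete, here is a rough sketch in Python. It is not Linpack's actual code: the doubling step, the starting size of 256, the cubic cost model and the names <code>adaptive_cycles</code> and <code>make_timer</code> are all assumptions made for illustration. Only the "stop once a run takes more than 10s" rule comes from the description above.</div>

```python
def adaptive_cycles(seconds_for_size, start_size=256, limit_s=10.0):
    """Run ever bigger sizes until one run exceeds limit_s; return the sizes run.

    Doubling the size each cycle is an assumption, not Linpack's real policy.
    """
    sizes = []
    size = start_size
    while True:
        sizes.append(size)
        if seconds_for_size(size) > limit_s:
            return sizes
        size *= 2


def make_timer(seconds_at_2048):
    # Hypothetical cost model: run time grows with the cube of the matrix size,
    # scaled so that size 2048 takes seconds_at_2048.
    return lambda n: seconds_at_2048 * (n / 2048) ** 3


# Same benchmark, same compiler, about 1% timing noise around the 10s limit.
slow_day = adaptive_cycles(make_timer(10.05))
fast_day = adaptive_cycles(make_timer(9.95))

print(len(slow_day), slow_day[-1])
print(len(fast_day), fast_day[-1])
```

<div style>Under these assumptions the slightly slower day stops one cycle earlier than the slightly faster day, so the two runs don't even execute the same workload and their totals can't be compared directly.</div>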

<div><br></div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im"><span style="color:rgb(34,34,34)">If the OS is differently loaded each time you run the tests you're</span><br></div>

going to have a hard time doing regression analysis anyway, aren't<br>

you?<br></blockquote><div><br></div><div style>Yes, but what I was saying was still related to the initial run / number of runs. If your usual last run is, say, a 2048-byte matrix that normally takes 10.05s, and one day it takes 9.95s, it no longer crosses the 10s limit and you'll end up with one more run. Depending on the heuristics (Livermore Loops had a particularly troubling one), the results could change completely from run to run.</div>

<div style><br></div><div style>cheers,</div><div style>--renato</div></div></div></div>