<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}

o\:* {behavior:url(#default#VML);}

w\:* {behavior:url(#default#VML);}

.shape {behavior:url(#default#VML);}

</style><![endif]--><style><!--

/* Font Definitions */

@font-face

        {font-family:Tahoma;

        panose-1:2 11 6 4 3 5 4 4 2 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0cm;

        margin-bottom:.0001pt;

        font-size:11.0pt;

        font-family:"Calibri","sans-serif";

        mso-fareast-language:EN-US;}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:purple;

        text-decoration:underline;}

p.MsoPlainText, li.MsoPlainText, div.MsoPlainText

        {mso-style-priority:99;

        mso-style-link:"Plain Text Char";

        margin:0cm;

        margin-bottom:.0001pt;

        font-size:11.0pt;

        font-family:"Courier New";

        mso-fareast-language:EN-US;}

pre

        {mso-style-priority:99;

        mso-style-link:"HTML Preformatted Char";

        margin:0cm;

        margin-bottom:.0001pt;

        font-size:10.0pt;

        font-family:"Courier New";}

p.MsoAcetate, li.MsoAcetate, div.MsoAcetate

        {mso-style-priority:99;

        mso-style-link:"Balloon Text Char";

        margin:0cm;

        margin-bottom:.0001pt;

        font-size:8.0pt;

        font-family:"Tahoma","sans-serif";

        mso-fareast-language:EN-US;}

span.PlainTextChar

        {mso-style-name:"Plain Text Char";

        mso-style-priority:99;

        mso-style-link:"Plain Text";

        font-family:"Courier New";}

span.BalloonTextChar

        {mso-style-name:"Balloon Text Char";

        mso-style-priority:99;

        mso-style-link:"Balloon Text";

        font-family:"Tahoma","sans-serif";}

span.HTMLPreformattedChar

        {mso-style-name:"HTML Preformatted Char";

        mso-style-priority:99;

        mso-style-link:"HTML Preformatted";

        font-family:"Courier New";

        mso-fareast-language:EN-GB;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-family:"Calibri","sans-serif";

        mso-fareast-language:EN-US;}

@page WordSection1

        {size:612.0pt 792.0pt;

        margin:72.0pt 61.75pt 72.0pt 61.75pt;}

div.WordSection1

        {page:WordSection1;}

--></style><!--[if gte mso 9]><xml>

<o:shapedefaults v:ext="edit" spidmax="1026" />

</xml><![endif]--><!--[if gte mso 9]><xml>

<o:shapelayout v:ext="edit">

<o:idmap v:ext="edit" data="1" />

</o:shapelayout></xml><![endif]--></head><body lang=EN-GB link=blue vlink=purple><div class=WordSection1><p class=MsoPlainText>Thanks for raising this, Chris!<o:p></o:p></p><p class=MsoPlainText><o:p> </o:p></p><p class=MsoPlainText>I also think that improving the signal-to-noise ratio in the performance<br>reports produced by LNT is essential to make the performance-tracking<br>bots useful and effective.<o:p></o:p></p><p class=MsoPlainText><o:p> </o:p></p><p class=MsoPlainText>Our experience, using LNT internally, has been that if the number of false<br>positives is low enough (lower than about half a dozen per report or day),<br>the reports become usable, leaving only a little bit of manual investigation work<br>to detect if a particular change was significant or in the noise. Yes, ideally<br>the automated noise detection should be perfect; but even if it's not perfect,<br>it will already be a massive win.<o:p></o:p></p><p class=MsoPlainText><o:p> </o:p></p><p class=MsoPlainText>I have some further ideas and remarks below.<o:p></o:p></p><p class=MsoPlainText><o:p> </o:p></p><p class=MsoPlainText>Thanks,<o:p></o:p></p><p class=MsoPlainText><o:p> </o:p></p><p class=MsoPlainText>Kristof<o:p></o:p></p><p class=MsoPlainText><o:p> </o:p></p><p class=MsoPlainText><o:p> </o:p></p><p class=MsoPlainText>> <span lang=EN-US style='mso-fareast-language:EN-GB'>-----Original Message-----</span></p><p class=MsoPlainText>> <span lang=EN-US style='mso-fareast-language:EN-GB'>From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]</span></p><p class=MsoPlainText>> <span lang=EN-US style='mso-fareast-language:EN-GB'>On Behalf Of Chris Matthews</span></p><p class=MsoPlainText>> <span lang=EN-US style='mso-fareast-language:EN-GB'>Sent: 15 May 2015 22:25</span></p><p class=MsoPlainText>> <span lang=EN-US style='mso-fareast-language:EN-GB'>To: LLVM Developers Mailing List</span></p><p class=MsoPlainText>> <span lang=EN-US style='mso-fareast-language:EN-GB'>Subject: [LLVMdev] 
Proposal: change LNT’s regression detection algorithm</span></p><p class=MsoPlainText>> <span lang=EN-US style='mso-fareast-language:EN-GB'>and how it is used to reduce false positives</span></p><p class=MsoPlainText>> </p><p class=MsoPlainText>> tl;dr in low data situations we don’t look at past information, and that</p><p class=MsoPlainText>> increases the false positive regression rate.  We should look at the</p><p class=MsoPlainText>> possibly incorrect recent past runs to fix that.</p><p class=MsoPlainText>> </p><p class=MsoPlainText>> Motivation: LNT’s current regression detection system has false positive</p><p class=MsoPlainText>> rate that is too high to make it useful.  With test suites as large as</p><p class=MsoPlainText>> the llvm “test-suite” a single report will show hundreds of regressions.</p><p class=MsoPlainText>> The false positive rate is so high the reports are ignored because it is</p><p class=MsoPlainText>> impossible for a human to triage them, large performance problems are</p><p class=MsoPlainText>> lost in the noise, small important regressions never even have a chance.</p><p class=MsoPlainText>> Later today I am going to commit a new unit test to LNT with 40 of my</p><p class=MsoPlainText>> favorite regression patterns.  It has gems such as flat but noisy line,</p><p class=MsoPlainText>> 5% regression in 5% noise, bimodal, and a slow increase, we fail to</p><p class=MsoPlainText>> classify most of these correctly right now. They are not trick</p><p class=MsoPlainText>> questions, all are obvious regressions or non-regressions, that are</p><p class=MsoPlainText>> plainly visible. 
I want us to correctly classify them all!</p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText><span style='color:black'>That's a great idea!<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>Out of all of the ideas in this email, I think this is the most important<br>one to implement first.<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText>> Some context: LNTs regression detection algorithm as I understand it:</p><p class=MsoPlainText>> </p><p class=MsoPlainText>> detect(current run’s samples, last runs samples) —> improve, regress or</p><p class=MsoPlainText>> unchanged.</p><p class=MsoPlainText>> </p><p class=MsoPlainText>>     # when recovering from errors performance should not be counted</p><p class=MsoPlainText>>     Current or last run failed -> unchanged</p><p class=MsoPlainText>> </p><p class=MsoPlainText>>     delta = min(current samples) - min(prev samples)</p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText><span style='color:black'>I am not convinced that "min" is the best way to define the delta.<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>It makes the assumption that the "true" performance of code generated by llvm<br>is the fastest it was ever seen running. I think this isn't the correct way<br>to model e.g. programs with bimodal behaviour, nor programs with a normal<br>distribution. I'm afraid I don't have a better solution, but I think the<br>Mann Whitney U test - which tries to determine if the sample points seem<br>to indicate different underlying distributions - is closer to what we<br>really ought to use to detect if a regression is "real". 
This way, it models<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>that a fixed program, when run multiple times, has a distribution of<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>performance. I think that using "min" makes too many broken assumptions on<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>what the distribution can look like.<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText>> </p><p class=MsoPlainText>>     # too small to measure</p><p class=MsoPlainText>>     delta <  (confidence*machine noise threshold (0.0005s by default)) -</p><p class=MsoPlainText>> > unchanged</p><p class=MsoPlainText>> </p><p class=MsoPlainText>>     # too small to care</p><p class=MsoPlainText>>     delta % < 1% -> unchanged</p><p class=MsoPlainText>> </p><p class=MsoPlainText>>     # too small to care</p><p class=MsoPlainText>>     delta < 0.01s -> unchanged</p><p class=MsoPlainText>> </p><p class=MsoPlainText>>     if len(current samples) >= 4 && len(prev samples) >= 4</p><p class=MsoPlainText>>          Mann whitney U test -> possible unchanged</p><p class=MsoPlainText>> </p><p class=MsoPlainText>>     #multisample, confidence interval check</p><p class=MsoPlainText>>     if len(current samples) > 1</p><p class=MsoPlainText>>            check delta within samples confidence interval -> if so,</p><p class=MsoPlainText>> unchanged, else Improve, regress.</p><p class=MsoPlainText>> </p><p class=MsoPlainText>>     # single sample,range check</p><p class=MsoPlainText>>     if len(current samples) == 1</p><p class=MsoPlainText>>         all % deltas above 1% improve or regress</p><p class=MsoPlainText>> </p><p class=MsoPlainText>> </p><p class=MsoPlainText>> The too small to care rules are newer inventions.</p><p class=MsoPlainText>> </p><p class=MsoPlainText>> Effectiveness data: to see how well these rules work I ran a 14 machine,</p><p 
class=MsoPlainText>> 7 day report:</p><p class=MsoPlainText>> </p><p class=MsoPlainText>> - 16773 run comparisons</p><p class=MsoPlainText>> - 13852 marked unchanged because of small % delta</p><p class=MsoPlainText>> - 2603 unchanged because of small delta</p><p class=MsoPlainText>> - 0 unchanged because of Mann Whitney U test</p><p class=MsoPlainText>> - 0 unchanged because of confidence interval</p><p class=MsoPlainText>> - 318 improved or regressed because single sample change over 1%</p><p class=MsoPlainText>> </p><p class=MsoPlainText>> Real regressions: probably 1 or 2, not that I will click 318 links to</p><p class=MsoPlainText>> check for sure… hence the motivation.</p><p class=MsoPlainText>> </p><p class=MsoPlainText>> Observations: Most of the work is done by dropping small deltas.</p><p class=MsoPlainText>> Confidence intervals and Mann Whitney U tests are the tests we want to</p><p class=MsoPlainText>> be triggering, however they only work with many samples. Even with</p><p class=MsoPlainText>> reruns, most tests end up being a single sample.  LNT bots that a</p><p class=MsoPlainText>> triggered after another build (unless using the multisample feature)</p><p class=MsoPlainText>> just have one sample at each rev.  Multisample is not a good option</p><p class=MsoPlainText>> because most runs already take a long time.</p><p class=MsoPlainText>> </p><p class=MsoPlainText>> Even with a small amount of predictable noise, if len(current samples)</p><p class=MsoPlainText>> == 1, will flag a lot of samples, especially if len(prev) > 1.  Reruns</p><p class=MsoPlainText>> actually make this worse by making it likely that we flag the next run</p><p class=MsoPlainText>> after the run we rerun.  
For instance, a flat line with 5% random noise</p><p class=MsoPlainText>> flags all the time.</p><p class=MsoPlainText>> </p><p class=MsoPlainText>> Besides the Mann Whitney U test, we are not using prev_samples in any</p><p class=MsoPlainText>> way sane way.</p><p class=MsoPlainText>> </p><p class=MsoPlainText>> Ideas:</p><p class=MsoPlainText>> </p><p class=MsoPlainText>> -Try and get more samples in as many places as possible.  Maybe —</p><p class=MsoPlainText>> multisample=4 should be the default?  Make bots run more often (I have</p><p class=MsoPlainText>> already done this on green dragon).</p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText><span style='color:black'>FWIW, the Cortex-A53 performance tracker I've set up recently uses<br>multisample=3. The Cortex-A53 is a slower/more energy-efficient core,<br>so it takes about 6 hours to do a LLVM rebuild + 3 runs of the LNT<br>benchmarks (see <a href="http://llvm.org/perf/db_default/v4/nts/machine/39">http://llvm.org/perf/db_default/v4/nts/machine/39</a>).<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText>> - Use recent past run information to enhance single sample regression</p><p class=MsoPlainText>> detection.  I think we should add a lookback window, and model the</p><p class=MsoPlainText>> recent past.  I tired a technique suggested by Mikhail Zolotukhin of</p><p class=MsoPlainText>> computing delta as the smallest difference between current and all the</p><p class=MsoPlainText>> previous samples.  It was far more effective.  
Alternately we could try</p><p class=MsoPlainText>> a confidence interval generated from previous, though that may not work</p><p class=MsoPlainText>> on bimodal tests.</p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText><span style='color:black'>The noise levels per individual program are often dependent on the<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>micro-architecture of the core it runs on. Before setting up the Cortex-A53<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>performance tracking bot, I did a bit of analysis to find out what the noise<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>levels are per program across a Cortex-A53, a Cortex-A57 and a Core i7 CPU. Below<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>is an example of a chart for just one program, indicating that the noise level is<br>sometimes dependent on the micro-architecture of the core it runs on. While a<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>Mann-Whitney U - or similar - test would probably find - given enough data<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>points - what should be considered noise and what not, there may also be a way to<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>run the test-suite in benchmark mode many times when a board gets set up, and analyse<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>the results of that. 
The idea is that this way, the noisiness of the board as set up<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>for fixed binaries could be measured, and that information could be used when not<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>enough sample points are available.<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>(FWIW: for this program, the noise seems to come from variation in the number<br>of branch mispredicts.)<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>BTW – graphs like the one below make me think that the LNT webUI should be showing<br>sample points by default instead of line graphs showing the minimum execution time<br>per build number.<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText><span style='color:black;mso-fareast-language:EN-GB'><img border=0 width=867 height=185 id="Picture_x0020_1" src="cid:image003.png@01D09184.199CC740"></span><span style='color:black'><o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText>> - Currently prev_samples is almost always just one other run, probably</p><p class=MsoPlainText>> with only one sample itself.  Lets give this more samples to work with.</p><p class=MsoPlainText>> Start passing more previous run data to all uses of the algorithm, in</p><p class=MsoPlainText>> most places we intentionally limit the computation to current=run and</p><p class=MsoPlainText>> previous=run-1, lets do something like previous=run-[1-10]. The risk in</p><p class=MsoPlainText>> this approach is that regression noise in the look back window could</p><p class=MsoPlainText>> trigger a false negative (we miss detecting a regression).  
I think this</p><p class=MsoPlainText>> is acceptable since we already miss lots of them because the reports are</p><p class=MsoPlainText>> not actionable.</p><p class=MsoPlainText>> </p><p class=MsoPlainText>> - Given the choice between false positive and false negative, lets err</p><p class=MsoPlainText>> towards false negative.  We need to have manageable number of</p><p class=MsoPlainText>> regressions detected or else we can’t act on them.</p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText><span style='color:black'>This sounds like a good idea to me. Let's first make sure we have a working<br>system for (semi-?)automatically detecting at least a good portion of the<br>significant performance regressions. After that we can fine-tune to reduce<br>false negatives and catch a larger part of all significant performance<br>regressions.<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText>> </p><p class=MsoPlainText>> Any objections to me implementing these ideas?</p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText><span style='color:black'>Absolutely not. Once they are implemented, we probably ought to have an idea about how<br>to test which combination of methods works best in practice. 
Could the<br>sample points you’re going to add to the LNT unit tests help in testing which<br>combination of methods works best?<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText><span style='color:black'>I've got 2 further ideas, based on observations from the data coming from the<br>Cortex-A53 performance tracker that I added about 10 days ago - see<br><a href="http://llvm.org/perf/db_default/v4/nts/machine/39">http://llvm.org/perf/db_default/v4/nts/machine/39</a>.<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>I'll be posting patches for review for these soon:<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText><span style='color:black'>1. About 20 of the 300-ish programs that get run in benchmark-only mode run<br>for less than 10 milliseconds. These 20 programs are one of the main sources<br>of noisiness. We should just not run these programs in benchmark-only mode.<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>Or, alternatively, we should make them run a bit longer, so that they are less<br>noisy.<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText><span style='color:black'>2. The board I'm running the Cortex-A53 performance tracker on is a big.LITTLE<br>system with 2 Cortex-A57s and 4 Cortex-A53s. To build the benchmark binaries,<br>I'm using all cores, to make the turn-around time of the bot as short as possible.<br>However, this leads to huge noise levels on the "compile_time" metric, as sometimes<br>a binary gets compiled on a Cortex-A53 and sometimes on a Cortex-A57. 
For this<br>board specifically, it just shouldn't be reporting compile_time at all, since the<br>numbers are meaningless for a performance-tracking use case.<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText><span style='color:black'>Another thought: if we could reduce the overall run-time of the LNT run in<br>benchmark-only mode, we could run more "multi-samples" in the same amount of<br>time. I did a quick analysis on whether it would be worthwhile to invest effort<br>in making some of the long-running programs in the test-suite run shorter in<br>benchmarking mode. On the Cortex-A53 board, it shows that the 27 longest-running<br>programs out of the 300-ish consume about half the run-time. If we could easily<br>make these 27 programs run an order of magnitude shorter, we could almost halve<br>the total execution time of the test-suite, and hence run almost twice the number of<br>samples in the same amount of time. The longest-running programs I’ve found are,<br>sorted by run time:<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'>  0. 7.23% cumulative (7.23% - 417.15s this program) nts.MultiSource/Benchmarks/PAQ8p/paq8p.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'>  1. 
13.74% cumulative (6.51% - 375.84s this program) nts.MultiSource/Benchmarks/SciMark2-C/scimark2.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'>  2. 18.83% cumulative (5.08% - 293.16s this program) nts.SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'>  3. 21.60% cumulative (2.77% - 160.02s this program) nts.MultiSource/Benchmarks/mafft/pairlocalalign.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'>  4. 24.01% cumulative (2.41% - 138.98s this program) nts.SingleSource/Benchmarks/CoyoteBench/almabench.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'>  5. 26.32% cumulative (2.32% - 133.59s this program) nts.MultiSource/Applications/lua/lua.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'>  6. 
28.26% cumulative (1.94% - 111.80s this program) nts.MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'>  7. 30.11% cumulative (1.85% - 106.56s this program) nts.MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'>  8. 31.60% cumulative (1.49% - 86.00s this program) nts.SingleSource/Benchmarks/CoyoteBench/huffbench.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'>  9. 32.75% cumulative (1.15% - 66.37s this program) nts.MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 10. 33.90% cumulative (1.15% - 66.13s this program) nts.MultiSource/Applications/hexxagon/hexxagon.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 11. 
35.04% cumulative (1.14% - 65.98s this program) nts.SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syr2k/syr2k.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 12. 36.14% cumulative (1.10% - 63.21s this program) nts.MultiSource/Benchmarks/TSVC/IndirectAddressing-dbl/IndirectAddressing-dbl.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 13. 37.22% cumulative (1.08% - 62.35s this program) nts.SingleSource/Benchmarks/SmallPT/smallpt.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 14. 38.30% cumulative (1.08% - 62.30s this program) nts.MultiSource/Benchmarks/nbench/nbench.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 15. 39.37% cumulative (1.07% - 61.98s this program) nts.MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 16. 
40.40% cumulative (1.03% - 59.50s this program) nts.MultiSource/Applications/SPASS/SPASS.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 17. 41.37% cumulative (0.97% - 55.74s this program) nts.MultiSource/Benchmarks/TSVC/Expansion-dbl/Expansion-dbl.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 18. 42.33% cumulative (0.96% - 55.40s this program) nts.SingleSource/Benchmarks/Misc/ReedSolomon.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 19. 43.27% cumulative (0.94% - 54.34s this program) nts.MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 20. 44.21% cumulative (0.94% - 54.20s this program) nts.MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 21. 
45.12% cumulative (0.91% - 52.46s this program) nts.SingleSource/Benchmarks/Polybench/datamining/covariance/covariance.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 22. 46.01% cumulative (0.89% - 51.49s this program) nts.MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 23. 46.89% cumulative (0.88% - 50.66s this program) nts.MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 24. 47.73% cumulative (0.84% - 48.74s this program) nts.MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 25. 48.57% cumulative (0.84% - 48.43s this program) nts.MultiSource/Benchmarks/TSVC/InductionVariable-dbl/InductionVariable-dbl.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 26. 
49.40% cumulative (0.83% - 47.92s this program) nts.SingleSource/Benchmarks/Polybench/datamining/correlation/correlation.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 27. 50.22% cumulative (0.81% - 46.92s this program) nts.MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 28. 51.03% cumulative (0.81% - 46.90s this program) nts.MultiSource/Applications/minisat/minisat.exec<o:p></o:p></span></p><p class=MsoNormal style='line-height:12.75pt;background:white;vertical-align:baseline;word-break:break-all'><span style='font-size:10.5pt;font-family:"Courier New";color:black;mso-fareast-language:EN-GB'> 29. 51.81% cumulative (0.78% - 44.88s this program) nts.MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl.exec<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'>…<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText><span style='color:black'>For example, there seem to be a lot of TSVC benchmarks in the longest running ones.<br>They all seem to take a command line parameter to define the number of iterations the main<br>loop in the benchmark should be running. 
Just tuning these so that all of these benchmarks run for<br>O(1s) would already make the overall test-suite run significantly faster.<o:p></o:p></span></p><p class=MsoPlainText><span style='color:black'><o:p> </o:p></span></p><p class=MsoPlainText><span style='color:black'>For the Polybench test cases: they print out lots of floating point numbers. This<br>probably should be changed in the makefile so they don’t dump the matrices they work<br>on anymore. I’m not sure how big the impact on overall run time for the Polybench<br>benchmarks will be when doing this.<o:p></o:p></span></p></div></body></html>