<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Let's try this on the whole test suite?<br class=""><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On May 26, 2015, at 7:05 PM, Sean Silva <<a href="mailto:chisophugis@gmail.com" class="">chisophugis@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Update: in that same block of 10,000 LLVM/Clang revisions, this is the number of distinct SHA1 hashes for the binaries of the following benchmarks:<div class=""><br class=""></div><div class=""><div class=""><div class="">7<span class="" style="white-space:pre"> </span>MultiSource/Applications/aha/aha</div><div class="">2<span class="" style="white-space:pre"> </span>MultiSource/Benchmarks/BitBench/drop3/drop3</div><div class="">10<span class="" style="white-space:pre"> </span>MultiSource/Benchmarks/BitBench/five11/five11</div><div class="">7<span class="" style="white-space:pre"> </span>MultiSource/Benchmarks/BitBench/uudecode/uudecode</div><div class="">3<span class="" style="white-space:pre"> </span>MultiSource/Benchmarks/BitBench/uuencode/uuencode</div><div class="">5<span class="" style="white-space:pre"> </span>MultiSource/Benchmarks/Trimaran/enc-rc4/rc4</div><div class="">11<span class="" style="white-space:pre"> </span>SingleSource/Benchmarks/BenchmarkGame/n-body</div><div class="">2<span class="" style="white-space:pre"> </span>SingleSource/Benchmarks/Shootout/ackermann</div></div><div class=""><br class=""></div><div class="">Let me know if there are any specific benchmarks you would like me to test.</div></div><div class=""><br class=""></div><div class="">-- Sean Silva</div><div class=""><br class=""></div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Wed, May 20, 2015 at 3:31 
PM, Sean Silva <span dir="ltr" class=""><<a href="mailto:chisophugis@gmail.com" target="_blank" class="">chisophugis@gmail.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="">I found an interesting datapoint:<div class=""><br class=""><div class="">In the last 10,000 revisions of LLVM+Clang, only 10 revisions actually caused the binary of MultiSource/Benchmarks/BitBench/five11 to change. So if we just store a hash of the binary in the database, we should be able to pool all the samples we have collected while the binary is the same as it currently is, which will let us use significantly more datapoints for the reference.</div><div class=""><br class=""></div><div class="">Also, we can trivially skip running the regression detection algorithm if the binary hasn't changed.</div><span class="HOEnZb"><font color="#888888" class=""><div class=""><br class=""></div><div class="">-- Sean Silva</div></font></span></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br class=""><div class="gmail_quote">On Mon, May 18, 2015 at 9:02 PM, Chris Matthews <span dir="ltr" class=""><<a href="mailto:chris.matthews@apple.com" target="_blank" class="">chris.matthews@apple.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class="">The reruns flag already does that. 
It helps a bit, but only as long as the benchmark is flagged as regressed.<div class=""><br class=""></div><div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class=""><div class=""><div class="">On May 18, 2015, at 8:28 PM, Sean Silva <<a href="mailto:chisophugis@gmail.com" target="_blank" class="">chisophugis@gmail.com</a>> wrote:</div><br class=""></div></div><div class=""><div class=""><div class=""><div dir="ltr" class=""><br class=""><div class="gmail_extra"><br class=""><div class="gmail_quote">On Mon, May 18, 2015 at 11:24 AM, Mikhail Zolotukhin <span dir="ltr" class=""><<a href="mailto:mzolotukhin@apple.com" target="_blank" class="">mzolotukhin@apple.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word" class="">Hi Chris and others!<div class=""><br class=""></div><div class="">I totally support any work in this direction.</div><div class=""><br class=""></div><div class="">In its current state, LNT’s regression detection system is too noisy, which makes it almost impossible to use in some cases. If after each run a developer gets a dozen ‘regressions’, none of which happens to be real, he/she won’t care about such reports after a while. We clearly need to filter out as much noise as we can - and as it turns out even the simplest techniques could help here. For example, the technique I used (which you mentioned earlier) takes ~15 lines of code to implement and filters out almost all noise in our internal data-sets. It’d be really cool to have something more scientifically-proven though:)</div><div class=""><br class=""></div><div class="">One thing to add from me - I think we should try to do our best under the assumption that we don’t have enough samples. 
Of course, the more data we have - the better, but in many cases we can’t (or don’t want to) increase the number of samples, since it dramatically increases testing time.</div></div></blockquote><div class=""><br class=""></div><div class=""><span style="font-size:13px" class="">Why not just start out with only a few samples, then collect more for benchmarks that appear to have changed?</span></div><div class=""><br class=""></div><div class="">-- Sean Silva</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""> That’s not to discourage anyone from increasing the number of samples, or adding techniques relying on a significant number of samples, but rather to try mining as many ‘samples’ as possible from the data we have - e.g. I absolutely agree with your idea to pass more than 1 previous run.</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Michael</div><div class=""><br class=""></div></div></blockquote></div></div></div></div></div><span class="">
_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank" class="">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu/" target="_blank" class="">http://llvm.cs.uiuc.edu</a><br class=""><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank" class="">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br class=""></span></div></blockquote></div><br class=""></div></div></blockquote></div><br class=""></div>
</div></div></blockquote></div><br class=""></div>
</div></blockquote></div><br class=""></div></body></html>