[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives

Mon May 18 21:08:59 PDT 2015

In r237661 I committed an initial set of tests. All the failing tests are commented out right now (Python 2.7 does not have xfail). I extracted the data sets to the top of the file so they are easy to paste into IPython, etc.
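
For reference, a minimal sketch of how a known-failing case could stay in the suite instead of being commented out. It assumes the tests are written against the standard `unittest` module, whose `expectedFailure` decorator is available from Python 2.7 onward. The data set, the `is_regression` helper and its 5% threshold are hypothetical placeholders, not the contents of r237661.

```python
import unittest

# Hypothetical data set: a noisy series with no real change in it.
NOISY_FLAT = [10.0, 10.4, 9.7, 10.2, 9.9, 10.5]


def is_regression(previous, current, threshold=0.05):
    """Naive placeholder check: flag the sample if it is more than
    `threshold` slower than the immediately preceding sample."""
    return (current - previous) / previous > threshold


class NoiseTests(unittest.TestCase):
    def test_clear_slowdown_is_flagged(self):
        self.assertTrue(is_regression(10.0, 12.0))

    @unittest.expectedFailure
    def test_noise_is_not_flagged(self):
        # Known failure of the naive check: 9.7 -> 10.2 is only noise,
        # yet it exceeds the 5% threshold and gets flagged.
        pairs = zip(NOISY_FLAT, NOISY_FLAT[1:])
        flagged = [is_regression(a, b) for a, b in pairs]
        self.assertFalse(any(flagged))
```

A test marked this way is reported as an expected failure rather than as a failure, so the suite stays green while the case remains visible.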

> On May 18, 2015, at 9:02 PM, Chris Matthews <chris.matthews at apple.com> wrote:
> 
> The reruns flag already does that.  It helps a bit, but only as long as the benchmark is flagged as regressed.
> 
> 
>> On May 18, 2015, at 8:28 PM, Sean Silva <chisophugis at gmail.com> wrote:
>> 
>> 
>> 
>> On Mon, May 18, 2015 at 11:24 AM, Mikhail Zolotukhin <mzolotukhin at apple.com> wrote:
>> Hi Chris and others!
>> 
>> I totally support any work in this direction.
>> 
>> In its current state, LNT’s regression detection system is too noisy, which makes it almost impossible to use in some cases. If after each run a developer gets a dozen ‘regressions’, none of which turns out to be real, he/she will stop caring about such reports after a while. We clearly need to filter out as much noise as we can - and as it turns out even the simplest techniques could help here. For example, the technique I used (which you mentioned earlier) takes ~15 lines of code to implement and filters out almost all noise in our internal data sets. It’d be really cool to have something more scientifically proven though :)
>> 
>> One thing to add from me - I think we should try to do our best under the assumption that we don’t have enough samples. Of course, the more data we have the better, but in many cases we can’t (or don’t want to) increase the number of samples, since that dramatically increases testing time.
>> 
>> Why not just start out with only a few samples, then collect more for benchmarks that appear to have changed?
>> 
>> -- Sean Silva
>>  
>> That’s not to discourage anyone from increasing the number of samples, or adding techniques that rely on a significant number of samples, but rather to try mining as many ‘samples’ as possible from the data we have - e.g. I absolutely agree with your idea to pass more than 1 previous run.
>> 
>> Thanks,
>> Michael
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
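
The suggestion quoted above (start out with only a few samples, then collect more for benchmarks that appear to have changed) can be sketched roughly as follows. This is an illustration under stated assumptions, not LNT's reruns implementation: the function names, the median comparison, the 5% cutoff and the sample counts are all hypothetical.

```python
def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def looks_changed(baseline, samples, threshold=0.05):
    """True if the medians differ by more than `threshold` (assumed cutoff),
    relative to the baseline median."""
    base = median(baseline)
    return abs(median(samples) - base) / base > threshold


def sample_adaptively(run_benchmark, baseline,
                      initial=3, extra=5, max_samples=18):
    """Take `initial` samples; while the benchmark still looks changed,
    take `extra` more, stopping once `max_samples` is reached.
    Returns (samples, changed)."""
    samples = [run_benchmark() for _ in range(initial)]
    while looks_changed(baseline, samples) and len(samples) < max_samples:
        samples.extend(run_benchmark() for _ in range(extra))
    return samples, looks_changed(baseline, samples)


# Usage with a stand-in benchmark that is slow for its first three runs only.
timings = iter([11.0, 11.0, 11.0, 10.0, 10.0, 10.0, 10.0, 10.0])
samples, changed = sample_adaptively(lambda: next(timings),
                                     baseline=[10.0, 10.1, 9.9])
print(len(samples), changed)
```

Only benchmarks that look changed pay for the extra samples, which keeps testing time down. The flip side, in line with the reply about the reruns flag, is that extra data is gathered only while a benchmark is flagged, so a change that the initial samples miss is not picked up.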
