[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives

Sean Silva chisophugis at gmail.com
Tue May 26 19:05:38 PDT 2015


Update: in that same block of 10,000 LLVM/Clang revisions, here is the number
of distinct SHA1 hashes for the binaries of the following benchmarks (a sketch
of one way to gather such counts follows the list):

7 MultiSource/Applications/aha/aha
2 MultiSource/Benchmarks/BitBench/drop3/drop3
10 MultiSource/Benchmarks/BitBench/five11/five11
7 MultiSource/Benchmarks/BitBench/uudecode/uudecode
3 MultiSource/Benchmarks/BitBench/uuencode/uuencode
5 MultiSource/Benchmarks/Trimaran/enc-rc4/rc4
11 SingleSource/Benchmarks/BenchmarkGame/n-body
2 SingleSource/Benchmarks/Shootout/ackermann
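
The counting itself is trivial. Here is a minimal sketch; the builds/<rev>/
directory layout is an assumption about the local setup, not something LNT
provides:

    # Count distinct SHA1 hashes of a benchmark binary across revisions.
    import hashlib
    import os

    def sha1_of(path):
        h = hashlib.sha1()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(1 << 20), b''):
                h.update(chunk)
        return h.hexdigest()

    def distinct_hashes(revisions, benchmark):
        # Assumes each revision's binaries live under builds/<rev>/.
        hashes = set()
        for rev in revisions:
            binary = os.path.join('builds', rev, benchmark)
            if os.path.exists(binary):
                hashes.add(sha1_of(binary))
        return len(hashes)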

Let me know if there are any specific benchmarks you would like me to test.

-- Sean Silva


On Wed, May 20, 2015 at 3:31 PM, Sean Silva <chisophugis at gmail.com> wrote:

> I found an interesting datapoint:
>
> In the last 10,000 revisions of LLVM+Clang, only 10 revisions actually
> caused the binary of MultiSource/Benchmarks/BitBench/five11 to change. So
> if we just store a hash of the binary in the database, we should be able to
> pool all the samples we have collected while the binary is the same as it
> currently is, which will let us use significantly more datapoints for the
> reference.
>
> Also, we can trivially eliminate running the regression detection
> algorithm if the binary hasn't changed.
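>
> A minimal sketch of that idea (the in-memory store and the detect()
> callback are hypothetical stand-ins for LNT's database and its actual
> detection routine):
>
>     # Pool samples while the benchmark binary is unchanged, and only
>     # run regression detection when the hash changes.
>     def record_run(store, benchmark, binary_hash, new_samples, detect):
>         last_hash, pooled = store.get(benchmark, (None, []))
>         if binary_hash == last_hash:
>             # Same binary: no regression is possible; just grow the
>             # reference pool for future comparisons.
>             pooled.extend(new_samples)
>             store[benchmark] = (last_hash, pooled)
>             return None
>         # Binary changed: compare against everything collected while
>         # the previous binary was current, then start a fresh pool.
>         result = detect(pooled, new_samples) if pooled else None
>         store[benchmark] = (binary_hash, list(new_samples))
>         return result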
>
> -- Sean Silva
>
> On Mon, May 18, 2015 at 9:02 PM, Chris Matthews <chris.matthews at apple.com>
> wrote:
>
>> The reruns flag already does that.  It helps a bit, but only as long as
>> the benchmark is flagged as regressed.
>>
>>
>> On May 18, 2015, at 8:28 PM, Sean Silva <chisophugis at gmail.com> wrote:
>>
>>
>>
>> On Mon, May 18, 2015 at 11:24 AM, Mikhail Zolotukhin <
>> mzolotukhin at apple.com> wrote:
>>
>>> Hi Chris and others!
>>>
>>> I totally support any work in this direction.
>>>
>>> In its current state, LNT’s regression detection system is too noisy,
>>> which makes it almost impossible to use in some cases. If after each run a
>>> developer gets a dozen ‘regressions’, none of which turns out to be real,
>>> he/she will stop caring about such reports after a while. We clearly need
>>> to filter out as much noise as we can - and as it turns out, even the
>>> simplest techniques can help here. For example, the technique I used
>>> (which you mentioned earlier) takes ~15 lines of code to implement and
>>> filters out almost all noise in our internal data sets. It’d be really
>>> cool to have something more scientifically proven, though :)
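>>>
>>> (As an illustration only - not necessarily the technique referred to
>>> above - one very simple filter is to compare the best new sample
>>> against the best reference sample and ignore anything within a
>>> relative threshold:)
>>>
>>>     # Report a change only when the fastest new sample is clearly
>>>     # slower than the fastest reference sample; one-off slow samples
>>>     # caused by machine noise are discarded.
>>>     def looks_like_regression(reference, new, threshold=0.05):
>>>         return min(new) > min(reference) * (1.0 + threshold)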
>>>
>>> One thing to add from me - I think we should try to do our best under the
>>> assumption that we don’t have enough samples. Of course, the more data we
>>> have, the better, but in many cases we can’t (or don’t want to) increase
>>> the number of samples, since it dramatically increases testing time.
>>>
>>
>> Why not just start out with only a few samples, then collect more for
>> benchmarks that appear to have changed?
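>>
>> A rough sketch of that adaptive scheme (run_benchmark() and
>> looks_changed() are hypothetical helpers, not existing LNT APIs):
>>
>>     # Take a cheap first pass, then rerun only the benchmarks that
>>     # appear to have changed.
>>     def adaptive_samples(benchmarks, initial=3, extra=10):
>>         results = {}
>>         for bench in benchmarks:
>>             samples = [run_benchmark(bench) for _ in range(initial)]
>>             if looks_changed(bench, samples):
>>                 # Spend extra time only where a change is suspected.
>>                 samples += [run_benchmark(bench) for _ in range(extra)]
>>             results[bench] = samples
>>         return results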
>>
>> -- Sean Silva
>>
>>
>>> That’s not to discourage anyone from increasing the number of samples, or
>>> from adding techniques that rely on a significant number of samples, but
>>> rather to encourage mining as many ‘samples’ as possible from the data we
>>> already have - e.g. I absolutely agree with your idea to pass more than
>>> one previous run.
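>>>
>>> As a sketch of that pooled-reference idea (the Mann-Whitney U test
>>> from scipy is just one reasonable choice here, not necessarily what
>>> LNT uses):
>>>
>>>     from scipy.stats import mannwhitneyu
>>>
>>>     def changed(previous_runs, new_samples, n_runs=5, alpha=0.05):
>>>         # previous_runs: one list of samples per prior run.
>>>         reference = [s for run in previous_runs[-n_runs:] for s in run]
>>>         if not reference or not new_samples:
>>>             return False
>>>         _, p = mannwhitneyu(reference, new_samples,
>>>                             alternative='two-sided')
>>>         return p < alpha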
>>>
>>> Thanks,
>>> Michael
>>>
>>
>>
>>
>

