[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives

Tue Jun 2 14:24:11 PDT 2015

Personally, I would prefer this live either in its own repository or in 
llvm/tools/.  None of my use cases will likely involve the test-suite.

p.s. If this is going to end up an llvm tool, it will need to follow 
LLVM style.

p.p.s. We should probably start a new thread with the proposed addition 
since I imagine many folks are ignoring this one by now given how deep 
it's gotten.

Philip

On 06/02/2015 12:04 PM, Chris Matthews wrote:
> I like that idea!
>
>
>> On Jun 2, 2015, at 12:00 PM, Smith, Kevin B <kevin.b.smith at intel.com> wrote:
>>
>> The code for cmpimage and getdep consists of five source files, with 
>> the following sizes
>>
>> $ wc *
>>   5912  20353 191869 cmpimage.cpp
>>    290   1328  10668 elf.h
>>   1496   5006  41691 getdep.cpp
>>    233    959   7692 macho.h
>>    403   1831  18394 pecoff.h
>>   8334  29477 270314 total
>>
>> Building each of them is just a simple compilation with whatever C++ 
>> compiler you happen to be using (clang, icc, cl, g++):
>>
>> $(CXX) -o cmpimage -O2 cmpimage.cpp
>>
>> $(CXX) -o getdep -O2 getdep.cpp
>>
>> This seems like it would fit rather easily into test-suite/tools, 
>> which already exists and has a Makefile that the commands to build
>> these could be integrated into.
>>
>> This is my best guess/opinion based on a cursory look over the 
>> test-suite directory structure.
>>
>> Kevin
>>
>> *From:*Chris Matthews [mailto:chris.matthews at apple.com]
>> *Sent:* Thursday, May 28, 2015 1:02 PM
>> *To:* Smith, Kevin B
>> *Cc:* Philip Reames; Sean Silva; LLVM Developers Mailing List
>> *Subject:* Re: [LLVMdev] Proposal: change LNT’s regression detection 
>> algorithm and how it is used to reduce false positives
>>
>> Where is the best place to keep this?
>>
>> - As third party tool we all use?
>>
>> - Contribute as new project?
>>
>> - Lives in test-suite/utils?
>>
>> - Lives in llvm/utils?
>>
>>     On May 28, 2015, at 11:22 AM, Smith, Kevin B
>>     <kevin.b.smith at intel.com> wrote:
>>
>>     OK, there is interest from at least a couple of people.  What
>>     should next steps be?
>>
>>     Kevin
>>
>>     *From:*Chris Matthews [mailto:chris.matthews at apple.com]
>>     *Sent:* Thursday, May 28, 2015 10:57 AM
>>     *To:* Philip Reames
>>     *Cc:* Smith, Kevin B; Sean Silva; LLVM Developers Mailing List
>>     *Subject:* Re: [LLVMdev] Proposal: change LNT’s regression
>>     detection algorithm and how it is used to reduce false positives
>>
>>     I agree. I think there are a lot of exciting uses for this tool.
>>      A stage 3 build bot would be another one.
>>
>>         On May 28, 2015, at 10:14 AM, Philip Reames
>>         <listmail at philipreames.com> wrote:
>>
>>         I'd love to see this tool contributed, even if it isn't used for
>>         regression detection work.  I've got a couple of hacked up
>>         scripts which do similar things and having a robust tool
>>         available for this would be very useful.
>>
>>         Philip
>>
>>         On 05/26/2015 09:53 AM, Smith, Kevin B wrote:
>>
>>             Intel has a binary comparator tool that we have been
>>             using for several years for comparing output binaries
>>             to see if the code within them is considered identical.
>>             We use it to eliminate runs (and therefore some
>>             performance noise) from our own performance tracking
>>             tools.
>>
>>             We are willing to contribute the source code for this to
>>             the LLVM community if there is interest.
>>
>>             There are two programs involved: getdep, which displays
>>             the list of DLL/.so dependencies of the image in
>>             question, and cmpimage itself, which does the comparison
>>             ignoring the parts not contributed by the compiler.  The
>>             cmpimage program is also almost completely derived from
>>             the published object format descriptions.
>>
>>             Let me know if there is interest in these pieces of
>>             tooling, and if so, what you think next steps should be.
>>
>>             Kevin B. Smith
>>
>>             *From:* llvmdev-bounces at cs.uiuc.edu
>>             [mailto:llvmdev-bounces at cs.uiuc.edu] *On Behalf Of* Sean
>>             Silva
>>             *Sent:* Thursday, May 21, 2015 2:14 PM
>>             *To:* Chris Matthews
>>             *Cc:* LLVM Developers Mailing List
>>             *Subject:* Re: [LLVMdev] Proposal: change LNT’s
>>             regression detection algorithm and how it is used to
>>             reduce false positives
>>
>>             On Thu, May 21, 2015 at 11:24 AM, Chris Matthews
>>             <chris.matthews at apple.com> wrote:
>>
>>             I agree this is a great idea.  I think it needs to be
>>             fleshed out a little though.
>>
>>             It would still be wise to run the regression detection
>>             algorithm, because the test suite changes and the
>>             machines change, and the algorithm is not perfect yet. 
>>             It would be a valuable source of information though.
>>
>>             How would running it as part of regular testing change
>>             anything? Presumably the only purpose it would serve is
>>             retrospectively going back and seeing false-positives in
>>             the aggregate. But if we are already doing offline
>>             analysis, we can run the regression detection algorithm
>>             (or any prospective ones) offline on the raw data; it
>>             doesn't take that long.
>>
>>
>>                 This is not a small change to how LNT works, so I
>>                 think some due diligence is necessary.  Is clang
>>                 *really* that deterministic, especially over
>>                 successive revs?
>>
>>             Yes. Actually, google's build system depends on this for
>>             its caching strategy to work and so the google guys are
>>             usually on top of any issues in this respect (thanks
>>             google guys!).
>>
>>                 I know it is supposed to be. Does anyone have any
>>                 data to show this is going to be an effective
>>                 approach?  It seems like there are benchmarks in the
>>                 test-suite which use __DATE__ and __TIME__ in them. I
>>                 assume that will be a problem?
>>
>>             __DATE__ and __TIME__ should be easy to solve by
>>             modifying the benchmark, or teaching clang to always
>>             return a fixed value for them (maybe we already have
>>             this? IIRC google's build system does something like
>>             this; or maybe they do it at the OS level).
>>
>>             -- Sean Silva
>>
>>
>>                 > On May 21, 2015, at 1:43 AM, Renato Golin
>>                 <renato.golin at linaro.org> wrote:
>>                 >
>>                 > On 20 May 2015 at 23:31, Sean Silva
>>                 <chisophugis at gmail.com> wrote:
>>                 >> In the last 10,000 revisions of LLVM+Clang, only
>>                 10 revisions actually
>>                 >> caused the binary of
>>                 MultiSource/Benchmarks/BitBench/five11 to change. So if
>>                 >> we just store a hash of the binary in the database,
>>                 we should be able to pool
>>                 >> all samples we have collected while the binary is
>>                 the same as it
>>                 >> currently is, which will let us use significantly
>>                 more datapoints for the
>>                 >> reference.
>>                 >
>>                 > +1
>>                 >
>>                 >
>>                 >> Also, we can trivially eliminate running the
>>                 regression detection algorithm
>>                 >> if the binary hasn't changed.
>>                 >
>>                 > +2!
>>                 >
>>                 > --renato
>>
>>                 > _______________________________________________
>>                 > LLVM Developers mailing list
>>                 > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>                 http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/>
>>                 > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>>
>>
>>
>>
>
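
Sean's suggestion quoted above (store a hash of each benchmark binary, pool the samples collected while that hash is unchanged, and only run regression detection when it changes) could be sketched roughly as follows. This is an illustration only, not LNT code: SampleStore, submit, reference, and binary_hash are hypothetical names, the "database" is an in-memory dict, and a plain SHA-256 of the whole file is assumed where a tool like cmpimage would instead ignore the parts not contributed by the compiler.

```python
import hashlib
from collections import defaultdict


def binary_hash(data: bytes) -> str:
    """Return a hex digest identifying one build of a benchmark binary."""
    return hashlib.sha256(data).hexdigest()


class SampleStore:
    """Hypothetical in-memory stand-in for the results database."""

    def __init__(self):
        # (test name, binary hash) -> every timing collected for that binary
        self._samples = defaultdict(list)
        # test name -> hash of the most recently submitted binary
        self._latest = {}

    def submit(self, test, digest, timings):
        """Record timings; return True if the binary differs from the last run."""
        changed = self._latest.get(test) != digest
        self._latest[test] = digest
        self._samples[(test, digest)].extend(timings)
        return changed

    def reference(self, test):
        """All samples pooled for the binary most recently submitted for test."""
        return list(self._samples[(test, self._latest[test])])


if __name__ == "__main__":
    store = SampleStore()
    old = binary_hash(b"contents of five11 at one revision")
    new = binary_hash(b"contents of five11 after a codegen change")

    print(store.submit("five11", old, [1.00, 1.02]))  # first run for this test
    print(store.submit("five11", old, [0.99]))        # same binary, samples pooled
    print(store.submit("five11", new, [1.30]))        # binary changed
    print(store.reference("five11"))
```

Whether an unchanged hash should skip regression detection entirely, or only be recorded as extra information while detection still runs, is a policy choice the thread leaves open; the sketch just returns the flag and lets the caller decide.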
