[LLVMdev] New -O3 Performance tester - Use hardware to get reliable numbers

Tobias Grosser tobias at grosser.es
Tue Jan 7 18:02:44 PST 2014


On 01/08/2014 02:48 AM, Sean Silva wrote:
> On Tue, Jan 7, 2014 at 11:06 AM, Tobias Grosser <tobias at grosser.es> wrote:
>
>> Hi,
>>
>> I would like to announce a new set of LNT -O3 performance testers.
>>
>> In a discussion titled "Question about results reliability in LNT
>> infrastructure" Anton suggested that one way to get statistically reliable
>> test results from the LNT infrastructure is to use a larger sample size
>> (5-10) as well as a more robust statistical test (Wilcoxon/Mann-Whitney).
>> Another requirement to make the performance results we get from our testers
>> useful is to have a per-commit performance run.
>>
>> I would like to announce that I set up 4 identical machines* that publicly
>> report LNT results for 'clang -O3' at:
>>
>> http://llvm.org/perf/db_default/v4/nts/machine/34
>>
>> We currently test, on average, groups of 3-5 commits per run. As most
>> commits obviously do not impact performance, this granularity seems to be
>> enough to track down performance regressions/changes easily.
>>
>
> If possible, I think it would be a good idea to filter out commits that
> don't affect code generation. This would allow machine resources to be
> better used.
>
> Is there some way we can easily filter commits based on whether they affect
> code generation or not? Would it be reliable enough to check if the commit
> touches any of our integration tests?
>
> As a rough estimate:
>
> sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' | wc -l
> 706
> sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' ./test | wc -l
> 317
>
> So if this filtering is reliable, it seems we could effectively double
> our performance-testing coverage.

Hi Sean,

This is a very interesting idea, though I have no idea whether checking 
for 'test/' will be enough. If we keep the performance testers running 
for a while, we can probably validate this assumption by checking 
whether runs whose commits did not touch the integration tests ever 
showed performance changes (and what kind of changes).
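Your filter could be prototyped along these lines (a hypothetical sketch, not part of LNT; the `git show` invocation assumes a plain git checkout of llvm):

```python
import subprocess

def changed_paths(commit):
    """Paths touched by a commit, via 'git show --name-only' (needs a
    git checkout; hypothetical helper, not an existing LNT API)."""
    out = subprocess.check_output(
        ["git", "show", "--pretty=format:", "--name-only", commit])
    return [p for p in out.decode().splitlines() if p]

def may_affect_codegen(paths):
    """Your heuristic: benchmark a commit only if it touches the
    integration tests under test/."""
    return any(p == "test" or p.startswith("test/") for p in paths)
```

Skipping commits for which `may_affect_codegen(changed_paths(c))` is false would roughly halve the number of runs, matching your 706-vs-317 estimate.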

As I said before, I would be glad to get help with further improvements 
on the software side.
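For example, the Wilcoxon/Mann-Whitney comparison Anton suggested could be sketched in pure Python like this (made-up timings for illustration; a production setup would rather use scipy.stats.mannwhitneyu, which also offers exact p-values for small samples):

```python
import math

def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney U test via the normal approximation.
    A sketch for samples of ~5-10 runs; no tie correction."""
    n1, n2 = len(xs), len(ys)
    # Rank the pooled samples, assigning average ranks to ties.
    pooled = sorted((v, i) for i, v in enumerate(xs + ys))
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg
        i = j + 1
    r1 = sum(ranks[:n1])
    u1 = r1 - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    # Two-sided p-value from the standard normal.
    p = math.erfc(abs(u1 - mu) / sigma / math.sqrt(2.0))
    return u1, p

# Made-up run times (seconds) for 5 runs before and after a commit.
baseline  = [1.02, 1.01, 1.03, 1.00, 1.02]
candidate = [1.10, 1.09, 1.11, 1.08, 1.12]
u, p = mann_whitney_u(baseline, candidate)
print("U = %.1f, p = %.4f" % (u, p))  # small p -> likely a real change
```

With 5 samples per side the test has limited power, which is exactly why the larger sample size (5-10) goes together with the more robust test.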

Cheers,
Tobias
