[llvm-dev] Noisy benchmark results?

Mikael Holmén via llvm-dev llvm-dev at lists.llvm.org
Tue Feb 28 03:55:30 PST 2017


On 02/27/2017 10:36 AM, Kristof Beyls wrote:
> Hi Mikael,
> Some noisiness in benchmark results is expected, but the numbers you see seem to be higher than I'd expect.
> A number of tricks people use to get lower noise results are (with the lnt runtest nt command line options to enable it between brackets):
> * Only build the benchmarks in parallel, but do the actual running of the benchmark code at most one at a time. (--threads 1 --build-threads 6).
> * Make lnt use linux perf to get more accurate timing for short-running benchmarks (--use-perf=1)
> * Pin the running benchmark to a specific core, so the OS doesn't move the benchmark process from core to core. (--make-param="RUNUNDER=taskset -c 1" -- the quotes are needed so the shell passes the value as one argument)
> * Only run the programs that are marked as a benchmark; some of the tests in the test-suite are not intended to be used as a benchmark (--benchmarking-only)
> * Make sure each program gets run multiple times, so that LNT has a higher chance of recognizing which programs are inherently noisy (--multisample=3)
> I hope this is the kind of answer you were looking for?

Spot on! Thanks!
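For reference, an invocation combining all of those options might look like the sketch below (SANDBOX and the clang path are placeholders, and the flag set is taken straight from your list above):

```shell
# Low-noise benchmarking run: parallel build, serial execution,
# perf-based timing, core pinning, benchmarks only, three samples each.
lnt runtest nt \
  --sandbox SANDBOX \
  --cc /path/to/clang \
  --test-suite /data/repo/test-suite \
  --threads 1 --build-threads 6 \
  --use-perf=1 \
  --make-param="RUNUNDER=taskset -c 1" \
  --benchmarking-only \
  --multisample=3
```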

> Do the above measures reduce the noisiness to acceptable levels for your setup?

I ran with all your suggestions above and now I have:

regressions:  http://i.imgur.com/kjA2WpG.png
improvements: http://i.imgur.com/WmRlHka.png

for two runs on two commits that differ only by a whitespace change.

Is this as stable as you normally get it?

I suppose it's stable enough to tell whether my "real" change messes 
something up or not.
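As a quick machine-level sanity check outside LNT entirely (a generic sketch, not an LNT feature): time one fixed command several times and look at the spread. Here `sleep 0.1` is just a deterministic stand-in for a real benchmark binary.

```shell
# Time the same command several times and report the min/max spread.
# "sleep 0.1" is a placeholder; substitute a real benchmark binary.
rm -f /tmp/bench_times.txt
for i in 1 2 3; do
  start=$(date +%s%N)    # nanoseconds since epoch (GNU date)
  sleep 0.1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 )) >> /tmp/bench_times.txt   # elapsed ms
done
sort -n /tmp/bench_times.txt |
  awk 'NR==1 {min=$1} {max=$1} END {printf "min=%dms max=%dms\n", min, max}'
```

If the spread between min and max is more than a few percent across repeated runs of the same binary, the machine itself (frequency scaling, background load) is likely contributing to the noise, independent of any compiler change.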


> Thanks,
> Kristof
>> On 27 Feb 2017, at 09:46, Mikael Holmén via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> Hi,
>> I'm trying to run the benchmark suite:
>> http://llvm.org/docs/TestingGuide.html#test-suite-quickstart
>> I'm doing it the lnt way, as described at:
>> http://llvm.org/docs/lnt/quickstart.html
>> I don't know what to expect, but the results seem to be quite noisy and unstable. E.g. I've done two runs on two different commits that only differ by a space in CODE_OWNERS.txt on my 12-core Ubuntu 14.04 machine with:
>> lnt runtest nt --sandbox SANDBOX --cc <path-to-my-clang> --test-suite /data/repo/test-suite -j 8
>> And then I get the following top execution time regressions:
>> http://i.imgur.com/sv1xzlK.png
>> The numbers bounce around a lot if I do more runs.
>> Given the amount of noise I see here, I don't know how to sort out significant regressions from noise if I actually make a real change in the compiler.
>> Are the above results expected?
>> How is this meant to be used?
>> As a bonus question, if I instead run the benchmarks with an added -m32:
>> lnt runtest nt --sandbox SANDBOX --cflag=-m32 --cc <path-to-my-clang> --test-suite /data/repo/test-suite -j 8
>> I get three failures:
>> --- Tested: 2465 tests --
>> FAIL: MultiSource/Applications/ClamAV/clamscan.compile_time (1 of 2465)
>> FAIL: MultiSource/Applications/ClamAV/clamscan.execution_time (494 of 2465)
>> FAIL: MultiSource/Benchmarks/DOE-ProxyApps-C/XSBench/XSBench.execution_time (495 of 2465)
>> Is this known/expected, or am I doing something stupid?
>> Thanks,
>> Mikael
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
