[llvm-dev] Noisy benchmark results?
Mehdi Amini via llvm-dev
llvm-dev at lists.llvm.org
Tue Feb 28 12:51:26 PST 2017
> On Feb 27, 2017, at 1:36 AM, Kristof Beyls via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hi Mikael,
> Some noisiness in benchmark results is expected, but the numbers you see seem to be higher than I'd expect.
> A number of tricks people use to get lower noise results are (with the lnt runtest nt command line options to enable it between brackets):
> * Only build the benchmarks in parallel, but do the actual running of the benchmark code at most one at a time. (--threads 1 --build-threads 6).
This seems critical; I always do that.
> * Make lnt use linux perf to get more accurate timing for short-running benchmarks (--use-perf=1)
> * Pin the running benchmark to a specific core, so the OS doesn't move the benchmark process from core to core. (--make-param="RUNUNDER=taskset -c 1")
> * Only run the programs that are marked as a benchmark; some of the tests in the test-suite are not intended to be used as a benchmark (--benchmarking-only)
> * Make sure each program gets run multiple times, so that LNT has a higher chance of recognizing which programs are inherently noisy (--multisample=3)
This as well; I usually use 5 multisamples.
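Putting the options above together, a combined invocation might look roughly like this (the sandbox path, compiler path, thread counts, and core number are placeholders, not prescriptions):

```shell
# Sketch of a lower-noise lnt invocation combining the options above.
# Build with 6 threads, but run benchmarks one at a time, timed with
# linux perf, pinned to core 1, benchmarks only, 3 samples each.
lnt runtest nt \
  --sandbox SANDBOX \
  --cc <path-to-my-clang> \
  --test-suite /data/repo/test-suite \
  --threads 1 --build-threads 6 \
  --use-perf=1 \
  --make-param="RUNUNDER=taskset -c 1" \
  --benchmarking-only \
  --multisample=3
```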
I’d add to this good list: disable frequency scaling / turbo boost. Thermal throttling can otherwise skew the results.
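On Linux with the intel_pstate driver, for example, this can be done roughly as follows (sysfs paths vary by cpufreq driver and kernel version; requires root):

```shell
# Sketch for Linux with the intel_pstate driver; other cpufreq
# drivers expose different knobs. Run as root.

# Disable turbo boost.
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

# Pin the frequency governor to "performance" on every core,
# so the clock does not scale down between benchmark runs.
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo performance > "$g"
done
```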
> I hope this is the kind of answer you were looking for?
> Do the above measures reduce the noisiness to acceptable levels for your setup?
>> On 27 Feb 2017, at 09:46, Mikael Holmén via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> I'm trying to run the benchmark suite:
>> I'm doing it the lnt way, as described at:
>> I don't know what to expect, but the results seem to be quite noisy and unstable. E.g. I've done two runs on two different commits that only differ by a space in CODE_OWNERS.txt on my 12 core ubuntu 14.04 machine with:
>> lnt runtest nt --sandbox SANDBOX --cc <path-to-my-clang> --test-suite /data/repo/test-suite -j 8
>> And then I get the following top execution time regressions:
>> The numbers bounce around a lot if I do more runs.
>> Given the amount of noise I see here, I don't know how to sort out significant regressions if I actually do a real change in the compiler.
>> Are the above results expected?
>> How to use this?
>> As a bonus question, if I instead run the benchmarks with an added -m32:
>> lnt runtest nt --sandbox SANDBOX --cflag=-m32 --cc <path-to-my-clang> --test-suite /data/repo/test-suite -j 8
>> I get three failures:
>> --- Tested: 2465 tests --
>> FAIL: MultiSource/Applications/ClamAV/clamscan.compile_time (1 of 2465)
>> FAIL: MultiSource/Applications/ClamAV/clamscan.execution_time (494 of 2465)
>> FAIL: MultiSource/Benchmarks/DOE-ProxyApps-C/XSBench/XSBench.execution_time (495 of 2465)
>> Is this known/expected, or am I doing something stupid?
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org