[llvm-dev] Noisy benchmark results?

Kristof Beyls via llvm-dev llvm-dev at lists.llvm.org
Mon Feb 27 01:36:25 PST 2017


Hi Mikael,

Some noise in benchmark results is expected, but the numbers you're seeing are higher than I'd expect.
A number of tricks people use to get lower-noise results are listed below, with the lnt runtest nt command-line options to enable them in parentheses; a combined invocation is sketched after the list:
* Build the benchmarks in parallel, but run the benchmark binaries at most one at a time (--threads 1 --build-threads 6).
* Make lnt use Linux perf to get more accurate timings for short-running benchmarks (--use-perf=1).
* Pin the running benchmark to a specific core, so the OS doesn't migrate the benchmark process from core to core (--make-param="RUNUNDER=taskset -c 1").
* Only run the programs that are marked as benchmarks; some of the tests in the test-suite are not intended to be used as benchmarks (--benchmarking-only).
* Make sure each program is run multiple times, so that LNT has a better chance of recognizing which programs are inherently noisy (--multisample=3).
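
Putting those options together on top of your original command, a full invocation could look something like the sketch below (the sandbox, compiler, and test-suite paths are taken from your command, so substitute your own; the thread counts and the taskset core are just examples to adjust for your machine):

lnt runtest nt \
    --sandbox SANDBOX \
    --cc <path-to-my-clang> \
    --test-suite /data/repo/test-suite \
    --threads 1 --build-threads 6 \
    --use-perf=1 \
    --make-param="RUNUNDER=taskset -c 1" \
    --benchmarking-only \
    --multisample=3

Note that with --multisample=3 each benchmark is run three times, so expect the overall run to take correspondingly longer.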

I hope this is the kind of answer you were looking for.
Do the above measures reduce the noisiness to acceptable levels for your setup?

Thanks,

Kristof


> On 27 Feb 2017, at 09:46, Mikael Holmén via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hi,
> 
> I'm trying to run the benchmark suite:
> http://llvm.org/docs/TestingGuide.html#test-suite-quickstart
> 
> I'm doing it the lnt way, as described at:
> http://llvm.org/docs/lnt/quickstart.html
> 
> I don't know what to expect, but the results seem to be quite noisy and unstable. E.g. I've done two runs on two different commits that differ only by a space in CODE_OWNERS.txt, on my 12-core Ubuntu 14.04 machine, with:
> 
> lnt runtest nt --sandbox SANDBOX --cc <path-to-my-clang> --test-suite /data/repo/test-suite -j 8
> 
> And then I get the following top execution time regressions:
> http://i.imgur.com/sv1xzlK.png
> 
> The numbers bounce around a lot if I do more runs.
> 
> Given the amount of noise I see here, I don't know how to sort out significant regressions when I actually make a real change in the compiler.
> 
> Are the above results expected?
> 
> How should I use this?
> 
> 
> As a bonus question, if I instead run the benchmarks with an added -m32:
> lnt runtest nt --sandbox SANDBOX --cflag=-m32 --cc <path-to-my-clang> --test-suite /data/repo/test-suite -j 8
> 
> I get three failures:
> 
> --- Tested: 2465 tests --
> FAIL: MultiSource/Applications/ClamAV/clamscan.compile_time (1 of 2465)
> FAIL: MultiSource/Applications/ClamAV/clamscan.execution_time (494 of 2465)
> FAIL: MultiSource/Benchmarks/DOE-ProxyApps-C/XSBench/XSBench.execution_time (495 of 2465)
> 
> Is this known/expected, or am I doing something stupid?
> 
> Thanks,
> Mikael
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


