[llvm-dev] Noisy benchmark results?
Mehdi Amini via llvm-dev
llvm-dev at lists.llvm.org
Tue Feb 28 12:51:26 PST 2017
> On Feb 27, 2017, at 1:36 AM, Kristof Beyls via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hi Mikael,
> Some noisiness in benchmark results is expected, but the numbers you see seem to be higher than I'd expect.
> A number of tricks people use to get lower noise results are (with the lnt runtest nt command line options to enable it between brackets):
> * Only build the benchmarks in parallel, but do the actual running of the benchmark code at most one at a time. (--threads 1 --build-threads 6).
This seems critical; I always do that.
> * Make lnt use linux perf to get more accurate timing for short-running benchmarks (--use-perf=1)
> * Pin the running benchmark to a specific core, so the OS doesn't move the benchmark process from core to core. (--make-param="RUNUNDER=taskset -c 1")
> * Only run the programs that are marked as a benchmark; some of the tests in the test-suite are not intended to be used as a benchmark (--benchmarking-only)
> * Make sure each program gets run multiple times, so that LNT has a higher chance of recognizing which programs are inherently noisy (--multisample=3)
This as well; I usually use 5 multisamples.
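Putting the options above together, a combined invocation might look roughly like this (the sandbox path, compiler path, thread counts, and core number are placeholders, not prescriptions):

```shell
# Sketch of a lower-noise lnt invocation combining the options above.
# Build with 6 threads, but run benchmarks one at a time, timed with
# linux perf, pinned to core 1, benchmarks only, 3 samples each.
lnt runtest nt \
  --sandbox SANDBOX \
  --cc <path-to-my-clang> \
  --test-suite /data/repo/test-suite \
  --threads 1 --build-threads 6 \
  --use-perf=1 \
  --make-param="RUNUNDER=taskset -c 1" \
  --benchmarking-only \
  --multisample=3
```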
I’d add to this good list: disable frequency scaling / turbo boost. Thermal throttling can otherwise skew the results.
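On Linux with the intel_pstate driver, for example, this can be done roughly as follows (sysfs paths vary by cpufreq driver and kernel version; requires root):

```shell
# Sketch for Linux with the intel_pstate driver; other cpufreq
# drivers expose different knobs. Run as root.

# Disable turbo boost.
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

# Pin the frequency governor to "performance" on every core,
# so the clock does not scale down between benchmark runs.
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo performance > "$g"
done
```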
> I hope this is the kind of answer you were looking for?
> Do the above measures reduce the noisiness to acceptable levels for your setup?
>> On 27 Feb 2017, at 09:46, Mikael Holmén via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> I'm trying to run the benchmark suite:
>> I'm doing it the lnt way, as described at:
>> I don't know what to expect, but the results seem to be quite noisy and unstable. E.g. I've done two runs on two different commits that only differ by a space in CODE_OWNERS.txt on my 12 core ubuntu 14.04 machine with:
>> lnt runtest nt --sandbox SANDBOX --cc <path-to-my-clang> --test-suite /data/repo/test-suite -j 8
>> And then I get the following top execution time regressions:
>> The numbers bounce around a lot if I do more runs.
>> Given the amount of noise I see here, I don't know how to sort out significant regressions if I actually do a real change in the compiler.
>> Are the above results expected?
>> How to use this?
>> As a bonus question, if I instead run the benchmarks with an added -m32:
>> lnt runtest nt --sandbox SANDBOX --cflag=-m32 --cc <path-to-my-clang> --test-suite /data/repo/test-suite -j 8
>> I get three failures:
>> --- Tested: 2465 tests --
>> FAIL: MultiSource/Applications/ClamAV/clamscan.compile_time (1 of 2465)
>> FAIL: MultiSource/Applications/ClamAV/clamscan.execution_time (494 of 2465)
>> FAIL: MultiSource/Benchmarks/DOE-ProxyApps-C/XSBench/XSBench.execution_time (495 of 2465)
>> Is this known/expected, or am I doing something stupid?
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org