[LLVMdev] RFC:LNT Improvements

Wed Apr 30 02:05:09 PDT 2014

On 30 April 2014 07:50, Tobias Grosser <tobias at grosser.es> wrote:
> In general, I see such changes as a second step. First, we want to have a
> system in place that allows us to reliably detect if a benchmark is noisy or
> not, second we want to increase the number of benchmarks that are not noisy
> and where we can use the results.

I personally use the test-suite for correctness, not performance, and
would not like to have its run time increased by any means.

As discussed in the BoF last year, I'd appreciate it if we could
separate the test run from the benchmark run before we make any changes.

I want to have a separate benchmark bot running the subset that makes
sense as a benchmark, but I don't want the noise from the rest.

> 1. Get 5-10 samples per run
> 2. Do the Wilcoxon/Mann-Whitney test

5-10 samples on an ARM board is not feasible. Currently it takes 1
hour to run the whole set. Making it run for 5-10 hours will reduce
its value to zero.
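
For reference, the statistical step in the proposal is cheap to compute;
the cost is entirely in collecting the samples. A minimal sketch of an
exact two-sided Mann-Whitney (rank-sum) test for two small sample sets,
in Python. The function names and the timings are made up for
illustration and are not part of LNT:

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_exact(a, b):
    """Exact two-sided p-value for the rank-sum of `a` against `b`.

    Enumerates every way of picking len(a) ranks out of the pooled
    sample, so it is only practical for small sets (about 10 + 10).
    """
    ranks = midranks(list(a) + list(b))
    n, total = len(a), len(a) + len(b)
    observed = sum(ranks[:n])
    expected = n * (total + 1) / 2
    deviation = abs(observed - expected)
    hits = count = 0
    for idx in combinations(range(total), n):
        count += 1
        # Count relabelings at least as extreme as the observed one.
        if abs(sum(ranks[i] for i in idx) - expected) >= deviation - 1e-9:
            hits += 1
    return hits / count


# Hypothetical timings in seconds, not real measurements.
before = [10.0, 10.1, 10.2, 10.3, 10.4]
after = [11.0, 11.1, 11.2, 11.3, 11.4]
p_value = mann_whitney_exact(before, after)
print(f"p = {p_value:.4f}")
```

With 5 samples per side there are only 252 possible relabelings, so the
smallest two-sided p-value the test can report is 2/252; fewer samples
than that leave very little room to separate a real change from noise.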

> I am a little sceptical on this. Machines should generally not be noisy.

ARM machines work at a much lower power level than Intel ones. The
scheduler is a lot more aggressive and the quality of the peripherals
is *a lot* worse.

Even if you set up the board for benchmarks (fix the scheduler, put
everything up to 11), the quality of the external hardware (USB, SD,
eMMC, etc.) and its drivers does a lot of damage to any meaningful
number you may extract if the moon is full and Jupiter is in
Sagittarius.

So...

> However, if for some reason there is noise on the machine, the noise is as
> likely to appear during this pre-noise-test than during the actual benchmark
> runs, maybe during both, but maybe also only during the benchmark. So I am
> afraid we might often run in the situation where this test says OK but the
> later test is still suffering noise.

...this is not entirely true, on ARM.

We may be getting server quality hardware for AArch64 any time now,
but it's very unlikely that we'll *ever* get quality 32-bit test
boards.

cheers,
--renato