[PATCH] D38417: [test-suite] Adding HACCKernels app

Kristof Beyls via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Sep 30 11:41:17 PDT 2017


kristof.beyls added a comment.

In https://reviews.llvm.org/D38417#885226, @hfinkel wrote:

> In https://reviews.llvm.org/D38417#885195, @kristof.beyls wrote:
>
> > Hi Brian,
> >
> > Thanks for working on this!
> >
> > On the execution time of about 13 seconds on a fast processor: would it be possible to adapt the input to reduce the running time to about 1 second (or less), without losing the characteristic behaviour of this benchmark?
> >  I'd expect that adapting the input for an order-of-magnitude shorter running time would not make the measured execution time noticeably more noisy, and it would help keep the test-suite running as quickly as possible.
>
>
> Kristof, just FYI, reducing the runtime is easy to do, however, this will get significantly faster once we start actually vectorizing the hot loops (at least 4x if you have an architecture with <4 x float>). I looked at this a few weeks ago, and as I recall, we currently can't if-convert reductions (which prevents vectorization). We plan on working on improving this in the near future. Does this affect your opinion at all?
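
To make the vectorization point concrete: below is a small sketch of my own (not taken from the HACCKernels sources) of the kind of guarded reduction Hal describes. The accumulation is control-dependent on a branch, so the vectorizer has to if-convert the update into a select (and, for floating point, be allowed to reassociate the accumulation) before it can form a <4 x float> reduction.

  // Hypothetical example, not the actual HACCKernels code: a reduction whose
  // update is guarded by a branch. Vectorizing it requires if-converting the
  // guarded update into a select and, for floating point, permission to
  // reassociate the accumulation.
  float guarded_sum(const float *x, int n, float cutoff) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
      if (x[i] < cutoff)  // the branch that currently blocks vectorization
        sum += x[i];
    }
    return sum;
  }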


I keep focusing on the run time of newly added benchmarks because I think the test-suite already takes too long to run.
This problem is far worse for the benchmarks than for the programs that are only run to check correctness, since we typically:
a) have to run the test-suite in benchmark mode multiple times to figure out whether a performance change is significant or just noise.
b) have to run the programs sequentially in benchmark mode, even on multi-core systems, to reduce noise, whereas for correctness testing we can use all the cores on a system.
This makes evaluating patches painful, and it also makes the response time of performance-tracking bots annoyingly long.
We have a few hundred benchmarks in the test-suite currently, but probably need a lot more to get more coverage (which is why I think it's awesome that these DOE benchmarks are being added!).
Therefore, I think it's important not to lose sight of keeping newly added benchmarks short-running.

There's probably a lot of bikeshedding that could be done over what an acceptable run time is for a newly added benchmark and what is too long.
My experience on a few X86 and Arm platforms is that, when using Linux perf to measure execution time, once a program runs for about 0.01 seconds, running it for longer doesn't reduce the noise any further.
My limited experiments therefore suggest that, on the platforms I've tested, the ideal execution time for a benchmark program in the test-suite is just over 0.01 seconds (a rough sketch of this kind of noise measurement is further below).
As I said, there's probably plenty of room for argument over the execution time we should aim for when adding a new benchmark. So far, I've followed a personal rule of thumb that up to 1 second is acceptable, and that anything longer needs a justification.
That's why I reacted above.
As I don't think my personal 1-second rule of thumb is any more or less defensible than rules that set the threshold a bit higher or lower, I don't feel too strongly against this benchmark going in as is.
I just felt I had to ask whether there is a good reason to make this benchmark run for this long.
Ultimately, vectorizing the hot loops in this benchmark won't change my reasoning above.
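For what it's worth, when I talk about checking the noise floor, I mean something along the lines of the standalone sketch below. In practice I use Linux perf for the actual measurements, and this sketch is entirely hypothetical (not part of the test-suite; the kernel and problem sizes are made up), but it shows the idea: time the same work at increasing sizes, repeat each measurement, and watch whether the spread between the fastest and slowest run keeps shrinking once the runtime passes roughly 0.01 seconds.

  // Rough, hypothetical harness for estimating the run-to-run noise of a
  // kernel at different problem sizes. The "work" function is a stand-in,
  // not a real test-suite kernel.
  #include <algorithm>
  #include <chrono>
  #include <cstdio>
  #include <vector>

  static float work(const std::vector<float> &x) {
    float sum = 0.0f;
    for (float v : x)
      sum += v * v;
    return sum;
  }

  int main() {
    volatile float sink = 0.0f;  // keep the kernel from being optimized away
    for (std::size_t n : {1u << 16, 1u << 20, 1u << 24}) {
      std::vector<float> data(n, 1.0f);
      std::vector<double> t;
      for (int rep = 0; rep < 10; ++rep) {  // repeat to observe run-to-run noise
        auto t0 = std::chrono::steady_clock::now();
        sink = sink + work(data);
        auto t1 = std::chrono::steady_clock::now();
        t.push_back(std::chrono::duration<double>(t1 - t0).count());
      }
      std::sort(t.begin(), t.end());
      std::printf("n=%zu  fastest=%.6fs  spread=%.1f%%\n",
                  n, t.front(), 100.0 * (t.back() - t.front()) / t.front());
    }
    return 0;
  }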

In summary, I hope my reasoning above makes sense, and I won't object if you think there's a good reason not to shorten this benchmark's running time.

Thanks!

Kristof


https://reviews.llvm.org/D38417