[llvm-dev] Questions About LLVM Test Suite: Time Units, Re-running benchmarks

Stefanos Baziotis via llvm-dev llvm-dev at lists.llvm.org
Mon Jul 19 12:46:53 PDT 2021


Hi,

> Usually one does not compare executions of the entire test-suite, but
> look for which programs have regressed. In this scenario only relative
> changes between programs matter, so μs are only compared to μs and
> seconds only compared to seconds.


That's true, but there are different insights one can get from, say, a 30%
increase in a program that initially took 100μs versus one that initially
took 10s.

> What do you mean? Don't you get the exec_time per program?


Yes, but the JSON file does not include the time _unit_. Actually, I think
the correct phrasing is "unit of time", not "time unit", my bad. In any
case, I mean that you get, e.g., "exec_time": 4, but you don't know whether
that 4 means 4 seconds, 4 μs, or some other unit of time.

For example, the only reason it seems that MultiSource/ uses seconds is
that I ran a bunch of them manually (and some outputs saved by llvm-lit,
which measure in seconds, match the numbers in the JSON).

If we knew the unit of time per test case (or per grouping of tests, for
that matter), we could then, e.g., normalize the times, as you suggest, or
at least know the unit of time and act accordingly.
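
For reference, here's roughly the kind of post-processing I have in mind
(a rough, untested sketch; the unit heuristic is just my guess from the
observations above, not something the test suite guarantees):

import json

def exec_times_in_seconds(lit_json_path):
    # Parse the llvm-lit "-o" JSON output and return {test name: exec_time},
    # normalized to seconds under the guessed heuristic below.
    with open(lit_json_path) as f:
        tests = json.load(f)["tests"]
    times = {}
    for test in tests:
        metrics = test.get("metrics", {})
        if "exec_time" not in metrics:
            continue
        t = metrics["exec_time"]
        # Guess: MicroBenchmarks/ (Google Benchmark) appear to report
        # microseconds, everything else appears to report seconds.
        if "MicroBenchmarks" in test["name"]:
            t /= 1e6
        times[test["name"]] = t
    return times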

> Running the programs a second time did work for me in the past.


Ok, it does work for me if I wait, though it seems to behave differently
the second time. Anyway, not important.

> It depends. You can run in parallel, but then you should increase the
> number of samples (executions) appropriately to counter the increased
> noise. Depending on how many cores your system has, it might not be
> worth it, but instead try to make the system as deterministic as
> possible (single thread, thread affinity, avoid background processes,
> use perf instead of timeit, avoid context switches etc. ). To avoid
> systematic bias because always the same cache-sensitive programs run
> in parallel, use the --shuffle option.


I see, thanks. I didn't know about the --shuffle option, interesting.
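
For my own notes, this is the kind of low-noise invocation I understand
you're suggesting (untested sketch; the core number and build-directory
path are placeholders for my setup, and taskset is Linux-only):

import subprocess

# Sequential, shuffled order, pinned to one core; results go to a fresh file.
subprocess.run(
    ["taskset", "-c", "2",       # pin to a single core (Linux-only)
     "llvm-lit", "-j", "1",      # one test at a time
     "--shuffle",                # randomize order to avoid systematic bias
     "-o", "results-run2.json",
     "."],
    cwd="test-suite-build",      # placeholder for my build directory
    check=True,
)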

Btw, when using perf (i.e., TEST_SUITE_USE_PERF in cmake), it seems that
perf runs both during the build (i.e., make) and the run (i.e., llvm-lit)
of the tests. It's not important, but do you happen to know why this
happens?
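
(For context, this is roughly how I'm configuring it; just a sketch, with
the compiler and directory paths as placeholders for my local setup:)

import subprocess

# Configure step with the perf wrapping enabled (paths are placeholders).
subprocess.run(
    ["cmake",
     "-DCMAKE_C_COMPILER=/path/to/clang",      # placeholder
     "-DCMAKE_CXX_COMPILER=/path/to/clang++",  # placeholder
     "-DTEST_SUITE_USE_PERF=ON",               # wrap measurements with perf
     "../test-suite"],
    cwd="test-suite-build",
    check=True,
)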

> Also, depending on what you are trying to achieve (and what your platform
> target is), you could enable perfcounter
> <https://github.com/google/benchmark/blob/main/docs/perf_counters.md>
> collection;


Thanks, that can be useful in a bunch of cases. I should note that the
perf stats are not included in the JSON file. Is the "canonical" way to
access them to follow the pattern
CMakeFiles/<benchmark name>.dir/<benchmark name>.time.perfstats ?

For example, let's say that I want the perf stats for
test-suite/SingleSource/Benchmarks/Adobe-C++/loop_unroll.cpp
To find them, I should go to the same path but in the build directory,
i.e., test-suite-build/SingleSource/Benchmarks/Adobe-C++/
and then follow the pattern above, so the .perfstats file will be at:
test-suite-build/SingleSource/Benchmarks/Adobe-C++/CMakeFiles/loop_unroll.dir/loop_unroll.cpp.time.perfstats

Sorry for the long path strings, but I couldn't make it clear otherwise.
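
To make sure I have the mapping right, here's a tiny helper expressing that
pattern (again, this layout is just my own guess from poking at the build
tree, not documented behavior as far as I know):

import os

def perfstats_path(build_dir, rel_test_dir, source_file):
    # Guess the .time.perfstats location for a SingleSource benchmark,
    # e.g. rel_test_dir = "SingleSource/Benchmarks/Adobe-C++",
    #      source_file  = "loop_unroll.cpp"
    name = os.path.splitext(source_file)[0]          # "loop_unroll"
    return os.path.join(build_dir, rel_test_dir, "CMakeFiles",
                        name + ".dir", source_file + ".time.perfstats")

print(perfstats_path("test-suite-build",
                     "SingleSource/Benchmarks/Adobe-C++",
                     "loop_unroll.cpp"))
# -> test-suite-build/SingleSource/Benchmarks/Adobe-C++/CMakeFiles/loop_unroll.dir/loop_unroll.cpp.time.perfstats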

Thanks to both,
Stefanos

On Mon, Jul 19, 2021 at 5:36 PM, Mircea Trofin <mtrofin at google.com>
wrote:

>
>
> On Sun, Jul 18, 2021 at 8:58 PM Michael Kruse via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Sun, Jul 18, 2021 at 11:14 AM, Stefanos Baziotis via
>> llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> > Now, to the questions. First, there doesn't seem to be a common time
>> unit for
>> > "exec_time" among the different tests. For instance, SingleSource/ seem
>> to use
>> > seconds while MicroBenchmarks seem to use μs. So, we can't reliably
>> judge
>> > changes. Although I get the fact that micro-benchmarks are different in
>> nature
>> > than Single/MultiSource benchmarks, so maybe one should focus only on
>> > the one or the other depending on what they're interested in.
>>
>> Usually one does not compare executions of the entire test-suite, but
>> look for which programs have regressed. In this scenario only relative
>> changes between programs matter, so μs are only compared to μs and
>> seconds only compared to seconds.
>>
>>
>> > In any case, it would at least be great if the JSON data contained the
>> time unit per test,
>> > but that is not happening either.
>>
>> What do you mean? Don't you get the exec_time per program?
>>
>>
>> > Do you think that the lack of time unit info is a problem ? If yes, do
>> you like the
>> > solution of adding the time unit in the JSON or do you want to propose
>> an alternative?
>>
>> You could also normalize the time unit that is emitted to JSON to s or ms.
>>
>> >
>> > The second question has to do with re-running the benchmarks: I do
>> > cmake + make + llvm-lit -v -j 1 -o out.json .
>> > but if I try to do the latter another time, it just does/shows nothing.
>> Is there any reason
>> > that the benchmarks can't be run a second time? Could I somehow run it
>> a second time ?
>>
>> Running the programs a second time did work for me in the past.
>> Remember to change the output to another file or the previous .json
>> will be overwritten.
>>
>>
>> > Lastly, slightly off-topic but while we're on the subject of
>> benchmarking,
>> > do you think it's reliable to run with -j <number of cores> ? I'm a
>> little bit afraid of
>> > the shared caches (because misses should be counted in the CPU time,
>> which
>> > is what is measured in "exec_time" AFAIU)
>> > and any potential multi-threading that the tests may use.
>>
>> It depends. You can run in parallel, but then you should increase the
>> number of samples (executions) appropriately to counter the increased
>> noise. Depending on how many cores your system has, it might not be
>> worth it, but instead try to make the system as deterministic as
>> possible (single thread, thread affinity, avoid background processes,
>> use perf instead of timeit, avoid context switches etc. ). To avoid
>> systematic bias because always the same cache-sensitive programs run
>> in parallel, use the --shuffle option.
>>
> Also, depending on what you are trying to achieve (and what your platform
> target is), you could enable perfcounter
> <https://github.com/google/benchmark/blob/main/docs/perf_counters.md>
> collection;
> if instruction counts are sufficient (for example), the value will probably
> not vary much with  multi-threading.
>
> ...but it's probably best to avoid system noise altogether. On Intel,
> afaik that includes disabling turbo boost and hyperthreading, along with
> Michael's recommendations.
>
>> Michael
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>