[llvm-dev] Questions About LLVM Test Suite: Time Units, Re-running benchmarks
    Mircea Trofin via llvm-dev 
    llvm-dev at lists.llvm.org
       
    Mon Jul 19 16:13:06 PDT 2021
    
    
  
On Mon., Jul. 19, 2021, 12:47 Stefanos Baziotis, <
stefanos.baziotis at gmail.com> wrote:
> Hi,
>
>> Usually one does not compare executions of the entire test-suite, but
>> look for which programs have regressed. In this scenario only relative
>> changes between programs matter, so μs are only compared to μs and
>> seconds only compared to seconds.
>
>
> That's true, but there are different insights one can get from, say, a 30%
> increase in a program that initially took 100μs and one which initially
> took 10s.
>
>> What do you mean? Don't you get the exec_time per program?
>
>
> Yes, but the JSON file does not include the time _unit_. Actually, I think
> the correct phrasing is "unit of time", not "time unit", my bad. In any
> case, I mean that you get, e.g., "exec_time": 4, but you don't know whether
> this 4 is 4 seconds, 4 μs, or some other unit of time.
>
> For example, the only reason I believe MultiSource/ uses seconds is that
> I ran a bunch of them manually (and because some outputs saved by
> llvm-lit, which are measured in seconds, match the numbers in the JSON).
>
> If we know the unit of time per test case (or per X grouping of tests
> for that matter), we could then, e.g., normalize the times, as you
> suggest, or anyway, know the unit of time and act accordingly.
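
A minimal sketch of the normalization idea discussed above. It assumes the lit JSON report has a top-level "tests" list whose entries carry "name" and "metrics"/"exec_time", and that test names look like "test-suite :: SingleSource/...". The unit table is only the guess made in this thread (seconds for SingleSource/ and MultiSource/, μs for MicroBenchmarks); the JSON itself does not record units. The sample data is made up for illustration.

```python
import json

# Assumed unit per top-level directory, expressed as a factor to seconds.
# These come from observations in this thread, not from the JSON file.
ASSUMED_UNIT_IN_SECONDS = {
    "MicroBenchmarks": 1e-6,  # assumed microseconds
    "SingleSource": 1.0,      # assumed seconds
    "MultiSource": 1.0,       # assumed seconds
}


def exec_times_in_seconds(report):
    """Return {test name: exec_time in seconds}; tests in unknown groups are skipped."""
    result = {}
    for test in report.get("tests", []):
        metrics = test.get("metrics", {})
        if "exec_time" not in metrics:
            continue
        # Drop the assumed "test-suite :: " prefix, then take the first path component.
        path = test["name"].split(" :: ", 1)[-1]
        group = path.split("/", 1)[0]
        scale = ASSUMED_UNIT_IN_SECONDS.get(group)
        if scale is None:
            continue
        result[test["name"]] = metrics["exec_time"] * scale
    return result


# Hypothetical report contents; a real one would be loaded with json.load().
sample_report = json.loads("""
{"tests": [
  {"name": "test-suite :: SingleSource/Benchmarks/foo.test",
   "metrics": {"exec_time": 4}},
  {"name": "test-suite :: MicroBenchmarks/bar.test",
   "metrics": {"exec_time": 250}},
  {"name": "test-suite :: Other/baz.test",
   "metrics": {"exec_time": 1}}
]}
""")

normalized = exec_times_in_seconds(sample_report)
```

With every value in seconds, a 30% change on a 100 μs benchmark and a 30% change on a 10 s program can be told apart by their absolute sizes.
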
>
>> Running the programs a second time did work for me in the past.
>
>
> Ok, it seems it works for me if I wait, but it seems it behaves differently
> the second time. Anyway, not important.
>
>> It depends. You can run in parallel, but then you should increase the
>> number of samples (executions) appropriately to counter the increased
>> noise. Depending on how many cores your system has, it might not be
>> worth it, but instead try to make the system as deterministic as
>> possible (single thread, thread affinity, avoid background processes,
>> use perf instead of timeit, avoid context switches etc.). To avoid
>> systematic bias because always the same cache-sensitive programs run
>> in parallel, use the --shuffle option.
>
>
> I see, thanks. I didn't know about the --shuffle option, interesting.
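
A rough sketch of how several samples could be combined to counter noise, as suggested above. It assumes each run was written to its own JSON file by llvm-lit and already loaded into a dict with a "tests" list (the same assumed layout as the report format discussed earlier). Using the median is one possible choice, not something the thread prescribes; the function names and sample numbers are hypothetical.

```python
import statistics
from collections import defaultdict


def median_exec_times(reports):
    """Collect exec_time per test across several runs and return the median of each."""
    samples = defaultdict(list)
    for report in reports:
        for test in report.get("tests", []):
            value = test.get("metrics", {}).get("exec_time")
            if value is not None:
                samples[test["name"]].append(value)
    return {name: statistics.median(values) for name, values in samples.items()}


def relative_change(baseline, candidate):
    """Return (candidate - baseline) / baseline for tests present in both results."""
    return {
        name: (candidate[name] - baseline[name]) / baseline[name]
        for name in baseline
        if name in candidate and baseline[name] > 0
    }


# Made-up data: three noisy runs of one test, compared with a later build.
runs_before = [
    {"tests": [{"name": "foo.test", "metrics": {"exec_time": t}}]}
    for t in (1.0, 1.2, 5.0)  # 5.0 stands in for an outlier caused by system noise
]
runs_after = [
    {"tests": [{"name": "foo.test", "metrics": {"exec_time": t}}]}
    for t in (1.5, 1.5, 1.6)
]

before = median_exec_times(runs_before)
after = median_exec_times(runs_after)
change = relative_change(before, after)
```

Because the comparison is relative and per test, the missing unit in the JSON does not matter here, which matches the point that μs are only compared to μs and seconds to seconds.
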
>
> Btw, when using perf (i.e., using TEST_SUITE_USE_PERF in cmake), it seems
> that perf runs both during the build (i.e., make) and the run (i.e.,
> llvm-lit) of the tests. It's not important, but do you happen to know why
> this happens?
>
>> Also, depending on what you are trying to achieve (and what your platform
>> target is), you could enable perfcounter
>> <https://github.com/google/benchmark/blob/main/docs/perf_counters.md>
>> collection;
>
>
> Thanks, that can be useful in a bunch of cases. I should note that perf
> stats are not included in the JSON file. Is the "canonical" way to access
> them to follow the pattern
> CMakeFiles/<benchmark name>.dir/<benchmark name>.time.perfstats ?
>
You need to specify which counters you want collected, up to 3; see the
link above. (Also, you need to opt in to linking libpfm.)
>
> For example, let's say that I want the perf stats for
> test-suite/SingleSource/Benchmarks/Adobe-C++/loop_unroll.cpp
> To find them, I should go to the same path but in the build directory,
> i.e., test-suite-build/SingleSource/Benchmarks/Adobe-C++/
> and then follow the pattern above, so the .perfstats file will be in:
> test-suite-build/SingleSource/Benchmarks/Adobe-C++/CMakeFiles/loop_unroll.dir/loop_unroll.cpp.time.perfstats
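
The path mapping described above can be sketched as a small helper. This only reproduces the layout given in the message (source directory mirrored under the build directory, then CMakeFiles/<name>.dir/<file name>.time.perfstats); whether the test-suite always uses this layout is an assumption, and the function name and default directory names are hypothetical.

```python
from pathlib import PurePosixPath


def guessed_perfstats_path(source_file,
                           source_root="test-suite",
                           build_root="test-suite-build"):
    """Guess the .perfstats location for a source file, following the pattern above."""
    source = PurePosixPath(source_file)
    # Directory of the source file, relative to the source tree root.
    relative_dir = source.parent.relative_to(source_root)
    return (PurePosixPath(build_root) / relative_dir / "CMakeFiles"
            / f"{source.stem}.dir" / f"{source.name}.time.perfstats")


path = guessed_perfstats_path(
    "test-suite/SingleSource/Benchmarks/Adobe-C++/loop_unroll.cpp")
print(path)
```
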
>
> Sorry for the long path strings, but I couldn't make it clear otherwise.
>
> Thanks to both,
> Stefanos
>
> On Mon, Jul 19, 2021 at 5:36 PM, Mircea Trofin <
> mtrofin at google.com> wrote:
>
>>
>>
>> On Sun, Jul 18, 2021 at 8:58 PM Michael Kruse via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> On Sun, Jul 18, 2021 at 11:14 AM, Stefanos Baziotis via
>>> llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> > Now, to the questions. First, there doesn't seem to be a common time unit for
>>> > "exec_time" among the different tests. For instance, SingleSource/ seem to use
>>> > seconds while MicroBenchmarks seem to use μs. So, we can't reliably judge
>>> > changes. Although I get the fact that micro-benchmarks are different in nature
>>> > than Single/MultiSource benchmarks, so maybe one should focus only on
>>> > the one or the other depending on what they're interested in.
>>>
>>> Usually one does not compare executions of the entire test-suite, but
>>> look for which programs have regressed. In this scenario only relative
>>> changes between programs matter, so μs are only compared to μs and
>>> seconds only compared to seconds.
>>>
>>>
>>> > In any case, it would at least be great if the JSON data contained the time unit per test,
>>> > but that is not happening either.
>>>
>>> What do you mean? Don't you get the exec_time per program?
>>>
>>>
>>> > Do you think that the lack of time unit info is a problem? If yes, do you like the
>>> > solution of adding the time unit in the JSON or do you want to propose an alternative?
>>>
>>> You could also normalize the time unit that is emitted to JSON to s or
>>> ms.
>>>
>>> >
>>> > The second question has to do with re-running the benchmarks: I do
>>> > cmake + make + llvm-lit -v -j 1 -o out.json .
>>> > but if I try to do the latter another time, it just does/shows nothing. Is there any reason
>>> > that the benchmarks can't be run a second time? Could I somehow run it a second time?
>>>
>>> Running the programs a second time did work for me in the past.
>>> Remember to change the output to another file or the previous .json
>>> will be overwritten.
>>>
>>>
>>> > Lastly, slightly off-topic but while we're on the subject of benchmarking,
>>> > do you think it's reliable to run with -j <number of cores>? I'm a little bit afraid of
>>> > the shared caches (because misses should be counted in the CPU time, which
>>> > is what is measured in "exec_time" AFAIU)
>>> > and any potential multi-threading that the tests may use.
>>>
>>> It depends. You can run in parallel, but then you should increase the
>>> number of samples (executions) appropriately to counter the increased
>>> noise. Depending on how many cores your system has, it might not be
>>> worth it, but instead try to make the system as deterministic as
>>> possible (single thread, thread affinity, avoid background processes,
>>> use perf instead of timeit, avoid context switches etc.). To avoid
>>> systematic bias because always the same cache-sensitive programs run
>>> in parallel, use the --shuffle option.
>>>
>> Also, depending on what you are trying to achieve (and what your
>> platform target is), you could enable perfcounter
>> <https://github.com/google/benchmark/blob/main/docs/perf_counters.md>
>> collection; if instruction counts are sufficient (for example), the value
>> will probably not vary much with multi-threading.
>>
>> ...but it's probably best to avoid system noise altogether. On Intel,
>> afaik that includes disabling turbo boost and hyperthreading, along with
>> Michael's recommendations.
>>
>> Michael
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>