[llvm-dev] [test-suite] r261857 - [cmake] Add support for arbitrary metrics

Wed Mar 23 18:00:34 PDT 2016

> On Mar 23, 2016, at 5:54 PM, Matthias Braun via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Let's move this to llvm-dev. I should describe my goals/motivation for the work I have been putting into the llvm-testsuite lately. This is how I see the llvm-test-suite today:
> 
> - We provide a familiar CMake build system, so people have a known environment in which to tweak compilation flags.
> - Together with the benchmark executable, we build a .test file that describes how to invoke the benchmark and that can be run by the familiar llvm-lit tool:
>   - Running a benchmark means executing its executable with a certain set of flags. Some of the SPEC benchmarks even require multiple invocations with different flags.
>   - There is a set of steps to verify that the benchmark worked correctly. This usually means invoking "diff" or "fpcmp" and comparing the results with a reference file.
> - The lit benchmark driver modifies these benchmark descriptions to create a test plan. In the simplest case this means prefixing the executable with "timeit" and collecting the measured time. But we are adding more features, like collecting code size, running the benchmark on a remote device, prefixing commands with different instrumentation tools like the Linux "perf" tool, a utility task that collects and merges PGO data files after a benchmark run, ...
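A minimal Python sketch of the "test plan" idea above, assuming the RUN and VERIFY commands of a .test file have already been collected into lists. `make_test_plan` and the sample command lines are hypothetical, and this is not how litsupport is actually structured; it only illustrates that instrumentation is applied to the run commands while the verification commands stay untouched.

```python
from typing import Dict, List


def make_test_plan(run_lines: List[str], verify_lines: List[str],
                   prefix: str = "timeit") -> Dict[str, List[str]]:
    """Build a hypothetical test plan from a benchmark description.

    RUN commands get an instrumentation prefix (a timer here, but it could
    just as well be "perf stat" or a remote-execution wrapper). VERIFY
    commands only check the output, so they are copied unchanged.
    """
    return {
        "run": [f"{prefix} {cmd}" for cmd in run_lines],
        "verify": list(verify_lines),
    }


# Made-up command lines, for illustration only.
plan = make_test_plan(["./bench --size 100"], ["fpcmp out.txt ref.txt"])
print(plan["run"])     # ['timeit ./bench --size 100']
print(plan["verify"])  # ['fpcmp out.txt ref.txt']
```

Swapping the prefix (for example `prefix="perf stat"`) changes the instrumentation without touching the benchmark description, which is the property the paragraph below relies on.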
> 
> This allows us to add new instrumentation and metrics in the future without touching the benchmarks themselves. It works best for bigger benchmarks that run for a while (a few seconds minimum), and it works nicely with benchmark suites like SPEC, geekbench, mediabench, etc. Let's call this "macro benchmarking".
> 
> 
> Having said all that, you make a very good case for what we should call "micro benchmarking". The Google benchmark library does indeed look like a fantastic tool. We should definitely evaluate how we can integrate it into the llvm test-suite as a new flavor of benchmarks. We won't be able to redesign SPEC, but we can surely find benchmarks like TSVC that we could adapt to this. I have no immediate plans to put much more work into the test-suite, but I agree that micro benchmarking would be an exciting addition to our testing strategy. I'd be happy to review patches or talk through possible designs on IRC.

Note: during the initial review I suggested making the Halide test infrastructure compatible with the Google benchmark framework, because in the long term Halide can generate interesting micro-benchmarks.

-- 
Mehdi

> 
> - Matthias
> 
>> https://github.com/google/benchmark
> 
> 
> 
>> On Mar 23, 2016, at 3:45 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> 
>> ----- Original Message -----
>>> From: "Hal Finkel via llvm-commits" <llvm-commits at lists.llvm.org>
>>> To: "Matthias Braun" <mbraun at apple.com>
>>> Cc: "nd" <nd at arm.com>, "llvm-commits" <llvm-commits at lists.llvm.org>
>>> Sent: Wednesday, March 23, 2016 5:19:37 PM
>>> Subject: Re: [test-suite] r261857 - [cmake] Add support for arbitrary metrics
>>> 
>>> ----- Original Message -----
>>>> From: "Matthias Braun" <mbraun at apple.com>
>>>> To: "Hal Finkel" <hfinkel at anl.gov>
>>>> Cc: "James Molloy" <James.Molloy at arm.com>, "nd" <nd at arm.com>,
>>>> "llvm-commits" <llvm-commits at lists.llvm.org>
>>>> Sent: Friday, March 4, 2016 12:36:36 PM
>>>> Subject: Re: [test-suite] r261857 - [cmake] Add support for
>>>> arbitrary metrics
>>>> 
>>>> A test can report "internal" metrics now, though I don't think lnt
>>>> would split those into a notion of sub-tests.
>>>> It would be an interesting feature to add. Still, if we have the
>>>> choice to modify a benchmark, we should prefer smaller
>>>> independent ones IMO, as that gives a better idea when some of the
>>>> other metrics change (compile time, code size, and hopefully things
>>>> like memory usage or performance counters in the future).
>>> 
>>> Unless the kernels are large, their code size within the context of a
>>> complete executable might be hard to track regardless (because by
>>> the time you add in the static libc startup code, ELF headers, etc.,
>>> any change would be a small percentage of the total). Explicitly
>>> instrumenting the code to mark regions of interest is probably best
>>> (which is true for timing too), but that seems like a separate
>>> (although worthwhile) project.
>>> 
>>> In any case, take TSVC as an example: the single test has 136 kernels,
>>> which I currently group into 18 binaries. I have a float and a double
>>> version of each, so we have 36 binaries in total. What you're
>>> suggesting would have us produce 272 separate executables just for
>>> TSVC. Ideally, I'd like aligned and unaligned variants of each of
>>> these. I've not done that because I thought that 72 executables
>>> would be a bit much, but that would be 544 executables if I generated
>>> one per kernel variant.
>>> 
>>> The LCALS benchmark, which I'd really like to add sometime soon, has
>>> another ~100 kernels, which is ~200 to do both float and double
>>> (which we should do).
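The executable counts above follow from straightforward multiplication. A quick check in Python (the LCALS number uses the approximate "~100 kernels" figure):

```python
# Executable counts for the kernel-splitting options discussed above.
TSVC_KERNELS = 136
TSVC_GROUPS = 18      # current grouping of TSVC kernels into binaries
LCALS_KERNELS = 100   # approximate ("~100")

PRECISIONS = 2        # float and double
ALIGNMENTS = 2        # aligned and unaligned

grouped = TSVC_GROUPS * PRECISIONS                # 36 binaries today
grouped_aligned = grouped * ALIGNMENTS            # 72 with alignment variants
per_kernel = TSVC_KERNELS * PRECISIONS            # 272 one-kernel executables
per_kernel_aligned = per_kernel * ALIGNMENTS      # 544 with alignment variants
lcals_per_kernel = LCALS_KERNELS * PRECISIONS     # ~200 for LCALS

print(grouped, grouped_aligned, per_kernel, per_kernel_aligned, lcals_per_kernel)
# 36 72 272 544 200
```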
>>> 
>>> What do you think is reasonable here?
>>> 
>> 
>> Also, we might want to consider updating some of these tests to use Google's benchmark library (https://github.com/google/benchmark). Have you looked at this? Aside from giving us a common output format, the real advantage of using a driver library like this is that it lets us dynamically pick the number of loop iterations based on per-iteration timing. This appeals to me because the number of iterations that is reasonable for some embedded device is normally quite different from what is reasonable for a server-class machine. Doing this, however, means that we definitely can't rely on overall executable timing. Thoughts?
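A rough sketch of picking the iteration count dynamically, written in Python for brevity. It simply doubles the count until one timed batch runs for at least `min_time` seconds; this is a simplification and not Google benchmark's own iteration-count logic. `pick_iterations`, `min_time`, `max_iters` and the placeholder `kernel` are illustrative names.

```python
import time
from typing import Callable, Tuple


def pick_iterations(fn: Callable[[], object], min_time: float = 0.5,
                    max_iters: int = 1 << 30) -> Tuple[int, float]:
    """Double the iteration count until one timed batch of calls to `fn`
    takes at least `min_time` seconds (or `max_iters` is reached).

    Returns the final iteration count and the mean seconds per iteration.
    """
    iters = 1
    while True:
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        elapsed = time.perf_counter() - start
        if elapsed >= min_time or iters >= max_iters:
            return iters, elapsed / iters
        iters *= 2


def kernel() -> None:
    """Placeholder workload standing in for a real benchmark kernel."""
    sum(range(1000))


iters, per_iter = pick_iterations(kernel, min_time=0.05)
print(f"{iters} iterations, {per_iter * 1e6:.2f} us per iteration")
```

Because `iters` ends up different on an embedded board than on a server, only the per-iteration time is comparable across machines, which is why whole-executable timing stops being meaningful once iteration counts are chosen this way.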
>> 
>> -Hal
>> 
>>> Thanks again,
>>> Hal
>>> 
>>>> 
>>>> - Matthias
>>>> 
>>>>> On Mar 4, 2016, at 8:22 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>> 
>>>>> Hi James,
>>>>> 
>>>>> If I'm reading this correctly, you can have multiple metrics per
>>>>> test. Is that correct?
>>>>> 
>>>>> I'd really like to support tests with internal timers (i.e. a timer
>>>>> per kernel), so that we can have more fine-grained timing without
>>>>> splitting executables into multiple parts (e.g. as I had to do
>>>>> with TSVC).
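One possible shape for per-kernel internal timers: the benchmark times each kernel separately and prints one line per kernel, and a metric-extraction step turns those lines into separate metrics. The `name: seconds` output format, the function names, and the TSVC-style kernel names are all assumptions for illustration, not an existing test-suite convention.

```python
import re
import time
from typing import Callable, Dict


def run_kernels(kernels: Dict[str, Callable[[], object]]) -> str:
    """Time each kernel on its own and emit one 'name: seconds' line per kernel."""
    lines = []
    for name, kernel in kernels.items():
        start = time.perf_counter()
        kernel()
        elapsed = time.perf_counter() - start
        lines.append(f"{name}: {elapsed:.9f}")
    return "\n".join(lines)


def parse_kernel_times(output: str) -> Dict[str, float]:
    """Turn the 'name: seconds' lines back into one metric per kernel."""
    pattern = re.compile(r"^(\w+): ([0-9.eE+-]+)$", re.MULTILINE)
    return {m.group(1): float(m.group(2)) for m in pattern.finditer(output)}


# Placeholder kernels with TSVC-style names; real kernels would be loops.
output = run_kernels({
    "s000": lambda: sum(range(10_000)),
    "s111": lambda: sorted(range(10_000), reverse=True),
})
print(parse_kernel_times(output))
```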
>>>>> 
>>>>> Thanks again,
>>>>> Hal
>>>>> 
>>>>> ----- Original Message -----
>>>>>> From: "James Molloy via llvm-commits"
>>>>>> <llvm-commits at lists.llvm.org>
>>>>>> To: "Matthias Braun" <mbraun at apple.com>
>>>>>> Cc: "nd" <nd at arm.com>, "llvm-commits"
>>>>>> <llvm-commits at lists.llvm.org>
>>>>>> Sent: Friday, February 26, 2016 3:06:01 AM
>>>>>> Subject: Re: [test-suite] r261857 - [cmake] Add support for
>>>>>> arbitrary metrics
>>>>>> 
>>>>>> Hi Matthias,
>>>>>> 
>>>>>> Thanks :) I’ve been working internally to move all our testing from
>>>>>> ad-hoc driver scripts to a CMake+LIT-based setup. Currently I have
>>>>>> CMake drivers for a very popular mobile benchmark (but it's the
>>>>>> pre-release version, so pushing this upstream might be difficult),
>>>>>> and EEMBC (automotive, telecom, consumer).
>>>>>>
>>>>>> I really want all of these to live upstream, but I have to do a bit
>>>>>> of legal checking before I can push them. In the meantime I’m happy
>>>>>> to add an example to the repositories; alternatively, I could modify
>>>>>> the SPEC drivers to also compute SPECrate as a metric?
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> James
>>>>>> 
>>>>>>> On 25 Feb 2016, at 21:33, Matthias Braun <mbraun at apple.com> wrote:
>>>>>>> 
>>>>>>> Hi James,
>>>>>>> 
>>>>>>> thanks for working on the test-suite. It's nice to see new
>>>>>>> capabilities added to the lit system.
>>>>>>> 
>>>>>>> Are you planning to add tests that use this? If not, we should
>>>>>>> really have at least an example/unit-test type thing in the
>>>>>>> repository.
>>>>>>> 
>>>>>>> - Matthias
>>>>>>> 
>>>>>>>> On Feb 25, 2016, at 3:06 AM, James Molloy via llvm-commits
>>>>>>>> <llvm-commits at lists.llvm.org> wrote:
>>>>>>>> 
>>>>>>>> Author: jamesm
>>>>>>>> Date: Thu Feb 25 05:06:15 2016
>>>>>>>> New Revision: 261857
>>>>>>>> 
>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=261857&view=rev
>>>>>>>> Log:
>>>>>>>> [cmake] Add support for arbitrary metrics
>>>>>>>> 
>>>>>>>> This allows a .test script to specify a command to get a metric
>>>>>>>> for the test. For example:
>>>>>>>> 
>>>>>>>> METRIC: score: grep "Score:" %o | awk '{print $2}'
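For reference, the example pipeline keeps the second whitespace-separated field of every line containing `Score:`. The test.py change in this commit passes the script's stdout to `float()`, so the pipeline needs to print exactly one number. A Python equivalent of that extraction, with a made-up sample output; `extract_score` is a hypothetical name:

```python
def extract_score(output: str) -> float:
    """Rough equivalent of `grep "Score:" | awk '{print $2}'` plus float().

    Keeps the second whitespace-separated field of each line containing
    "Score:". Lines with fewer than two fields are skipped here (awk would
    print an empty line for them instead).
    """
    fields = [line.split()[1] for line in output.splitlines()
              if "Score:" in line and len(line.split()) > 1]
    if len(fields) != 1:
        # float() in litsupport would also fail if zero or several numbers
        # were printed.
        raise ValueError(f"expected exactly one score, got {fields!r}")
    return float(fields[0])


sample = "Running benchmark...\nScore: 1234.5\nDone.\n"
print(extract_score(sample))  # 1234.5
```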
>>>>>>>> 
>>>>>>>> Modified:
>>>>>>>>     test-suite/trunk/cmake/modules/SingleMultiSource.cmake
>>>>>>>>     test-suite/trunk/litsupport/test.py
>>>>>>>>     test-suite/trunk/litsupport/testscript.py
>>>>>>>> 
>>>>>>>> Modified: test-suite/trunk/cmake/modules/SingleMultiSource.cmake
>>>>>>>> URL: http://llvm.org/viewvc/llvm-project/test-suite/trunk/cmake/modules/SingleMultiSource.cmake?rev=261857&r1=261856&r2=261857&view=diff
>>>>>>>> ==============================================================================
>>>>>>>> --- test-suite/trunk/cmake/modules/SingleMultiSource.cmake (original)
>>>>>>>> +++ test-suite/trunk/cmake/modules/SingleMultiSource.cmake Thu Feb 25 05:06:15 2016
>>>>>>>> @@ -223,3 +223,17 @@ macro(llvm_test_verify)
>>>>>>>>      set(TESTSCRIPT "${TESTSCRIPT}VERIFY: ${JOINED_ARGUMENTS}\n")
>>>>>>>>    endif()
>>>>>>>>  endmacro()
>>>>>>>> +
>>>>>>>> +macro(llvm_test_metric)
>>>>>>>> +  CMAKE_PARSE_ARGUMENTS(ARGS "" "RUN_TYPE;METRIC" "" ${ARGN})
>>>>>>>> +  if(NOT DEFINED TESTSCRIPT)
>>>>>>>> +    set(TESTSCRIPT "" PARENT_SCOPE)
>>>>>>>> +  endif()
>>>>>>>> +  # ARGS_UNPARSED_ARGUMENTS is a semicolon-separated list. Change it into a
>>>>>>>> +  # whitespace-separated string.
>>>>>>>> +  string(REPLACE ";" " " JOINED_ARGUMENTS "${ARGS_UNPARSED_ARGUMENTS}")
>>>>>>>> +  if(NOT DEFINED ARGS_RUN_TYPE OR "${ARGS_RUN_TYPE}" STREQUAL "${TEST_SUITE_RUN_TYPE}")
>>>>>>>> +    set(TESTSCRIPT "${TESTSCRIPT}METRIC: ${ARGS_METRIC}: ${JOINED_ARGUMENTS}\n")
>>>>>>>> +  endif()
>>>>>>>> +endmacro()
>>>>>>>> +
>>>>>>>> \ No newline at end of file
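A Python model of the string the new `llvm_test_metric` macro builds, for readers less familiar with CMake: the unparsed arguments are joined with spaces, and a `METRIC: <name>: <command>` line is appended only when RUN_TYPE is not given or equals TEST_SUITE_RUN_TYPE. The function and parameter names are hypothetical, and the run-type strings in the example are placeholder values.

```python
from typing import List, Optional


def append_metric(testscript: str, metric: str, command: List[str],
                  suite_run_type: str, run_type: Optional[str] = None) -> str:
    """Hypothetical Python model of the llvm_test_metric CMake macro."""
    # string(REPLACE ";" " " ...): the CMake list becomes a space-separated string.
    joined = " ".join(command)
    # Append only if no RUN_TYPE was given or it matches the suite's run type.
    if run_type is None or run_type == suite_run_type:
        testscript += f"METRIC: {metric}: {joined}\n"
    return testscript


script = append_metric("", "score", ["grep", "Score:", "%o"], suite_run_type="train")
script = append_metric(script, "other", ["true"], suite_run_type="train", run_type="ref")
print(repr(script))  # 'METRIC: score: grep Score: %o\n'
```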
>>>>>>>> 
>>>>>>>> Modified: test-suite/trunk/litsupport/test.py
>>>>>>>> URL: http://llvm.org/viewvc/llvm-project/test-suite/trunk/litsupport/test.py?rev=261857&r1=261856&r2=261857&view=diff
>>>>>>>> ==============================================================================
>>>>>>>> --- test-suite/trunk/litsupport/test.py (original)
>>>>>>>> +++ test-suite/trunk/litsupport/test.py Thu Feb 25 05:06:15 2016
>>>>>>>> @@ -56,7 +56,7 @@ class TestSuiteTest(FileBasedTest):
>>>>>>>>          res = testscript.parse(test.getSourcePath())
>>>>>>>>          if litConfig.noExecute:
>>>>>>>>              return lit.Test.Result(Test.PASS)
>>>>>>>> -        runscript, verifyscript = res
>>>>>>>> +        runscript, verifyscript, metricscripts = res
>>>>>>>>  
>>>>>>>>          # Apply the usual lit substitutions (%s, %S, %p, %T, ...)
>>>>>>>>          tmpDir, tmpBase = getTempPaths(test)
>>>>>>>> @@ -65,6 +65,8 @@ class TestSuiteTest(FileBasedTest):
>>>>>>>>          substitutions += [('%o', outfile)]
>>>>>>>>          runscript = applySubstitutions(runscript, substitutions)
>>>>>>>>          verifyscript = applySubstitutions(verifyscript, substitutions)
>>>>>>>> +        metricscripts = {k: applySubstitutions(v, substitutions)
>>>>>>>> +                         for k,v in metricscripts.items()}
>>>>>>>>          context = TestContext(test, litConfig, runscript, verifyscript, tmpDir,
>>>>>>>>                                tmpBase)
>>>>>>>>  
>>>>>>>> @@ -80,6 +82,7 @@ class TestSuiteTest(FileBasedTest):
>>>>>>>>          output = ""
>>>>>>>>          n_runs = 1
>>>>>>>>          runtimes = []
>>>>>>>> +        metrics = {}
>>>>>>>>          for n in range(n_runs):
>>>>>>>>              res = runScript(context, runscript)
>>>>>>>>              if isinstance(res, lit.Test.Result):
>>>>>>>> @@ -94,6 +97,15 @@ class TestSuiteTest(FileBasedTest):
>>>>>>>>                  output += "\n" + err
>>>>>>>>                  return lit.Test.Result(Test.FAIL, output)
>>>>>>>>  
>>>>>>>> +            # Execute metric extraction scripts.
>>>>>>>> +            for metric, script in metricscripts.items():
>>>>>>>> +                res = runScript(context, script)
>>>>>>>> +                if isinstance(res, lit.Test.Result):
>>>>>>>> +                    return res
>>>>>>>> +
>>>>>>>> +                out, err, exitCode, timeoutInfo = res
>>>>>>>> +                metrics.setdefault(metric, list()).append(float(out))
>>>>>>>> +
>>>>>>>>              try:
>>>>>>>>                  runtime = runsafely.getTime(context)
>>>>>>>>                  runtimes.append(runtime)
>>>>>>>> @@ -128,6 +140,8 @@ class TestSuiteTest(FileBasedTest):
>>>>>>>>          result = lit.Test.Result(Test.PASS, output)
>>>>>>>>          if len(runtimes) > 0:
>>>>>>>>              result.addMetric('exec_time', lit.Test.toMetricValue(runtimes[0]))
>>>>>>>> +        for metric, values in metrics.items():
>>>>>>>> +            result.addMetric(metric, lit.Test.toMetricValue(values[0]))
>>>>>>>>          compiletime.collect(context, result)
>>>>>>>>  
>>>>>>>>          return result
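In the test.py change, each metric script's stdout is converted with `float()` and appended to a per-metric list on every run, but only the first value is attached to the lit result, just as `exec_time` uses `runtimes[0]`. A standalone model of that bookkeeping; `collect_metrics` and the `run_metric_script` callback (standing in for litsupport's `runScript`) are hypothetical:

```python
from typing import Callable, Dict, List


def collect_metrics(metricscripts: Dict[str, str], n_runs: int,
                    run_metric_script: Callable[[str], str]) -> Dict[str, float]:
    """Hypothetical model of the metric handling added to test.py."""
    metrics: Dict[str, List[float]] = {}
    for _ in range(n_runs):
        for metric, script in metricscripts.items():
            out = run_metric_script(script)  # stdout of the metric script
            metrics.setdefault(metric, []).append(float(out))
    # As with exec_time, only the value from the first run is reported.
    return {metric: values[0] for metric, values in metrics.items()}


outputs = iter(["10.0\n", "12.0\n"])
reported = collect_metrics({"score": "grep Score: out.txt"}, n_runs=2,
                           run_metric_script=lambda script: next(outputs))
print(reported)  # {'score': 10.0}
```

Since `n_runs` is 1 in this commit, the list only ever holds one value per metric; with more runs, everything after the first value would be collected but not reported.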
>>>>>>>> 
>>>>>>>> Modified: test-suite/trunk/litsupport/testscript.py
>>>>>>>> URL: http://llvm.org/viewvc/llvm-project/test-suite/trunk/litsupport/testscript.py?rev=261857&r1=261856&r2=261857&view=diff
>>>>>>>> ==============================================================================
>>>>>>>> --- test-suite/trunk/litsupport/testscript.py (original)
>>>>>>>> +++ test-suite/trunk/litsupport/testscript.py Thu Feb 25 05:06:15 2016
>>>>>>>> @@ -22,13 +22,18 @@ def parse(filename):
>>>>>>>>      # Collect the test lines from the script.
>>>>>>>>      runscript = []
>>>>>>>>      verifyscript = []
>>>>>>>> -    keywords = ['RUN:', 'VERIFY:']
>>>>>>>> +    metricscripts = {}
>>>>>>>> +    keywords = ['RUN:', 'VERIFY:', 'METRIC:']
>>>>>>>>      for line_number, command_type, ln in \
>>>>>>>>              parseIntegratedTestScriptCommands(filename, keywords):
>>>>>>>>          if command_type == 'RUN':
>>>>>>>>              _parseShellCommand(runscript, ln)
>>>>>>>>          elif command_type == 'VERIFY':
>>>>>>>>              _parseShellCommand(verifyscript, ln)
>>>>>>>> +        elif command_type == 'METRIC':
>>>>>>>> +            metric, ln = ln.split(':', 1)
>>>>>>>> +            metricscript = metricscripts.setdefault(metric.strip(), list())
>>>>>>>> +            _parseShellCommand(metricscript, ln)
>>>>>>>>          else:
>>>>>>>>              raise ValueError("unknown script command type: %r" % (
>>>>>>>>                               command_type,))
>>>>>>>> @@ -43,4 +48,4 @@ def parse(filename):
>>>>>>>>              raise ValueError("Test has unterminated RUN/VERIFY lines " +
>>>>>>>>                               "(ending with '\\')")
>>>>>>>>  
>>>>>>>> -    return runscript, verifyscript
>>>>>>>> +    return runscript, verifyscript, metricscripts
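The parsing change splits each METRIC line on its first colon only, so the command itself may contain colons (as `grep "Score:"` does). A simplified Python model; unlike the real code it assumes every line starts with its keyword, ignores `\` line continuations, and does not use lit's `parseIntegratedTestScriptCommands` or `_parseShellCommand`. `parse_script_lines` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def parse_script_lines(
        lines: List[str]) -> Tuple[List[str], List[str], Dict[str, List[str]]]:
    """Hypothetical, simplified model of testscript.parse after this change."""
    runscript: List[str] = []
    verifyscript: List[str] = []
    metricscripts: Dict[str, List[str]] = {}
    for line in lines:
        keyword, _, rest = line.partition(":")
        if keyword == "RUN":
            runscript.append(rest.strip())
        elif keyword == "VERIFY":
            verifyscript.append(rest.strip())
        elif keyword == "METRIC":
            # Split on the first colon only: "<name>: <command>".
            metric, command = rest.split(":", 1)
            metricscripts.setdefault(metric.strip(), []).append(command.strip())
    return runscript, verifyscript, metricscripts


run, verify, metrics = parse_script_lines([
    "RUN: ./bench > %o",
    "VERIFY: diff %o ref.txt",
    'METRIC: score: grep "Score:" %o',
])
print(metrics)  # {'score': ['grep "Score:" %o']}
```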
>>>>>>>> 
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> llvm-commits mailing list
>>>>>>>> llvm-commits at lists.llvm.org
>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>>> 
>>>>> 
>>>>> --
>>>>> Hal Finkel
>>>>> Assistant Computational Scientist
>>>>> Leadership Computing Facility
>>>>> Argonne National Laboratory
>>>> 
>>> 
>>> 
>> 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>