[test-suite] r261857 - [cmake] Add support for arbitrary metrics

Hal Finkel via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 23 15:45:11 PDT 2016


----- Original Message -----
> From: "Hal Finkel via llvm-commits" <llvm-commits at lists.llvm.org>
> To: "Matthias Braun" <mbraun at apple.com>
> Cc: "nd" <nd at arm.com>, "llvm-commits" <llvm-commits at lists.llvm.org>
> Sent: Wednesday, March 23, 2016 5:19:37 PM
> Subject: Re: [test-suite] r261857 - [cmake] Add support for arbitrary metrics
> 
> ----- Original Message -----
> > From: "Matthias Braun" <mbraun at apple.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: "James Molloy" <James.Molloy at arm.com>, "nd" <nd at arm.com>,
> > "llvm-commits" <llvm-commits at lists.llvm.org>
> > Sent: Friday, March 4, 2016 12:36:36 PM
> > Subject: Re: [test-suite] r261857 - [cmake] Add support for
> > arbitrary metrics
> > 
> > A test can report "internal" metrics now, though I don't think LNT
> > would split those into a notion of sub-tests. It would be an
> > interesting feature to add. That said, if we have the choice to
> > modify a benchmark, we should still prefer smaller independent ones,
> > IMO, as that gives a better idea of when some of the other metrics
> > change (compile time, code size, and hopefully things like memory
> > usage or performance counters in the future).
> 
> Unless the kernels are large, their code size within the context of a
> complete executable might be hard to track regardless (because, by the
> time you add in the static libc startup code, ELF headers, etc., any
> change would be a smaller percentage of the total). Explicitly
> instrumenting the code to mark regions of interest is probably best
> (which is true for timing too), but that seems like a separate
> (although worthwhile) project.
> 
> In any case, TSVC, for example, is a single test with 136 kernels,
> which I currently group into 18 binaries. I have a float and a double
> version of each, so we have 36 binaries in total. What you're
> suggesting would have us produce 272 separate executables just for
> TSVC. Ideally, I'd also like aligned and unaligned variants of each of
> these. I've not done that because I thought that 72 executables would
> be a bit much, but it's 544 executables if I generate one per kernel
> variant.
> 
> The LCALS benchmark, which I'd really like to add sometime soon, has
> another ~100 kernels, which becomes ~200 if we do both float and
> double (which we should).
> 
> What do you think is reasonable here?
> 

Also, we might want to consider updating some of these tests to use Google's benchmark library (https://github.com/google/benchmark). Have you looked at this? Aside from giving us a common output format, the real advantage of using a driver library like this is that it lets us dynamically pick the number of loop iterations based on per-iteration timing. This appeals to me because the number of iterations that is reasonable for some embedded device is normally quite different from what is reasonable for a server-class machine. Doing this, however, means that we definitely can't rely on overall executable timing. Thoughts?
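
For illustration, a TSVC-style kernel under that library might look
roughly like this (a hypothetical sketch; the kernel body, name, and
problem size are made up, not actual TSVC code):

  #include <benchmark/benchmark.h>
  #include <cstddef>
  #include <vector>

  // Hypothetical TSVC-like kernel wrapped in a Google Benchmark driver.
  static void BM_kernel_s000(benchmark::State& state) {
    std::vector<float> a(32000, 1.0f), b(32000, 2.0f);
    // The library decides how many times this loop runs, based on the
    // measured per-iteration time.
    while (state.KeepRunning()) {
      for (std::size_t i = 0; i < a.size(); ++i)
        a[i] = b[i] + 1.0f;
      // Prevent the compiler from optimizing the kernel away.
      benchmark::DoNotOptimize(a.data());
    }
  }
  BENCHMARK(BM_kernel_s000);
  BENCHMARK_MAIN();

The driver keeps repeating the KeepRunning() loop until it has a stable
per-iteration time, which is exactly the property that would let the
same source serve both embedded and server-class targets.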

 -Hal

> Thanks again,
> Hal
> 
> > 
> > - Matthias
> > 
> > > On Mar 4, 2016, at 8:22 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> > > 
> > > Hi James,
> > > 
> > > If I'm reading this correctly, you can have multiple metrics per
> > > test. Is that correct?
> > > 
> > > I'd really like to support tests with internal timers (i.e. a
> > > timer
> > > per kernel), so that we can have more fine-grained timing without
> > > splitting executables into multiple parts (e.g. as I had to do
> > > with TSVC).
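> > > 
> > > Concretely, I'm imagining something like the following (just a
> > > hypothetical sketch; the kernel and metric names are made up):
> > > 
> > >   #include <chrono>
> > >   #include <cstdio>
> > > 
> > >   static void kernel_s000() { /* ... kernel body ... */ }
> > > 
> > >   int main() {
> > >     // Time one kernel and print a line that a metric-extraction
> > >     // command in the .test script could then grep out of %o.
> > >     auto start = std::chrono::steady_clock::now();
> > >     kernel_s000();
> > >     std::chrono::duration<double> secs =
> > >         std::chrono::steady_clock::now() - start;
> > >     std::printf("s000_time: %f\n", secs.count());
> > >     return 0;
> > >   }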
> > > 
> > > Thanks again,
> > > Hal
> > > 
> > > ----- Original Message -----
> > >> From: "James Molloy via llvm-commits"
> > >> <llvm-commits at lists.llvm.org>
> > >> To: "Matthias Braun" <mbraun at apple.com>
> > >> Cc: "nd" <nd at arm.com>, "llvm-commits"
> > >> <llvm-commits at lists.llvm.org>
> > >> Sent: Friday, February 26, 2016 3:06:01 AM
> > >> Subject: Re: [test-suite] r261857 - [cmake] Add support for
> > >> arbitrary metrics
> > >> 
> > >> Hi Matthias,
> > >> 
> > >> Thanks :) I’ve been working internally to move all our testing
> > >> from ad-hoc driver scripts to a CMake+LIT-based setup. Currently I
> > >> have CMake drivers for a very popular mobile benchmark (but the
> > >> pre-release version, so pushing this upstream might be difficult)
> > >> and for EEMBC (automotive, telecom, consumer).
> > >> 
> > >> I really want all of these to live upstream, but I have to do a
> > >> bit of legal checking before I can push them. In the meantime, I’m
> > >> happy to add an example to the repositories; alternatively, I could
> > >> modify the SPEC drivers to also compute SPECrate as a metric?
> > >> 
> > >> Cheers,
> > >> 
> > >> James
> > >> 
> > >>> On 25 Feb 2016, at 21:33, Matthias Braun <mbraun at apple.com>
> > >>> wrote:
> > >>> 
> > >>> Hi James,
> > >>> 
> > >>> Thanks for working on the test-suite. It's nice to see new
> > >>> capabilities added to the lit system.
> > >>> 
> > >>> Are you planning to add tests that use this? If not, we should
> > >>> really have at least an example/unit-test type thing in the
> > >>> repository.
> > >>> 
> > >>> - Matthias
> > >>> 
> > >>>> On Feb 25, 2016, at 3:06 AM, James Molloy via llvm-commits
> > >>>> <llvm-commits at lists.llvm.org> wrote:
> > >>>> 
> > >>>> Author: jamesm
> > >>>> Date: Thu Feb 25 05:06:15 2016
> > >>>> New Revision: 261857
> > >>>> 
> > >>>> URL: http://llvm.org/viewvc/llvm-project?rev=261857&view=rev
> > >>>> Log:
> > >>>> [cmake] Add support for arbitrary metrics
> > >>>> 
> > >>>> This allows a .test script to specify a command to get a metric
> > >>>> for the test. For example:
> > >>>> 
> > >>>> METRIC: score: grep "Score:" %o | awk '{print $2}'
> > >>>> 
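> > >>>> A complete .test script combining this with the existing
> > >>>> keywords might look like the following sketch (the paths and
> > >>>> output format here are hypothetical):
> > >>>> 
> > >>>> RUN: %S/bench.exe > %o
> > >>>> VERIFY: diff %o %S/bench.reference_output
> > >>>> METRIC: score: grep "Score:" %o | awk '{print $2}'
> > >>>> 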
> > >>>> Modified:
> > >>>>  test-suite/trunk/cmake/modules/SingleMultiSource.cmake
> > >>>>  test-suite/trunk/litsupport/test.py
> > >>>>  test-suite/trunk/litsupport/testscript.py
> > >>>> 
> > >>>> Modified: test-suite/trunk/cmake/modules/SingleMultiSource.cmake
> > >>>> URL: http://llvm.org/viewvc/llvm-project/test-suite/trunk/cmake/modules/SingleMultiSource.cmake?rev=261857&r1=261856&r2=261857&view=diff
> > >>>> ==============================================================================
> > >>>> --- test-suite/trunk/cmake/modules/SingleMultiSource.cmake
> > >>>> (original)
> > >>>> +++ test-suite/trunk/cmake/modules/SingleMultiSource.cmake Thu Feb 25 05:06:15 2016
> > >>>> @@ -223,3 +223,17 @@ macro(llvm_test_verify)
> > >>>>   set(TESTSCRIPT "${TESTSCRIPT}VERIFY: ${JOINED_ARGUMENTS}\n")
> > >>>> endif()
> > >>>> endmacro()
> > >>>> +
> > >>>> +macro(llvm_test_metric)
> > >>>> +  CMAKE_PARSE_ARGUMENTS(ARGS "" "RUN_TYPE;METRIC" "" ${ARGN})
> > >>>> +  if(NOT DEFINED TESTSCRIPT)
> > >>>> +    set(TESTSCRIPT "" PARENT_SCOPE)
> > >>>> +  endif()
> > >>>> +  # ARGS_UNPARSED_ARGUMENTS is a semicolon-separated list. Change it into a
> > >>>> +  # whitespace-separated string.
> > >>>> +  string(REPLACE ";" " " JOINED_ARGUMENTS "${ARGS_UNPARSED_ARGUMENTS}")
> > >>>> +  if(NOT DEFINED ARGS_RUN_TYPE OR "${ARGS_RUN_TYPE}" STREQUAL "${TEST_SUITE_RUN_TYPE}")
> > >>>> +    set(TESTSCRIPT "${TESTSCRIPT}METRIC: ${ARGS_METRIC}: ${JOINED_ARGUMENTS}\n")
> > >>>> +  endif()
> > >>>> +endmacro()
> > >>>> +
> > >>>> \ No newline at end of file
> > >>>> 
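> > >>>> As a usage sketch (hypothetical, not taken from this commit), a
> > >>>> benchmark's CMakeLists.txt could emit the METRIC line above via:
> > >>>> 
> > >>>>   # Report a "score" metric extracted from the program output (%o).
> > >>>>   llvm_test_metric(METRIC score "grep 'Score:' %o | awk '{print $2}'")
> > >>>> 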
> > >>>> Modified: test-suite/trunk/litsupport/test.py
> > >>>> URL: http://llvm.org/viewvc/llvm-project/test-suite/trunk/litsupport/test.py?rev=261857&r1=261856&r2=261857&view=diff
> > >>>> ==============================================================================
> > >>>> --- test-suite/trunk/litsupport/test.py (original)
> > >>>> +++ test-suite/trunk/litsupport/test.py Thu Feb 25 05:06:15 2016
> > >>>> @@ -56,7 +56,7 @@ class TestSuiteTest(FileBasedTest):
> > >>>>       res = testscript.parse(test.getSourcePath())
> > >>>>       if litConfig.noExecute:
> > >>>>           return lit.Test.Result(Test.PASS)
> > >>>> -        runscript, verifyscript = res
> > >>>> +        runscript, verifyscript, metricscripts = res
> > >>>> 
> > >>>>       # Apply the usual lit substitutions (%s, %S, %p, %T, ...)
> > >>>>       tmpDir, tmpBase = getTempPaths(test)
> > >>>> @@ -65,6 +65,8 @@ class TestSuiteTest(FileBasedTest):
> > >>>>       substitutions += [('%o', outfile)]
> > >>>>       runscript = applySubstitutions(runscript, substitutions)
> > >>>>       verifyscript = applySubstitutions(verifyscript, substitutions)
> > >>>> +        metricscripts = {k: applySubstitutions(v, substitutions)
> > >>>> +                         for k,v in metricscripts.items()}
> > >>>>       context = TestContext(test, litConfig, runscript, verifyscript, tmpDir,
> > >>>>                             tmpBase)
> > >>>> 
> > >>>> @@ -80,6 +82,7 @@ class TestSuiteTest(FileBasedTest):
> > >>>>       output = ""
> > >>>>       n_runs = 1
> > >>>>       runtimes = []
> > >>>> +        metrics = {}
> > >>>>       for n in range(n_runs):
> > >>>>           res = runScript(context, runscript)
> > >>>>           if isinstance(res, lit.Test.Result):
> > >>>> @@ -94,6 +97,15 @@ class TestSuiteTest(FileBasedTest):
> > >>>>               output += "\n" + err
> > >>>>               return lit.Test.Result(Test.FAIL, output)
> > >>>> 
> > >>>> +            # Execute metric extraction scripts.
> > >>>> +            for metric, script in metricscripts.items():
> > >>>> +                res = runScript(context, script)
> > >>>> +                if isinstance(res, lit.Test.Result):
> > >>>> +                    return res
> > >>>> +
> > >>>> +                out, err, exitCode, timeoutInfo = res
> > >>>> +                metrics.setdefault(metric, list()).append(float(out))
> > >>>> +
> > >>>>           try:
> > >>>>               runtime = runsafely.getTime(context)
> > >>>>               runtimes.append(runtime)
> > >>>> @@ -128,6 +140,8 @@ class TestSuiteTest(FileBasedTest):
> > >>>>       result = lit.Test.Result(Test.PASS, output)
> > >>>>       if len(runtimes) > 0:
> > >>>>           result.addMetric('exec_time', lit.Test.toMetricValue(runtimes[0]))
> > >>>> +        for metric, values in metrics.items():
> > >>>> +            result.addMetric(metric, lit.Test.toMetricValue(values[0]))
> > >>>>       compiletime.collect(context, result)
> > >>>> 
> > >>>>       return result
> > >>>> 
> > >>>> Modified: test-suite/trunk/litsupport/testscript.py
> > >>>> URL: http://llvm.org/viewvc/llvm-project/test-suite/trunk/litsupport/testscript.py?rev=261857&r1=261856&r2=261857&view=diff
> > >>>> ==============================================================================
> > >>>> --- test-suite/trunk/litsupport/testscript.py (original)
> > >>>> +++ test-suite/trunk/litsupport/testscript.py Thu Feb 25 05:06:15 2016
> > >>>> @@ -22,13 +22,18 @@ def parse(filename):
> > >>>>   # Collect the test lines from the script.
> > >>>>   runscript = []
> > >>>>   verifyscript = []
> > >>>> -    keywords = ['RUN:', 'VERIFY:']
> > >>>> +    metricscripts = {}
> > >>>> +    keywords = ['RUN:', 'VERIFY:', 'METRIC:']
> > >>>>   for line_number, command_type, ln in \
> > >>>>           parseIntegratedTestScriptCommands(filename, keywords):
> > >>>>       if command_type == 'RUN':
> > >>>>           _parseShellCommand(runscript, ln)
> > >>>>       elif command_type == 'VERIFY':
> > >>>>           _parseShellCommand(verifyscript, ln)
> > >>>> +        elif command_type == 'METRIC':
> > >>>> +            metric, ln = ln.split(':', 1)
> > >>>> +            metricscript = metricscripts.setdefault(metric.strip(), list())
> > >>>> +            _parseShellCommand(metricscript, ln)
> > >>>>       else:
> > >>>>           raise ValueError("unknown script command type: %r" % (
> > >>>>                            command_type,))
> > >>>> @@ -43,4 +48,4 @@ def parse(filename):
> > >>>>           raise ValueError("Test has unterminated RUN/VERIFY lines " +
> > >>>>                            "(ending with '\\')")
> > >>>> 
> > >>>> -    return runscript, verifyscript
> > >>>> +    return runscript, verifyscript, metricscripts
> > >>>> 
> > >>>> 
> > >> 
> > > 
> > > --
> > > Hal Finkel
> > > Assistant Computational Scientist
> > > Leadership Computing Facility
> > > Argonne National Laboratory
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

