[test-suite] r261857 - [cmake] Add support for arbitrary metrics

Hal Finkel via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 23 15:19:37 PDT 2016


----- Original Message -----
> From: "Matthias Braun" <mbraun at apple.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "James Molloy" <James.Molloy at arm.com>, "nd" <nd at arm.com>, "llvm-commits" <llvm-commits at lists.llvm.org>
> Sent: Friday, March 4, 2016 12:36:36 PM
> Subject: Re: [test-suite] r261857 - [cmake] Add support for arbitrary metrics
> 
> A test can report "internal" metrics now, though I don't think LNT
> would split those into a notion of sub-tests.
> It would be an interesting feature to add. Still, if we have the
> choice to modify a benchmark, we should prefer smaller independent
> ones IMO, as that gives a better idea of what happened when one of
> the other metrics changes (compile time, code size, and hopefully
> things like memory usage or performance counters in the future).

Unless the kernels are large, their code size within the context of a complete executable might be hard to track regardless (because by the time you add in the static libc startup code, ELF headers, etc. any change would be a smaller percentage of the total). Explicitly instrumenting the code to mark regions of interest is probably best (which is true for timing too), but that seems like a separate (although worthwhile) project.
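Instrumenting the benchmark itself could look roughly like this sketch (a Python stand-in for what would be C in TSVC; the kernel and metric names here are hypothetical): each kernel's region of interest is timed independently, and the per-kernel values are printed in a greppable form.

```python
import time

# Hypothetical stand-in for one TSVC-style kernel; the real kernels are C.
def run_s000(n):
    a = [0.0] * n
    for i in range(n):
        a[i] = i * 2.0
    return a

def time_kernels(kernels, n=100000):
    """Time each kernel's region of interest separately and print one
    greppable line per kernel, e.g. for a METRIC: extraction command."""
    results = {}
    for name, fn in kernels.items():
        start = time.perf_counter()  # start of the region of interest
        fn(n)
        results[name] = time.perf_counter() - start  # end of region
        print("%s_time: %f" % (name, results[name]))
    return results

results = time_kernels({"s000": run_s000})
```

A .test script could then pull each value out with something along the lines of `METRIC: s000_time: grep "s000_time:" %o | awk '{print $2}'` (the metric name is illustrative), giving per-kernel timing without splitting the executable.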

In any case, TSVC, for example, is a single test with 136 kernels, which I currently group into 18 binaries. I have a float and a double version of each, so we have 36 binaries in total. What you're suggesting would have us produce 272 separate executables just for TSVC. Ideally, I'd also like aligned and unaligned variants of each of these; I've not done that because I thought that 72 executables would be a bit much, but it becomes 544 executables if I generate one per kernel variant.

The LCALS benchmark, which I'd really like to add sometime soon, has another ~100 kernels, which is ~200 to do both float and double (which we should do).
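To make the arithmetic behind these counts explicit (all numbers taken from the two paragraphs above):

```python
# Executable counts implied above for TSVC and LCALS.
tsvc_kernels = 136
types = 2                                 # float and double

# Current grouping: 18 binaries per floating-point type.
grouped = 18 * types                      # 36 binaries
# One executable per kernel and type.
per_kernel = tsvc_kernels * types         # 272 executables

# Adding aligned/unaligned variants doubles both counts.
grouped_variants = grouped * 2            # 72 binaries
per_kernel_variants = per_kernel * 2      # 544 executables

# LCALS: ~100 kernels, float and double.
lcals = 100 * types                       # ~200 more executables

print(grouped, per_kernel, grouped_variants, per_kernel_variants, lcals)
```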

What do you think is reasonable here?

Thanks again,
Hal

> 
> - Matthias
> 
> > On Mar 4, 2016, at 8:22 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> > 
> > Hi James,
> > 
> > If I'm reading this correctly, you can have multiple metrics per
> > test. Is that correct?
> > 
> > I'd really like to support tests with internal timers (i.e. a timer
> > per kernel), so that we can have more fine-grained timing without
> > splitting executables into multiple parts (e.g. as I had to do
> > with TSVC).
> > 
> > Thanks again,
> > Hal
> > 
> > ----- Original Message -----
> >> From: "James Molloy via llvm-commits"
> >> <llvm-commits at lists.llvm.org>
> >> To: "Matthias Braun" <mbraun at apple.com>
> >> Cc: "nd" <nd at arm.com>, "llvm-commits"
> >> <llvm-commits at lists.llvm.org>
> >> Sent: Friday, February 26, 2016 3:06:01 AM
> >> Subject: Re: [test-suite] r261857 - [cmake] Add support for
> >> arbitrary metrics
> >> 
> >> Hi Matthias,
> >> 
> >> Thanks :) I’ve been working internally to move all our testing
> >> from ad-hoc driver scripts to a CMake+lit-based setup. Currently I
> >> have CMake drivers for a very popular mobile benchmark (but the
> >> pre-release version, so pushing this upstream might be difficult)
> >> and EEMBC (automotive, telecom, consumer).
> >> 
> >> I really want all of these to live upstream, but I have to do a
> >> bit of legal checking before I can push them. In the meantime I’m
> >> happy to add an example to the repositories; alternatively, I
> >> could modify the SPEC drivers to also compute SPECrate as a
> >> metric?
> >> 
> >> Cheers,
> >> 
> >> James
> >> 
> >>> On 25 Feb 2016, at 21:33, Matthias Braun <mbraun at apple.com>
> >>> wrote:
> >>> 
> >>> Hi James,
> >>> 
> >>> thanks for working on the test-suite. It's nice to see new
> >>> capabilities added to the lit system.
> >>> 
> >>> Are you planning to add tests that use this? If not, we should
> >>> really have at least an example/unit-test type thing in the
> >>> repository.
> >>> 
> >>> - Matthias
> >>> 
> >>>> On Feb 25, 2016, at 3:06 AM, James Molloy via llvm-commits
> >>>> <llvm-commits at lists.llvm.org> wrote:
> >>>> 
> >>>> Author: jamesm
> >>>> Date: Thu Feb 25 05:06:15 2016
> >>>> New Revision: 261857
> >>>> 
> >>>> URL: http://llvm.org/viewvc/llvm-project?rev=261857&view=rev
> >>>> Log:
> >>>> [cmake] Add support for arbitrary metrics
> >>>> 
> >>>> This allows a .test script to specify a command to get a metric
> >>>> for the test. For example:
> >>>> 
> >>>> METRIC: score: grep "Score:" %o | awk '{print $2}'
> >>>> 
> >>>> Modified:
> >>>>  test-suite/trunk/cmake/modules/SingleMultiSource.cmake
> >>>>  test-suite/trunk/litsupport/test.py
> >>>>  test-suite/trunk/litsupport/testscript.py
> >>>> 
> >>>> Modified: test-suite/trunk/cmake/modules/SingleMultiSource.cmake
> >>>> URL:
> >>>> http://llvm.org/viewvc/llvm-project/test-suite/trunk/cmake/modules/SingleMultiSource.cmake?rev=261857&r1=261856&r2=261857&view=diff
> >>>> ==============================================================================
> >>>> --- test-suite/trunk/cmake/modules/SingleMultiSource.cmake
> >>>> (original)
> >>>> +++ test-suite/trunk/cmake/modules/SingleMultiSource.cmake Thu Feb 25 05:06:15 2016
> >>>> @@ -223,3 +223,17 @@ macro(llvm_test_verify)
> >>>>   set(TESTSCRIPT "${TESTSCRIPT}VERIFY: ${JOINED_ARGUMENTS}\n")
> >>>> endif()
> >>>> endmacro()
> >>>> +
> >>>> +macro(llvm_test_metric)
> >>>> +  CMAKE_PARSE_ARGUMENTS(ARGS "" "RUN_TYPE;METRIC" "" ${ARGN})
> >>>> +  if(NOT DEFINED TESTSCRIPT)
> >>>> +    set(TESTSCRIPT "" PARENT_SCOPE)
> >>>> +  endif()
> >>>> +  # ARGS_UNPARSED_ARGUMENTS is a semicolon-separated list. Change it into a
> >>>> +  # whitespace-separated string.
> >>>> +  string(REPLACE ";" " " JOINED_ARGUMENTS "${ARGS_UNPARSED_ARGUMENTS}")
> >>>> +  if(NOT DEFINED ARGS_RUN_TYPE OR "${ARGS_RUN_TYPE}" STREQUAL "${TEST_SUITE_RUN_TYPE}")
> >>>> +    set(TESTSCRIPT "${TESTSCRIPT}METRIC: ${ARGS_METRIC}: ${JOINED_ARGUMENTS}\n")
> >>>> +  endif()
> >>>> +endmacro()
> >>>> +
> >>>> \ No newline at end of file
> >>>> 
> >>>> Modified: test-suite/trunk/litsupport/test.py
> >>>> URL:
> >>>> http://llvm.org/viewvc/llvm-project/test-suite/trunk/litsupport/test.py?rev=261857&r1=261856&r2=261857&view=diff
> >>>> ==============================================================================
> >>>> --- test-suite/trunk/litsupport/test.py (original)
> >>>> +++ test-suite/trunk/litsupport/test.py Thu Feb 25 05:06:15 2016
> >>>> @@ -56,7 +56,7 @@ class TestSuiteTest(FileBasedTest):
> >>>>       res = testscript.parse(test.getSourcePath())
> >>>>       if litConfig.noExecute:
> >>>>           return lit.Test.Result(Test.PASS)
> >>>> -        runscript, verifyscript = res
> >>>> +        runscript, verifyscript, metricscripts = res
> >>>> 
> >>>>       # Apply the usual lit substitutions (%s, %S, %p, %T, ...)
> >>>>       tmpDir, tmpBase = getTempPaths(test)
> >>>> @@ -65,6 +65,8 @@ class TestSuiteTest(FileBasedTest):
> >>>>       substitutions += [('%o', outfile)]
> >>>>       runscript = applySubstitutions(runscript, substitutions)
> >>>>       verifyscript = applySubstitutions(verifyscript,
> >>>>       substitutions)
> >>>> +        metricscripts = {k: applySubstitutions(v, substitutions)
> >>>> +                         for k,v in metricscripts.items()}
> >>>>       context = TestContext(test, litConfig, runscript, verifyscript, tmpDir,
> >>>>                             tmpBase)
> >>>> 
> >>>> @@ -80,6 +82,7 @@ class TestSuiteTest(FileBasedTest):
> >>>>       output = ""
> >>>>       n_runs = 1
> >>>>       runtimes = []
> >>>> +        metrics = {}
> >>>>       for n in range(n_runs):
> >>>>           res = runScript(context, runscript)
> >>>>           if isinstance(res, lit.Test.Result):
> >>>> @@ -94,6 +97,15 @@ class TestSuiteTest(FileBasedTest):
> >>>>               output += "\n" + err
> >>>>               return lit.Test.Result(Test.FAIL, output)
> >>>> 
> >>>> +            # Execute metric extraction scripts.
> >>>> +            for metric, script in metricscripts.items():
> >>>> +                res = runScript(context, script)
> >>>> +                if isinstance(res, lit.Test.Result):
> >>>> +                    return res
> >>>> +
> >>>> +                out, err, exitCode, timeoutInfo = res
> >>>> +                metrics.setdefault(metric, list()).append(float(out))
> >>>> +
> >>>>           try:
> >>>>               runtime = runsafely.getTime(context)
> >>>>               runtimes.append(runtime)
> >>>> @@ -128,6 +140,8 @@ class TestSuiteTest(FileBasedTest):
> >>>>       result = lit.Test.Result(Test.PASS, output)
> >>>>       if len(runtimes) > 0:
> >>>>           result.addMetric('exec_time', lit.Test.toMetricValue(runtimes[0]))
> >>>> +        for metric, values in metrics.items():
> >>>> +            result.addMetric(metric, lit.Test.toMetricValue(values[0]))
> >>>>       compiletime.collect(context, result)
> >>>> 
> >>>>       return result
> >>>> 
> >>>> Modified: test-suite/trunk/litsupport/testscript.py
> >>>> URL:
> >>>> http://llvm.org/viewvc/llvm-project/test-suite/trunk/litsupport/testscript.py?rev=261857&r1=261856&r2=261857&view=diff
> >>>> ==============================================================================
> >>>> --- test-suite/trunk/litsupport/testscript.py (original)
> >>>> +++ test-suite/trunk/litsupport/testscript.py Thu Feb 25 05:06:15 2016
> >>>> @@ -22,13 +22,18 @@ def parse(filename):
> >>>>   # Collect the test lines from the script.
> >>>>   runscript = []
> >>>>   verifyscript = []
> >>>> -    keywords = ['RUN:', 'VERIFY:']
> >>>> +    metricscripts = {}
> >>>> +    keywords = ['RUN:', 'VERIFY:', 'METRIC:']
> >>>>   for line_number, command_type, ln in \
> >>>>           parseIntegratedTestScriptCommands(filename, keywords):
> >>>>       if command_type == 'RUN':
> >>>>           _parseShellCommand(runscript, ln)
> >>>>       elif command_type == 'VERIFY':
> >>>>           _parseShellCommand(verifyscript, ln)
> >>>> +        elif command_type == 'METRIC':
> >>>> +            metric, ln = ln.split(':', 1)
> >>>> +            metricscript = metricscripts.setdefault(metric.strip(), list())
> >>>> +            _parseShellCommand(metricscript, ln)
> >>>>       else:
> >>>>           raise ValueError("unknown script command type: %r" % (
> >>>>                            command_type,))
> >>>> @@ -43,4 +48,4 @@ def parse(filename):
> >>>>           raise ValueError("Test has unterminated RUN/VERIFY lines " +
> >>>>                            "(ending with '\\')")
> >>>> 
> >>>> -    return runscript, verifyscript
> >>>> +    return runscript, verifyscript, metricscripts
> >>>> 
> >>>> 
> >>>> _______________________________________________
> >>>> llvm-commits mailing list
> >>>> llvm-commits at lists.llvm.org
> >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> >> 
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

