[llvm-dev] RFC: LNT/Test-suite support for custom metrics and test parameterization
Kristof Beyls via llvm-dev
llvm-dev at lists.llvm.org
Fri Apr 22 00:45:03 PDT 2016
On 21 Apr 2016, at 17:44, Sergey Yakoushkin <sergey.yakoushkin at gmail.com> wrote:
The way we use LNT, we would run different configurations (e.g. -O3 vs -Os) as different "machines" in LNT's model.
O2/O3 is indeed a bad example. We are also using different machines for Os/O3 - such parameters apply to all tests, and we don't propose major changes.
Elena was only extending the LNT interface a bit to ease LLVM test-suite execution with different compiler or hardware flags.
Oh I see, this boils down to extending the lnt runtest interface to be able to specify a set of configurations, rather than a single configuration, and making
sure the configurations get submitted under different machine names? We kick off the different configuration runs through a script invoking lnt runtest multiple
times. I don't see a big problem with extending the lnt runtest interface to do this, assuming it doesn't break the underlying concepts assumed throughout
LNT. Maybe the only downside is that this will add even more command line options to lnt runtest, which already has a lot (too many?) of them.
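The script-based approach mentioned above could be sketched roughly as follows. This is a hypothetical wrapper, not part of LNT; the `--sandbox`, `--cc` and `--cflags` options exist on `lnt runtest nt`, but the machine-name override shown here is an assumption - check `lnt runtest nt --help` for the real spelling.

```python
# Hypothetical driver: kick off one `lnt runtest` per configuration, submitting
# each run under a distinct machine name so the configurations stay separate
# in LNT's "machine" model.

CONFIGS = {"O3": "-O3", "Os": "-Os"}  # configuration name -> compiler flags

def build_commands(hostname, sandbox, cc):
    """Return one lnt runtest command line per optimization configuration."""
    cmds = []
    for name, flags in CONFIGS.items():
        cmds.append(
            "lnt runtest nt"
            f" --sandbox {sandbox}"
            f" --cc {cc}"
            f" --cflags '{flags}'"
            # Machine-name override flag is an assumption for illustration:
            f" --machine-name {hostname}-{name}"
        )
    return cmds

for cmd in build_commands("mybot-x86_64", "/tmp/sandbox", "clang"):
    print(cmd)
```

A real script would run these sequentially (the test-suite is timing-sensitive, so parallel runs on one host would skew results) and submit each with `--submit`.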
Maybe some changes are required to analyze and compare metrics between "machines": e.g. code size/performance between Os/O2/O3.
Do you perform such comparisons?
We typically do these kinds of comparisons when we test our patches pre-commit, i.e. comparing for example '-O3' with '-O3 -mllvm -enable-my-new-pass'.
To stick with the LNT concepts, tests enabling new passes are stored as a different "machine".
The only way I know of to do a comparison between runs on two different "machine"s is to manually edit the URL for run-vs-run comparison
and fill in the run ids of the two runs you want to compare.
For example, the following URL is a comparison of green-dragon-07-x86_64-O3-flto vs green-dragon-06-x86_64-O0-g on the public http://llvm.org/perf server:
I had to manually look up and fill in the run ids 70644 and 70634.
It would be great if there were a better way to do these kinds of comparisons - i.e. not having to manually fill in run ids, but having a web UI to easily find and pick the runs you want to compare.
(As an aside: I find it intriguing that the URL above suggests that there are quite a few cases where "-O0 -g" produces faster code than "-O3 -flto").
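For reference, the manual URL editing described above can be reduced to a tiny helper. The path layout (`/db_default/v4/nts/<run_id>?compare_to=<baseline_id>`) matches what the llvm.org/perf v4 web UI serves at the time of writing, but treat it as an assumption rather than a stable API.

```python
# Sketch: build a run-vs-run comparison URL for an LNT v4 server by hand.
# The URL layout is an observation of the current web UI, not a stable API.

def comparison_url(server, run_id, baseline_id, suite="nts"):
    """Compare run `run_id` against run `baseline_id` on an LNT server."""
    return f"{server}/db_default/v4/{suite}/{run_id}?compare_to={baseline_id}"

# The two run ids from the example above:
print(comparison_url("http://llvm.org/perf", 70644, 70634))
# -> http://llvm.org/perf/db_default/v4/nts/70644?compare_to=70634
```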
"test parameters" are different, they allow exploring multiple variants of the same test case. E.g. can be:
* index of input data sets, length of input vector, size of matrix, etc;
* macro that affect source code such as changing 1) static data allocation to dynamic or 2) constants to variables (compile-time unknown)
* extra sets of internal compilation options that are relevant only for particular test case
Same parameters can apply to multiple tests with different value sets:
Of course, the original test cases can be duplicated (copied under different names) - that is enough to execute the tests.
Explicit "test parameters" allow exploring the dependencies between test parameters and metrics.
Right. In the new cmake+lit way of driving the test-suite, some of these test parameters are input to cmake (like macros) and others will be input to lit (like changing inputs), I think.
We see this also in e.g. running SPEC with the ref vs train vs test data sets. TBH, I'm not quite sure how to best drive this. I guess Matthias may have better ideas than me here.
I do think that to comply with LNT's current conceptual model, tests being run with different parameters will have to have different test names in the LNT view.
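One way to keep distinct test names per parameter set would be to encode the parameter/value pairs into the name itself. The convention below is made up here for illustration; LNT does not prescribe it.

```python
# Hypothetical naming convention: fold each parameter/value pair into the test
# name, so every parameterized variant is a distinct test under LNT's current
# conceptual model. Sorting the keys keeps the name stable across runs.

def parameterized_name(base, params):
    """Build a stable, distinct test name from a base name and parameters."""
    suffix = ".".join(f"{k}={params[k]}" for k in sorted(params))
    return f"{base}__{suffix}" if suffix else base

print(parameterized_name("SingleSource/Benchmarks/matmul",
                         {"size": 512, "alloc": "dynamic"}))
# -> SingleSource/Benchmarks/matmul__alloc=dynamic.size=512
```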
On Thu, Apr 21, 2016 at 4:36 PM, Kristof Beyls via llvm-dev <llvm-dev at lists.llvm.org> wrote:
On 21 Apr 2016, at 15:00, Elena Lepilkina <Elena.Lepilkina at synopsys.com> wrote:
Hi Kristof and Daniel,
Thanks for your answers.
Unfortunately I hadn't tried scaling up to a large data set before. Today I tried, and the results are quite bad.
So the database schema should be rebuilt. I am now thinking of creating one sample table per test-suite, rather than cloning the whole set of tables. As far as I can see, the other tables can be shared across test-suites. I mean: if a user runs tests from a new test-suite, a new sample table would be created while importing data from the JSON report, if it doesn't already exist. Are there problems with this solution? Maybe I'm missing some details.
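The per-suite sample table idea could be sketched with SQLite as follows. The table and column names here are illustrative; they are not LNT's actual schema.

```python
# Sketch of "one sample table per test-suite": shared Run/Test tables (not
# shown), plus a per-suite sample table created lazily on first import.
# Table and column names are made up for illustration, not LNT's real schema.
import sqlite3

def sample_table_for(conn, suite):
    """Return the sample table name for `suite`, creating it if missing."""
    table = f"{suite}_Sample"
    conn.execute(
        f"""CREATE TABLE IF NOT EXISTS "{table}" (
                id INTEGER PRIMARY KEY,
                run_id INTEGER NOT NULL,
                test_id INTEGER NOT NULL,
                metric TEXT NOT NULL,
                value REAL)"""
    )
    return table

conn = sqlite3.connect(":memory:")
t = sample_table_for(conn, "nts")          # first import: table is created
conn.execute(f'INSERT INTO "{t}" (run_id, test_id, metric, value) '
             "VALUES (1, 1, 'exec_time', 0.42)")
sample_table_for(conn, "nts")              # later imports: table already exists
print(conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0])
# -> 1
```

The appeal of this layout is that each suite's sample table carries only that suite's metric columns, so adding a custom metric to one suite doesn't widen the tables of every other suite.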
It's unfortunate to see performance doesn't scale with the proposed initial schema, but not entirely surprising. I don't really have much feedback on how the schema could be adapted otherwise as I haven't worked much on that. I hope Daniel will have more insights to share here.
Moreover, I have a question about compile tests. Are compile tests runnable? On http://llvm.org/perf there are no compile tests. Does that mean they are deprecated for now?
About test parameters: for example, we would like to have the opportunity to compare benchmark results of a test compiled with -O3 and with -Os in the context of one run.
The way we use LNT, we would run different configurations (e.g. -O3 vs -Os) as different "machines" in LNT's model. This is also explained in LNT's documentation, see
https://github.com/llvm-mirror/lnt/blob/master/docs/concepts.rst. Unfortunately, this version of the documentation hasn't found its way yet to http://llvm.org/docs/lnt/contents.html.
Is there a reason why storing different configurations as different "machines" in the LNT model doesn't work for you?
I suspect there are a number of places in LNT's analyses that assume that different runs coming from the same "machine" are always produced by the same configuration. But I'm not entirely sure about that.
LLVM Developers mailing list
llvm-dev at lists.llvm.org