[PATCH] D29030: [LNT] Add cross-compilation support to 'lnt runtest test-suite'

Kristof Beyls via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 26 00:42:12 PST 2017


On 25 Jan 2017, at 22:39, Matthias Braun via llvm-commits <llvm-commits at lists.llvm.org> wrote:


On Jan 25, 2017, at 11:39 AM, Chris Matthews via llvm-commits <llvm-commits at lists.llvm.org> wrote:

There are a lot more things in there than the metadata detection. Off the top of my head:

 - xml test reports for the CI systems, and CSV reports
Just out of interest, what are the CSV reports used for?

I use them from time to time to do ad-hoc analysis on test-suite results, often by analyzing them in a spreadsheet.
The LNT webui currently doesn't make it very easy to do ad-hoc analyses.
The CSV tends to be more straightforward than the JSON to import into other software, such as a spreadsheet.
If the LNT webui had better support for ad-hoc analyses, maybe the need for a CSV format would go away? Although there may still be value in being able to run the test-suite and do some ad-hoc analyses on the results without needing an LNT instance.

Both tasks sound like something we could get done with a ~100 line python script dropped into test-suite/utils (I would even volunteer to write them if that is the last thing necessary and someone actually gives me a specification/good examples of what should come out of it :)
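To make that a bit more concrete, here is a rough sketch of what such a utility could look like. It assumes lit's json output keeps the per-test records under a top-level "tests" list with a per-test "metrics" dict; the exact field names may differ between lit versions, so treat this as a sketch rather than a finished tool:

#!/usr/bin/env python
# Hypothetical test-suite/utils/results2csv.py: flatten lit json results into CSV.
import csv
import json
import sys

def main():
    with open(sys.argv[1]) as f:
        data = json.load(f)
    tests = data.get('tests', [])
    # Use the union of all metric names so every row has the same columns.
    metric_names = sorted({m for t in tests for m in t.get('metrics', {})})
    writer = csv.writer(sys.stdout)
    writer.writerow(['name', 'code'] + metric_names)
    for t in tests:
        metrics = t.get('metrics', {})
        writer.writerow([t.get('name'), t.get('code')] +
                        [metrics.get(m, '') for m in metric_names])

if __name__ == '__main__':
    main()

An xml (JUnit-style) report for the CI systems could presumably be generated from the same parsed data in a similar amount of code.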

 - profile data collection
What exactly do you mean here if it is not a PGO build? (see below)

If this is about interpreting linux perf records: yes, that's non-trivial. I think we may want to avoid moving all of that machinery to the test-suite, but I'm not sure.
Hopefully James has some ideas on how the linux perf record parsing and encoding logic might be simplified, so it becomes more manageable.
Maybe if it were all C(++) code instead of a mix of Python and C, it would fit more naturally under test-suite/tools?


 - test-suite json is different from LNT's
Indeed, though I don't see a strong reason why it has to be that way. Looks like a relatively small tweak to either lit or LNT (or teach one of them a 2nd format) to fix that, because the underlying data model looks compatible.

 - compile/exec multisample
It is easy to change the test-suite lit stuff to run a benchmark multiple times. The main question would be whether we invent a data format that allows multiple results for each metric (I am not entirely sure I want that, as it complicates the data model) or whether we teach the lit code to do the aggregation (in which case we have to figure out how to do that exactly; the right choice of summary function probably depends on the metric, so we would need some extra information on that front).

Maybe the test-suite doesn't need to do aggregation, as that's already interpretation of the data? Just send all collected data to LNT and let LNT do the analysis to come up with the preferred aggregation?
I think it'd be a slippery slope to start doing statistical analysis, however simple, in the test-suite scripts - probably best if that remains part of LNT.
I'd definitely be opposed to sending only a single result instead of multiple results to LNT, as the multiple results are often one of the most obvious ways to classify an apparent regression as noise or not.
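To make the data model question concrete, here is a sketch of what a per-test record carrying multiple samples per metric might look like, next to the kind of aggregation LNT could then do on its side. The list-valued "metrics" layout and the summary functions are assumptions for illustration, not an existing format:

# Hypothetical record: one value per sample for each metric.
record = {
    "name": "test-suite :: SingleSource/Benchmarks/Misc/pi.test",
    "code": "PASS",
    "metrics": {
        "exec_time": [1.52, 1.49, 1.51],      # one entry per sample
        "compile_time": [0.31, 0.30, 0.32],
        "size": [10432],                      # single-sample metrics stay one-element lists
    },
}

# LNT could then pick a summary function per metric on the server side,
# e.g. the minimum for timings, while keeping the raw samples for noise analysis.
SUMMARY_FUNCTIONS = {"exec_time": min, "compile_time": min, "size": max}
summary = {metric: SUMMARY_FUNCTIONS.get(metric, min)(samples)
           for metric, samples in record["metrics"].items()}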


 - the diagnose stuff
Could maybe live outside of LNT as test-suite/utils/diagnose.py?

 - running with PGO
Well a PGO build should boil down to:
cmake -DTEST_SUITE_PROFILE_GENERATE ...
ninja
lit
cmake -DTEST_SUITE_PROFILE_USE ...
ninja
lit

and I would say it is reasonable to expect users to run the 3 extra commands compared to a "normal" build...

Seems reasonable to me - if we make sure we write enough documentation for it to be easy for newbies to set up and run the test-suite for the different use cases.


 - result filtering (exclude stats)
From a general point of view I would expect it to be best to just dump stuff into the LNT database and leave the filtering and aggregation to clients (= the website).
Of course, if it is an unreasonable amount of data, some forms of early filtering may be appropriate. In fact you can already control this somewhat today by modifying the lit plugins used (i.e. stats only appear in the output if you do cmake -DTEST_SUITE_COLLECT_STATS; codesize is a litsupport plugin as well, so it would take a 3 line patch to add a cmake flag to disable that).

I think there's still value in not collecting a metric that you know is complete noise (the reason I introduced this functionality was to avoid collecting compile-time numbers when compiling in parallel using all cores on a big.LITTLE system, where, depending on which core a compile job happens to land on, you can easily get a 2x difference in compilation speed).
That being said, I assume that it would be close to trivial to have this functionality live in the test-suite.

All in all, if we end up deciding that the responsibility of the test-suite is to run stuff & collect data, and the responsibility of LNT is to archive and analyse it, I'd say filtering out results that are known up-front to be meaningless is a responsibility of the test-suite.
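If that filtering does end up living in the test-suite, it could be as small as dropping the unwanted metric names from the lit results before they get submitted. A sketch, reusing the same assumed "tests"/"metrics" layout as above (the excluded metric name is just an example and would come from a flag or config file):

# Hypothetical sketch: strip metrics known up-front to be meaningless for this run,
# e.g. compile_time when building in parallel on a big.LITTLE machine.
import json

EXCLUDE_METRICS = {"compile_time"}

def filter_results(results):
    for test in results.get("tests", []):
        metrics = test.get("metrics", {})
        for name in EXCLUDE_METRICS & set(metrics):
            del metrics[name]
    return results

with open("results.json") as f:
    filtered = filter_results(json.load(f))
with open("results.filtered.json", "w") as f:
    json.dump(filtered, f, indent=2)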



My goal is no client-side LNT at all.  Workflow: the user checks out the test-suite, runs it, and curls the resulting json to the server.  We use this workflow for all of our other LNT-based data collection, and it works really well. Running the test-suite manually and then some other tools afterwards would be a regression in complexity.

For metadata: could we write a tool, run by the test-suite at the end, that produces a new json for submission?  It could add the metadata, apply the transformations to make the json grokkable by LNT, and produce the xml and csv.
As for collecting the metadata I think that would fit nicely into the cmake step (and you can write arbitrary files with cmake so it should be perfectly possible to write some .json files).

As an intermediate step we could indeed write a script in test-suite/utils that takes the lit json output and the metadata file produced by the cmake step, integrates both, and produces lnt output. Though in an ideal world I would see LNT being able to grok the output of lit directly, and lit just passing the metadata along, so we can leave out the extra conversion step.
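As a stopgap, that conversion script could be fairly small. A rough sketch, assuming the cmake step writes the collected metadata into a metadata.json next to the build, and using simplified placeholder keys for the output (the real schema the LNT importer expects would need to be matched; don't read the output layout below as LNT's actual format):

# Hypothetical sketch: merge lit's results json with a cmake-produced metadata.json
# into a single report that can then simply be curl'ed to an LNT server.
import json
import sys

def convert(lit_results, metadata):
    tests = []
    for t in lit_results.get("tests", []):
        for metric, value in t.get("metrics", {}).items():
            tests.append({"name": t["name"], "metric": metric, "value": value})
    # "machine"/"run"/"tests" are simplified placeholders, not LNT's real schema.
    return {
        "machine": metadata.get("machine", {}),
        "run": metadata.get("run", {}),
        "tests": tests,
    }

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        lit_results = json.load(f)
    with open(sys.argv[2]) as f:
        metadata = json.load(f)
    json.dump(convert(lit_results, metadata), sys.stdout, indent=2)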

- Matze

One more piece of functionality that I think may be harder to get rid of is the rerun-functionality that exists in 'lnt runtest nt'.
As part of submitting data, the LNT server can send back a request to do a few more runs of specific tests, to get more statistically significant results where needed.
I'm not sure we'd be able to do this if we used e.g. curl for data submission instead of something that is part of lnt.

