[PATCH] D34584: Introduce new JSON import format

Wed Jun 28 16:04:48 PDT 2017

MatzeB added a comment.

Before I dive in the meta discussion, are there any issues with the patch itself?

In https://reviews.llvm.org/D34584#793766, @kristof.beyls wrote:

> In https://reviews.llvm.org/D34584#792859, @MatzeB wrote:
>
> > >> Drop Run/Info/tag: json itself is a schemaless format so it feels more natural without one. The LNT testsuite type is now determined by the submit URL used, or the '-s' commandline flag on the commandline. Both default to 'nts' if omitted.
> > > 
> > > I don't fully understand how LNT reacts to this, but on the surface makes sense to me. If someone ever had time to do a major redesign of the database schema, there is a chance we might also be able to drop the need for differences in the database schema per test suite, like "nts"? Anyway, that is definitely far out of scope for this patch.
> >
> > The patch works fine. People using non-nts testsuites (which I have a strong suspicion only exist within apple), may need to supply a `-s MySchema` flag for the commandline tools or use `db_default/v4/MySchema/submitRun` instead of `db_default/submitRun` to express they don't use the 'nts' default.
> >
> > This is probably not the right patch to discuss LNT database design. But some points anyway:
> >
> > - IMO making SQL tables too generic (like having one row with MetricType, Test, Run and Value; so we could create new MetricTypes on demand) is a sure way to kill performance. As long as we use SQL we should have 1 table column for each metric.
> > - A schema is nice when you need to decide bits like the unit of a metric, whether bigger is better or getting a human readable metric name for the website. However we could make good arguments that we only need that on the presentation/analysis side and not as part of the data storage model.
>
>
> Agreed that not losing performance will be the key challenge for any redesign of the SQL schema to avoid having test-suite-specific tables. Anyway, it's indeed a discussion for another time...

Again unrelated to this patch,

Note that LNT is already designed to be extensible. Schemas like 'nts' or 'compile' switch the metrics available or what fields are considered for identifying/ordering runs. It just fails being user friendly so that people would actually create new schemes or modifying existing when they setup new benchmarks. IMO the current database design is good enough and we rather need a patch that makes creating and installing new schemes easy. I imagine a world where testsuite come with lnt.schema.json which describes the names/bigger is better/etc. for a testsuite and then when you setup a server you just reference those schemas in your lnt.cfg and get a server that matches your testsuite.

> 
> 
>>>> Previously:
>>>> 
>>>> "Machine": {
>>>> 
>>>>   "name": "mymachine",
>>>>   "Info": {
>>>>     "hardware": "HAL 9000"
>>>>   }
>>>> 
>>>> },
>>>>  "Run": {
>>>> 
>>>>   "Start Time": "2017-05-16 21:31:42",
>>>>   "End Time": "2017-05-16 21:31:42",
>>>>   "Info": {
>>>>     "__report_version__": "1",
>>> 
>>> I don't know what __report_version__ is used for today, but would keep on having some kind of "report_version" key still help if in the future we'd want to change this format again? Having a version string in there could make it easier to support multiple versions?
>> 
>> I talked a bit with Chris about this. And I think we want to persue a somwhat weaker version scheme were tools add something like "generated": "LNT version 0.4.1" or "generated": "llvm test-suite runner 0.1", similar in spirit to the http user agent header.
> 
> I'm not sure I fully get this. I know of at least 2 groups having custom scripts to translate data coming out of their custom benchmarking system into something that can be imported into LNT. I know of yet another group planning to do something similar. I think it won't be possible for LNT to understand all possible producers of json files by name. So, with my limited insight here, I think that just a numerical version number or something similar indicating which version of the specification this json follows would be better than a string naming the producer?
>  In other words, I think this json format is rapidly becoming one of the key public interfaces of LNT - maybe the single most important one. If LNT keeps on growing in popularity in teams outside of LLVM, I think we'll end up having to specify this interface, and version control of it, a bit more strictly.

Ok, I'll add a field '__format_version__' to the toplevel.

> 
> 
>>>>     "tag": "nts",
>>>>     "run_order": "123456"
>>>>   }
>>>> 
>>>> }
>>>> Now:
>>>> 
>>>> "machine": {
>>>> 
>>>>   "name": "mymachine",
>>>>   "hardware": "HAL 9000"
>>>> 
>>>> },
>>>>  "run": {
>>>> 
>>>>   "start_time": "2017-05-16 21:31:42",
>>>>   "end_time": "2017-05-16 21:31:42",
>>>>   "llvm_project_revision": "123456"
>>> 
>>> Since I see LNT being used also for tracking performance for non-LLVM projects, I think it'd be best to drop "llvm_" from "llvm_project_revision".
>> 
>> - The name for the field is determined by the schema in LNT, so not necessarily part of this patch.
> 
> OK, I agree it doesn't have to be part of this patch. But taking into account my remark above about the json file rapidly becoming one of the most important interfaces of LNT, I think that it may not be tenable in the long run to have the json fields map exactly to names of fields in the database schema. It is of course a nice to have; but the field names in the json file are on the publicly-visible interface of LNT, and the database field names are an implementation detail from a user point-of-view. I can foresee reasons why both might start deviating.
>  Since my gut feel is that it will be harder/more work to change the "json format specification" in the future, I thought that maybe it could be part of the change that is going to happen as part of this patch. Anyway, I don't have too strong a feeling about the exact name of this field.

Yes and no; As see above about my vision of actually using the LNT feature of different benchmark suites instead of pushing everything into NTS :)

> 
> 
>> - "project_revision" seems to generic to me (I wouldn't know whether it's the compiler or the benchmark suites revision). I could probably live with "compiler_revision", but we should have a separate patch for that.
> 
> Agreed, it's not the best possible name. It should be the revision of the software for which you're tracking the performance in my mind. That software doesn't have to be a "compiler". Not sure what a better name would be though.

I'd be fine changing it to "compiler_revision" right away, if we need more discussion about how to call the field, then I'd suggest doing that in another patch to not delay this patch.

> 
> 
>>> If so, it'll be important to make sure the lnt.llvm.org is operating well and all public bots currently submitting data to llvm.org/perf have switched to submitting lnt.llvm.org before this lands as otherwise the bots will start submitting this new format and the very out-of-date LNT server at llvm.org/perf will not be able to cope?
>> 
>> - We are in luck here: The reporting part haven't changed with this patch so `lnt runtest` still produces the old format. (I planed to change that separately)
>> - I hope we have ways to upgrade the llvm.org LNT instance...
>> 
>>> Although that being said, it seems the llvm.org/perf service is down since last weekend's maintenance works on llvm.org, and maybe we just have to switch submitting to lnt.llvm.org anyway.
> 
> I think we ended up doing this within the past 12 hours...

Repository:
  rL LLVM

https://reviews.llvm.org/D34584