[llvm-dev] RFC: LNT/Test-suite support for custom metrics and test parameterization

James Molloy via llvm-dev llvm-dev at lists.llvm.org
Tue Apr 26 02:37:19 PDT 2016


Hi Elena,

Thanks for pushing forward with this. I like the idea of using a NoSQL
solution.

My primary reservation is about adding the new NoSQL stuff as an extra
backend. I think the current database backend and its use of SQLAlchemy is
extremely complex and is in fact the most complex part of LNT. Adding
something more (as opposed to *replacing* it) would just make this worse
and make it more likely that contributors wouldn't be able to test LNT very
well (having three engines to test: SQLite, PostgreSQL and MongoDB).

I think it'd be far better all around, if we decide to go with the NoSQL
solution, to just bite the bullet and force users who want to run a server
to install MongoDB.

In my experience most of the teams I've seen using LNT have a single LNT
server instance and submit results to that, rather than launching small
instances to run "lnt viewcomparison".

Cheers,

James

On Tue, 26 Apr 2016 at 09:15 Elena Lepilkina via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi everyone.
>
>
>
> Thanks to everyone who took part in the discussion of this proposal.
>
> After the discussion we better understand how other users use LNT and
> how large their datasets can be.
>
>
>
> So here is the new, updated proposal.
>
> (A Google Docs version with some images is available at
> https://docs.google.com/document/d/11qHNWRIQ2gc2aWH3gNBeeCUB3-JPe7AoMtn7n9JoCeY/edit?usp=sharing
> )
>
>
>
> The goal is the same.
>
> Enable LNT support for custom metrics, such as user-defined run-time and
> static metrics (power, etc.) and LLVM pass statistic counters. Provide
> integration with the LLVM test-suite to automatically collect LLVM
> statistic counters or custom metrics.
>
>
>
> Analysis of the current database
>
>
>
> Limitations
>
> 1. The structure isn't flexible.
>
> There is no way to run any test suite other than the simple one.
>
> 2. Performance is quite bad when the database has a lot of records.
>
> For example, graph rendering is too slow. On
> green-dragon-07-x86_64-O3-flto:42, rendering the compile_time graph for
> SingleSource/Benchmarks/Shootout/objinst takes 191.8 seconds.
>
> 3. It's difficult to add new features that need queries against the
> Sample table (if we use a BLOB field for custom metrics).
>
> Queries will be needed for more complex analysis. For example, if we
> would like to add an extra check for tests whose compile time is too
> long, we should be able to query for samples where this metric is
> greater than some constant.
>
> Or we would like to compare tests with different run options, so we
> should fetch only some tests rather than all of them.
>
> A BLOB field would preserve the current structure and make the system a
> bit more flexible, but in the near future it will not be enough.
>
> Fetching all metrics of all tests makes the work slow on large datasets,
> and this approach is far from optimal.
>
> So we would prefer not to add a BLOB field, as it wouldn't help us add
> new features or keep the system flexible in the future.
>
>
>
> Proposal
>
>
>
> We suggest adding a third part to LNT (as Chris Matthews suggested). This
> part will be used for collecting custom metrics and running any test
> suite.
>
> We suggest using a NoSQL database for this part (for example, MongoDB, or
> the JSON/JSONB extension of PostgreSQL, which lets PostgreSQL be used as
> a NoSQL database). This part will be enabled only if a path to a NoSQL
> database is given in the config file.
>
> This makes it possible to have a single Sample table (collection, in
> NoSQL terms). If we use the schemaless feature of MongoDB, for example,
> it's possible to add new fields when a new test suite is run. There would
> then be one table with many fields, some of which are empty, and at any
> moment it will be possible to change the schema of the table (document).
>
> A small prototype was made with MongoDB and the MongoEngine ORM. This ORM
> was chosen because MongoAlchemy doesn't support schemaless features and
> the latest MongoKit version has an error with the latest pymongo release.
>
> I tried it on a virtual machine and got the following results on
> 5,000,000 records:
>
> Current schema - 13.72 seconds
>
> MongoDB - 1.35 seconds
>
> The results will of course be better on a real server machine.
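>
> For illustration, here is a minimal sketch of the schemaless approach
> with MongoEngine. This is not the prototype's actual code; the class,
> field, and database names are all hypothetical:
>
>   # A minimal sketch, assuming MongoEngine is installed and a local
>   # mongod instance is running; all names here are illustrative.
>   from mongoengine import DynamicDocument, StringField, connect
>
>   connect('lnt_prototype')  # hypothetical database name
>
>   class Sample(DynamicDocument):
>       # Only the common fields are declared. DynamicDocument accepts
>       # arbitrary extra fields, so each test suite can add its own
>       # metrics without a schema migration.
>       test_name = StringField(required=True)
>
>   # A run of the default suite stores the usual metrics...
>   Sample(test_name='Shootout/objinst',
>          compile_time=1.2, exec_time=3.4).save()
>   # ...while another suite attaches a custom metric on the fly.
>   Sample(test_name='foo', power=0.7).save()
>
>   # Custom metrics remain directly queryable, e.g. all samples whose
>   # compile_time exceeds a threshold:
>   slow = Sample.objects(compile_time__gt=10.0)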
>
>
>
> To run a given test suite, the user should describe its fields in a file
> with the .fields extension, like this:
>
> {
>   "Fields" : [{
>     "TestSuiteName" : "Bytecode",
>     "Type" : "integer",
>     "BiggerIsBetter" : 0,
>     "Show" : true
>   },
>   {
>     "TestSuiteName" : "GCC",
>     "Type" : "real",
>     "BiggerIsBetter" : 0,
>     "Name" : "GCC time"
>   },
>   {
>     "TestSuiteName" : "JIT",
>     "Type" : "real",
>     "BiggerIsBetter" : 0,
>     "Name" : "JIT Compile time",
>     "Show" : true
>   },
>   {
>     "TestSuiteName" : "GCC/LLC",
>     "Type" : "string",
>     "BiggerIsBetter" : 0
>   }]
> }
>
>
>
> One field, "Show", has been added to describe whether the metric should
> be shown by default on the web page (as James Molloy suggested). The
> other metrics would be added to the page if the user selects them in the
> view options.
>
>
>
> Conclusion
>
>
>
> This change will let users choose between a flexible, powerful system
> and a limited version with an SQLite database.
>
> If a user chooses the NoSQL version, their data can be copied from the
> old database to the new one. This will let them use the new features
> without losing old data.
>
>
>
> The open question is which NoSQL database would be better for LNT. We are
> interested in the opinions of people who know LNT's features better.
>
>
>
> Thanks,
>
>
>
> Elena.
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *Elena
> Lepilkina via llvm-dev
> *Sent:* Tuesday, April 26, 2016 9:07 AM
> *To:* chris.matthews at apple.com
>
>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] RFC: LNT/Test-suite support for custom metrics
> and test parameterization
>
>
>
> Hi, Chris.
>
>
>
> Thank you for your answer about compile tests. As I understood from
> looking through their code, compile tests don't use the test suite at
> all. Am I right? The LNT documentation lacks information and examples on
> running compile tests.
>
> We understand that there are two groups of users: users running servers
> and collecting a lot of data, and SQLite users, who, I think, wouldn't
> have millions of sample records.
>
> I think it's clear that there is no universal solution that gives both a
> simple installation process and a flexible, high-load system.
>
> I will update the proposal and take into consideration your suggestion
> about a third part of the test-suite.
>
>
>
> Thanks
>
>
>
> Elena.
>
>
>
> *From:* chris.matthews at apple.com [mailto:chris.matthews at apple.com
> <chris.matthews at apple.com>]
> *Sent:* Monday, April 25, 2016 8:06 PM
> *To:* Elena Lepilkina <Elena.Lepilkina at synopsys.com>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] RFC: LNT/Test-suite support for custom metrics
> and test parameterization
>
>
>
> I am really torn about this.
>
>
>
> When I implemented the regression tracking stuff recently, it really
> showed me how badly we are scaling.  On our production server, the run
> ingestion can take well over 100s.  Time is mostly spent in FieldChange
> generation and regression grouping. Both have to access a lot of recent
> samples. This is not the end of the world, because it runs in a
> background process.  Where this really sucks is when a regression has a
> lot of indicators. The web interface renders these in a graph, and just
> trying to pull down 100 graphs' worth of data kills the server.  I ended
> up limiting those to a max of 10 datasets, and even that takes 30s to
> load.
>
>
>
> So I do think we need some improvements to the scalability.
>
>
>
> LNT usage is spread between two groups. The first is users who set up big
> servers, with Postgres and Apache/Gunicorn. For those users I think NoSQL
> is the way to go.  However, our second (and probably more common) user is
> the person running a little instance on their own machine to do some
> local compiler benchmarking.  Their setup process needs to be dead
> simple, and I think requiring a NoSQL database to be set up on their
> machine first is a non-starter.  Like we do with SQLite, I think we need
> a transparent fallback for people who don't have a NoSQL database.
>
>
>
> Would it be helpful to anyone if I got a dump of the llvm.org LNT
> Postgres database?  It is a good, big dataset to test with, and I assume
> everyone is okay with it being public, since the LNT server already is.
>
>
>
>
>
> On Apr 25, 2016, at 4:33 AM, Elena Lepilkina via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
>
>
>
>
> *From:* Elena Lepilkina
> *Sent:* Monday, April 25, 2016 2:33 PM
> *To:* 'James Molloy' <james at jamesmolloy.co.uk>; Kristof Beyls <
> Kristof.Beyls at arm.com>; Mehdi Amini <mehdi.amini at apple.com>
> *Cc:* nd <nd at arm.com>; Matthias Braun <matze at braunis.de>
> *Subject:* RE: [llvm-dev] RFC: LNT/Test-suite support for custom metrics
> and test parameterization
>
>
>
> Hi everyone,
>
>
>
> Thank you for your answer. A BLOB format adds some extra work when
> handling metrics. We know that the ComparisonResult class does the
> analysis work, but since it fetches all metrics from the database in one
> request, we would need additional time to unpack the fields during
> analysis in ComparisonResult. Maybe it would be better to have one Sample
> table per test suite, as was suggested before. That should be faster,
> shouldn't it? Moreover, the next planned LNT changes will need to fetch
> some metrics separately, and a BLOB format will add some delay to those
> queries.
>
>
>
> As we can see, the performance problem is a pressing one now: rendering
> the graph page takes about 3 minutes.
>
> [screenshot showing the ~3 minute graph page render time omitted]
>
> So maybe it would be better to start working with NoSQL databases? I made
> a small prototype with TestSuite, TestSuiteFields, Test, Run and Sample
> tables for collecting time metrics. It works quickly. Using NoSQL also
> helps solve the problem of samples having different metric fields: it
> becomes possible to store different metrics for different test suites in
> one table.
>
> What do you think about this proposal?
>
> I used MongoDB, but I know there is a NoSQL extension for PostgreSQL with
> JSONB fields, which are more effective than a JSON-encoded BLOB because
> they can be used in queries very simply and support indexes.
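>
> As a rough sketch of that option (assuming a PostgreSQL 9.4+ backend and
> SQLAlchemy's postgresql dialect; the table, column, and DSN names here
> are hypothetical, not LNT's actual schema):
>
>   # A minimal sketch, not LNT code: one JSONB column holds all custom
>   # metrics, yet stays queryable and indexable server-side.
>   from sqlalchemy import Column, Float, Integer, create_engine
>   from sqlalchemy.dialects.postgresql import JSONB
>   from sqlalchemy.ext.declarative import declarative_base
>   from sqlalchemy.orm import sessionmaker
>
>   Base = declarative_base()
>
>   class Sample(Base):
>       __tablename__ = 'sample'
>       id = Column(Integer, primary_key=True)
>       metrics = Column(JSONB)  # all custom metrics live here
>
>   engine = create_engine('postgresql:///lnt')  # hypothetical DSN
>   Base.metadata.create_all(engine)
>   session = sessionmaker(bind=engine)()
>
>   session.add(Sample(metrics={'compile_time': 12.5, 'power': 0.7}))
>   session.commit()
>
>   # Unlike an opaque BLOB, a JSONB field can be filtered in the query:
>   slow = session.query(Sample).filter(
>       Sample.metrics['compile_time'].astext.cast(Float) > 10.0).all()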
>
>
>
> About the proposal that not all metrics should be shown: this can be
> added as a field in the JSON of the .fields file, which describes the
> fields collected from the test-suite. To see the other metrics, the user
> would select them with checkboxes in the view options. Would this
> solution be suitable?
>
> We could do what you wrote:
>
> “I'd also suggest that if we're adding many more metrics to a test, we
> should create a "test sample information" page that the test link goes to
> instead of just the graph. This page could contain all counter/metric data,
> historic sparklines, the full graph and profiling links.
>
> ”
> But the render time of such a page would be too long because of the
> graph render time. In my opinion, some users wouldn't want to wait that
> long just to see some additional metrics.
>
>
>
> Thanks for your suggestions,
>
>
>
> Elena.
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org
> <llvm-dev-bounces at lists.llvm.org>] *On Behalf Of *James Molloy via
> llvm-dev
> *Sent:* Monday, April 25, 2016 12:43 PM
> *To:* Kristof Beyls <Kristof.Beyls at arm.com>; Mehdi Amini <
> mehdi.amini at apple.com>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>; nd <nd at arm.com>; Matthias Braun
> <matze at braunis.de>
> *Subject:* Re: [llvm-dev] RFC: LNT/Test-suite support for custom metrics
> and test parameterization
>
>
>
> Hi Sergey, Elena,
>
>
>
> Firstly, thanks for this RFC. It's great to see more people actively using
> and modifying LNT and the test metrics support in general is rather weak
> currently.
>
>
>
> Metrics
>
> -------
>
>
>
> I agree with Daniel and Kristof that your proposed schema changes have the
> potential to make many queries extremely slow. Certainly for the metrics
> enhancements, I don't see a reason why we need such a radical change in
> schema.
>
>
>
> To add custom metrics on the fly, we need to change the schema for the
> Sample table. Currently this consists of a column for each metric, but
> actually we never ever query those metric values. We never query for
> example for "all failing tests in a run" - when we do analyses we use the
> ComparisonResult class which reads *all* samples from the database for a
> run and performs the analysis entirely in Python.
>
>
>
> Therefore, having a semi-structured format where some fields are
> first-class columns and the rest are in a JSON-encoded BLOB (as Daniel
> suggests) seems totally acceptable. There is certainly an argument now that
> we're using the wrong backend storage solution and that a key-value store
> might be more suitable, but that's a very invasive change and I don't think
> we've reached the point where we need to force a move from the simplicity
> of SQLite.
>
>
>
> Adding an extra BLOB column would be easy - there would just need to be
> logic in testsuitedb.py for reading and writing it - the Sample model class
> would expose the JSON-encoded fields as normal python fields so the rest of
> LNT would be isolated from this change.
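>
> A rough sketch of that idea (with hypothetical names; this is not actual
> testsuitedb.py code):
>
>   # Sketch only: expose metrics stored in a JSON-encoded BLOB column
>   # as ordinary Python attributes, so callers never see the encoding.
>   import json
>
>   class Sample(object):
>       def __init__(self, extra_blob=None):
>           # The JSON-encoded BLOB column as stored in the database.
>           self.extra_blob = extra_blob
>
>       def __getattr__(self, name):
>           # Called only for attributes not found normally; fall back
>           # to the decoded BLOB so custom metrics read like fields.
>           extra = json.loads(self.__dict__.get('extra_blob') or '{}')
>           try:
>               return extra[name]
>           except KeyError:
>               raise AttributeError(name)
>
>   s = Sample(extra_blob='{"loop_unroll_count": 42}')
>   print(s.loop_unroll_count)  # -> 42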
>
>
>
> But I think this is a small detail compared to the bigger problem of how
> to effectively *display* all this new data. Currently every new metric gets
> its own separate table in the report/run views, and this does not scale
> well at all.
>
>
>
> I think we need some more concepts in the metric system to make it
> scalable:
>
>
>
>   * What "attribute" of the test is this metric measuring? For example,
> both "exec_time" and "score" measure the same attribute: the performance
> of the generated code. It's superfluous to display them in separate
> tables. However, mem_size and compile_time measure completely different
> aspects of the test.
>
>   * Is this metric useful to display at the top level? Or should it only
> be exposed when more data about a test result is requested? (A sketch of
> one possible encoding of both concepts follows this list.)
>
>     * An example of this is the pass statistics. I don't want my daily
> report view cluttered by the time spent in register allocation for every
> test! OK, this is useful information when debugging a problem, but it
> should be available when requested rather than by default.
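>
> One way these two concepts could be encoded as metric metadata (purely
> illustrative JSON, not an existing LNT format):
>
>   {
>     "Name" : "exec_time",
>     "Attribute" : "generated-code performance",
>     "ShowAtTopLevel" : true
>   },
>   {
>     "Name" : "regalloc_time",
>     "Attribute" : "pass statistics",
>     "ShowAtTopLevel" : false
>   }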
>
>
>
> An example of why we need the above is your screenshots in your google
> doc. I'm looking at the last screenshot, and it's incredibly difficult to
> read and get useful information out of.
>
>
>
> I'd also suggest that if we're adding many more metrics to a test, we
> should create a "test sample information" page that the test link goes to
> instead of just the graph. This page could contain all counter/metric data,
> historic sparklines, the full graph and profiling links.
>
>
>
> Cheers,
>
>
>
> James
>
>
>
> On Fri, 22 Apr 2016 at 10:17 Kristof Beyls via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
> On 22 Apr 2016, at 11:14, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>
>
>
> On Apr 22, 2016, at 12:45 AM, Kristof Beyls via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
>
> On 21 Apr 2016, at 17:44, Sergey Yakoushkin <sergey.yakoushkin at gmail.com>
> wrote:
>
>
>
> Hi Kristof,
>
>
>
>        The way we use LNT, we would run different configurations (e.g.
> -O3 vs -Os) as different "machines" in LNT's model.
>
>
>
> O2/O3 is indeed a bad example. We're also using different machines for
> Os/O3 - such parameters apply to all tests, and we don't propose major
> changes.
>
> Elena was only extending the LNT interface a bit to ease LLVM test-suite
> execution with different compiler or hardware flags.
>
>
>
> Oh I see, this boils down to extending the lnt runtest interface to be
> able to specify a set of configurations, rather than a single configuration
> and making
>
> sure configurations get submitted under different machine names? We kick
> off the different configuration runs through a script invoking lnt runtest
> multiple
>
> times. I don't see a big problem with extending the lnt runtest interface
> to do this, assuming it doesn't break the underlying concepts assumed
> throughout
>
> LNT. Maybe the only downside is that this will add even more command line
> options to lnt runtest, which already has a lot (too many?) command line
>
> options.
>
>
>
> Maybe some changes are required to analyze and compare metrics between
> "machines": e.g. code size/performance between Os/O2/O3.
>
> Do you perform such comparisons?
>
>
>
> We typically do these kinds of comparisons when we test our patches
> pre-commit, i.e. comparing for example '-O3' with '-O3 -mllvm
> -enable-my-new-pass'.
>
> To stick with the LNT concepts, tests enabling new passes are stored as a
> different "machine".
>
> The only way I know to be able to do a comparison between runs on 2
> different "machine"s is to manually edit the URL for run vs run comparison
>
> and fill in the runids of the 2 runs you want to compare.
>
> For example, the following URL is a comparison of
> green-dragon-07-x86_64-O3-flto vs green-dragon-06-x86_64-O0-g on the public
> llvm.org/perf server:
>
> http://llvm.org/perf/db_default/v4/nts/70644?compare_to=70634
>
> I had to manually look up and fill in the run ids 70644 and 70634.
>
> It would be great if there was a better way to be able to do these kind of
> comparisons - i.e. not having to manually fill in run ids, but having a
> webui to easily find and pick the runs you want to compare.
>
> (As an aside: I find it intriguing that the URL above suggests that there
> are quite a few cases where "-O0 -g" produces faster code than "-O3 -flto").
>
>
>
> Can you be more explicit about which ones? I don't see any regression
> (other than compared to the baseline, or in the compile time).
>
>
>
> --
>
> Mehdi
>
>
>
> D'Oh! I was misinterpreting the compile time differences as execution time
> differences. Indeed, there is no unexpected result in there.
>
> Sorry for the noise!
>
>
>
> Kristof
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>