[cfe-dev] State of the serialization tests, and plan to fix it

Mon Apr 8 06:53:08 PDT 2019

On Tue, Apr 2, 2019 at 3:39 PM Bruno Ricci <riccibrun at gmail.com> wrote:
Hm, no one wants to reply :)
I guess i could try to..

First of all, thank you for looking into it!

> Hi all,
>
> I recently discovered that serialization of the AST is only very lightly
> tested. What is tested is mostly that parsing works with an external
> source, but that in itself does not guarantee that the round trip
> serialization + deserialization is lossless. In addition some nodes are
> not tested at all. This can be seen in the coverage report available here:
> http://lab.llvm.org:8080/coverage/coverage-reports/clang/index.html.
Yes, that seems very unfortunate. Especially since that serialization
logic (i guess?) will be more used in the future due to the Modules.

This also limits cleanup work on serialization - e.g. may be able
to significantly(or not) reduce the size of these dumps if we added
Abbrevs to more AST nodes.
But it's a bad idea to do that without test coverage.

> I would like to fix this and am looking for comments on the following plan:
Yay!

> Plan 1:
> -------
> Go through each AST node and ensure that its state is dumped fully.
Yes, please.

> To avoid
> polluting to output of ast-dump, split the various bits of state into two
> parts, the first part being the part generally useful for people working
> on clang (ie: mostly what is currently dumped), and the second part being
> the rest (which is rather subjective obviously).
>
> (though this can be skipped if the additional data in the output in
>  the AST dump is judged to be acceptable.)
I guess to answer that one would need to see just how much more output
that adds. Unless that separation already exists, i'd try not to introduce it.

> Then use that to check that the serialization works as intended. There is
> the possibility of re-using the ast-dump tests (in AST/ast-dump-*) here.
Sounds reasonable. They do basically the same thing, especially since
in both cases you want to check the final AST dump.

> Of course this does not prevent someone from adding some data to some
> node and forgetting to update both the serialization and the AST dump.
True, but i think this is mostly the problem of code review.

> Possible alternative plan:
> --------------------------
> Superficially it seems like ASTStructuralEquivalence could be used. However
> as far as I can tell, as its name implies, it only look at the structure
> at the AST and does not look to the various bits of each node. Additionally,
> it does not currently handle statements and expression at all.
>
>
> I am missing something here, and does anyone have a better idea ?

I know i have said that already, but this is a very fundamental problem.
And i think it is *significantly* caused by the lack of tooling.
It is no fun to manually write CHECK lines, *especially* good ones,
**especially** with good coverage.

Fixing the coverage of ASTDumper tests will result in a significant amount
of tests, that should hopefully be really brittle, and break if not updated
to show the changes in the ASTDumper. But that will unquestionably
cause some disdain(FIXME: can't pick better word), since they would
need to be updated. And as i have already said the coverage may be
bad because it is tedious to write these tests.. :)

So i would really recommend to approach this from tooling perspective first,
before adding new tests. See:
llvm/utils/update_llc_test_checks.py
llvm/utils/update_mca_test_checks.py
llvm/utils/update_mir_test_checks.py
llvm/utils/update_test_checks.py

I would really recommend to examine them, and come up with a similar
script for -ast-dumper,
so that updating tests will consist *only* of two steps:
1. write the source you'd want to test (+run-line)
2. run the updater tool, which will do the rest
    (drop old check lines, run the runline, sanitize the output, write
new checklines)

Once again, thank you for looking into it!

> Bruno
Roman.