[clang] [UVT] add update-verify-tests.py (PR #97369)

Tue Jul 2 12:08:47 PDT 2024

hnrklssn wrote:

> This is a nice addition, but I think it should follow the conventions established by the existing update_*_test_checks.py scripts as much as possible, at least:
> 
>     * Ability to parse RUN lines, re-execute them autonomously, and modify test files in place based on their output
> 
>     * Proper argument parsing via the argparse library
> 
>     * Name of the script itself
> 
> 
> You should integrate with the UpdateTestChecks/common.py infrastructure as much as possible. At a minimum, the collection of test cases to update, the RUN line parsing, and stuff like `common.invoke_tool` should almost certainly be shared. This may require some minor updates to that common infrastructure, but that's a good thing.

I did consider this at first. Here's my reasoning for why I didn't; please let me know what you think:

I think that the added complexity of RUN line parsing + `common.invoke_tool` would cost more than it's worth for this tool. That code is already a reimplementation of something that `lit` does excellently. `lit` also has automatic test discovery, so you don't have to manually provide each test case to the script. Any test case relying on RUN lines is going to be tested with `lit` anyway, so it's not like we're adding any extra coupling.

The `update_*_test_checks.py` scripts use `common.invoke_tool` to prevent the output from being piped into `FileCheck`, but this script uses the raw output of the `-verify` flag (which is rarely if ever piped into anything), so invoking the lines manually doesn't provide the same extra value.

If anything, I think `lit` should have a plugin system that allows a plugin to intercept the test invocations and control how RUN lines are executed. I didn't implement that now, though, because I didn't want to expand the scope of this patch too much.

> The actual generating and updating of CHECK lines is probably sufficiently different that sharing might not make sense. That said, I wonder if this script can be generalized to automatically generate CHECK lines for other tools that produce diagnostic output.

This script relies on the format of the output of the `-verify` flag to get the ground truth of which diagnostics are mismatching, and how. FileCheck doesn't provide information that is as comprehensive, because it's much more flexible, so it'd be harder for it to confidently say "this check doesn't match anything and this check should be inserted on this line". E.g. this script doesn't even try to handle checks with regexes, while any FileCheck check can contain a regex unless it specifies `LITERAL`.
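
To illustrate what I mean by "ground truth", here's a minimal sketch of reading that output. It is not the code in this patch: the sample text and both regexes are my assumptions about what the verifier typically prints (the exact wording may differ between clang versions), and `Mismatch` and `parse_verify_output` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed shape of the -verify output; not copied from a real run.
SAMPLE = """\
error: 'expected-error' diagnostics expected but not seen:
  File test.c Line 3: use of undeclared identifier 'x'
error: 'expected-warning' diagnostics seen but not expected:
  File test.c Line 7: unused variable 'y'
2 errors generated.
"""

# Section header, e.g. "error: 'expected-error' diagnostics seen but not expected:"
HEADER = re.compile(
    r"^error: '(?P<kind>[\w-]+)' diagnostics "
    r"(?P<what>expected but not seen|seen but not expected):\s*$"
)
# Indented entry; file names containing spaces are not handled in this sketch.
ENTRY = re.compile(
    r"^\s+File (?P<file>\S+) Line (?P<line>\d+)"
    r"(?: \(directive at [^)]*\))?: (?P<msg>.*)$"
)


@dataclass
class Mismatch:
    kind: str     # e.g. "expected-error"
    action: str   # "remove" (check matched nothing) or "add" (check is missing)
    file: str
    line: int
    message: str


def parse_verify_output(text):
    """Collect one Mismatch per entry listed under each section header."""
    mismatches = []
    kind = action = None
    for raw in text.splitlines():
        header = HEADER.match(raw)
        if header:
            kind = header.group("kind")
            expected_unseen = header.group("what") == "expected but not seen"
            action = "remove" if expected_unseen else "add"
            continue
        entry = ENTRY.match(raw)
        if entry and kind is not None:
            mismatches.append(
                Mismatch(kind, action, entry.group("file"),
                         int(entry.group("line")), entry.group("msg"))
            )
        elif not entry:
            # Any other line (e.g. the summary) ends the current section.
            kind = action = None
    return mismatches


result = parse_verify_output(SAMPLE)
```

Each entry already names the file, the line and the exact message, which is what lets the script decide where a check should be added or removed without guessing.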

For the checks themselves I think they are sufficiently different from FileCheck checks that it'd be hard to generalise the output: multiple `expected-*` checks can point to the same line, and at the same time we have no information about the order in which the diagnostics are actually emitted to the user, e.g. a later diagnostic can point to an earlier line.
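
A small sketch of that difference, under the assumption that diagnostics arrive as `(line, kind, message)` tuples in emission order (the tuple layout and the name `group_by_line` are hypothetical, not taken from the patch):

```python
from collections import defaultdict


def group_by_line(diags):
    """Attach each diagnostic to the source line it points at.

    Emission order across lines is discarded on purpose: the checks are
    keyed by target line, not by the order the compiler printed them in.
    """
    by_line = defaultdict(list)
    for line, kind, msg in diags:
        by_line[line].append("expected-" + kind + " {{" + msg + "}}")
    return dict(sorted(by_line.items()))


# Emission order: line 5, then line 2 (a later diagnostic pointing at an
# earlier line), then line 5 again (two checks on the same line).
diags = [
    (5, "error", "redefinition of 'f'"),
    (2, "note", "previous definition is here"),
    (5, "note", "candidate function"),
]
grouped = group_by_line(diags)
```

FileCheck checks, by contrast, are matched in the order they appear against sequential output, so a mapping keyed only by target line doesn't translate directly.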

https://github.com/llvm/llvm-project/pull/97369