[clang] Reland "[Utils] add update-verify-tests.py (#108630)" (PR #108658)
Henrik G. Olsson via cfe-commits
cfe-commits at lists.llvm.org
Tue Sep 17 17:34:54 PDT 2024
hnrklssn wrote:
> > But for clang -verify you have to add checkers for exactly the emitted diagnostics. No more or less. So the only real question is on which line to place them.
>
> I don't think it's that simple, at least for some tests. We have tests that run in `-verify` mode multiple times with different combinations of compiler options. Such tests might rely on custom prefixes (`-verify=prefix1,prefix2`), and updating them automatically can easily yield incorrect results. As an example of how more complicated tests can look, consider the C++ DR test suite. Here's one example:
>
> https://github.com/llvm/llvm-project/blob/ca0613e0fcef2a9972f2802318201f8272e74693/clang/test/CXX/drs/cwg3xx.cpp#L779-L782
For exactly that reason, this script doesn't support updating failures that involve multiple prefixes or regex matchers: I could not find a reliable way to update those without manual intervention.
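To illustrate the ambiguity, here is a minimal sketch of a multi-prefix `-verify` test; the RUN lines and diagnostic wording are made up for illustration, not taken from the linked test:

```c++
// RUN: %clang_cc1 -std=c++14 -verify=expected,cxx14 %s
// RUN: %clang_cc1 -std=c++17 -verify=expected,cxx17 %s

void f() {
  // Fires identically in both runs, so a single shared 'expected' directive covers it.
  int i = "text"; // expected-error {{cannot initialize a variable of type 'int'}}

  // Differs between the runs and therefore needs mode-specific prefixes. A script
  // that only sees one failing RUN line cannot tell whether a new diagnostic
  // should be attributed to 'expected', 'cxx14' or 'cxx17' without rerunning
  // every configuration, so tests like this are left for manual fixup.
  register int r; // cxx14-warning {{'register' storage class specifier is deprecated}} cxx17-error {{ISO C++17 does not allow 'register' storage class specifier}}
}
```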
> > That said, my main goal is not to make creating tests easier (although it of course also does that, and I don't know that I agree that it's harmful to lower the threshold for creating tests), but to make it easier to batch-update many/large breaking tests. We have many tests downstream that, while testing features that don't yet exist upstream, are affected by new diagnostics added upstream. Handling many tests breaking all at once because of upstream changes is easier if it doesn't require a ton of sed-fu. Similar interactions happen when cherry-picking a patch from a dev branch to one or multiple release branches, which may not contain the exact same set of diagnostics. I could of course keep this script downstream, but we are not the only ones with downstream changes, and I think others will find it helpful. And that's for a completely new script: keeping something like #108425 downstream-only would result in merge conflicts.
>
> I don't intend to dismiss the pain this causes downstream, but in my opinion it doesn't justify opening the floodgates of automated test updates upstream. At the moment I don't have a strong opinion on infrastructure that supports such automated updates, like #108425, which you mentioned.
>
> > Furthermore, I have observed how existing regression tests create friction to improving phrasing of existing diagnostics. Old diagnostics often appear in many many tests, so while the code change to improving the diagnostic may be quick, you know it'll take significant time to update all of the regression tests using the old phrasing.
>
> I agree that the pain is there. But again, I don't think the end justifies the means.
That's fair; we disagree on this point. I think responsible use of automatic test updaters is reasonable, and we shouldn't prevent people from using them just because they _could_ be abused. I'm sympathetic that there may be many low-quality PRs to wade through, but I'd rather let the number of open PRs continue to grow and let more people step up to the task of reviewing as it becomes more apparent that we need more reviewers. Artificially hampering developer productivity just because it's not the bottleneck hides the symptom rather than fixing the problem (too few reviewers compared to the number of PRs).
I do want to stress, though, that this script handles pretty **trivial** updates. Yes, it doesn't force people to look at the changes they're submitting, but compared to existing scripts this is significantly less risky. I don't want to resort to whataboutism, but I do think we need to be consistent if we're going to ban test update scripts.
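As a made-up illustration of what such a trivial update looks like (the test and the diagnostic wording here are purely illustrative):

```c++
// RUN: %clang_cc1 -verify %s

// If the wording of this diagnostic changes upstream, the updater only
// rewrites the text inside {{...}} to match what clang now emits on this
// line; the resulting diff is small and easy to review before committing.
int *p = 3; // expected-error {{cannot initialize a variable of type 'int *'}}
```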
> > So you don't do that drive-by improvement, because that wasn't your goal anyways.
>
> A drive-by fix that makes you update a lot of tests would introduce a lot of noise into your PR anyway, making it noticeably harder to review.
Of course the drive-by fix would then have to be submitted as a separate PR, but with a script like this that's quite straightforward even after the fact: extract the implementation change into its own commit, then rerun the test updater on both branches to bring the test updates over to the new PR and to undo the noise in the original one.
> I believe the script addresses the wrong problem here.
There's no getting around the fact that the tests need to be updated when diagnostics are updated. A lack of reviewers doesn't mean issues affecting developers can't also be addressed.
https://github.com/llvm/llvm-project/pull/108658