[PATCH] D54141: [clang-tidy] add deduplication support for run-clang-tidy.py

Fri Jan 18 10:01:41 PST 2019

aaron.ballman added a comment.

In D54141#1289980 <https://reviews.llvm.org/D54141#1289980>, @hokein wrote:

> >> If you're suggesting proceeding with this regex based solution, I
> > 
> > don't think that's a good idea. Why commit a hack which people will object to ever removing? Just see if we can do the right thing instead.
>
> +1, my main concern is the complexity of the patch and maintenance burden of the python script.

I think these are reasonable concerns and to a degree I share them. At the same time, I worry we may be leaving useful functionality behind in favor of functionality that doesn't exist and doesn't appear to be moving forward. If we were to move forward with this patch, nothing prevents us from surfacing it more naturally later when we have the infrastructure in place for the better solution, correct?

>> At the moment clang-apply-replacements is called at the end of an clang-tidy run in run-clang-tidy.py That means we produce ~GBs of Yaml first, to then emit ~10MBs worth of it.
> 
> That's why I suggest using some sort of other space-efficient formats to store the fixes. My intuition is that the final deduplicated result shouldn't be too large (even for YAML), because 1) no duplication 2) these are **actual diagnostics** in code, a healthy codebase shouldn't contain lots of problem 3) you have mentioned that you use it for small projects :)

Re: #2, I don't think that assertion is true in practice. I expect there are plenty of projects that contain a lot of clang-tidy diagnostics, especially given that clang-tidy checks tend to have higher false positive rates. Even if clang-tidy checks were not so chatty, "shouldn't" and "don't" are very different measurements.

I'm not suggesting to plow full-steam-ahead with this patch or that the concerns raised here are invalid, but at the same time, I think it does solve a real problem and it would be a shame to lose a workable solution because something better might be possible. If work is taking place to actually implement that something better, then that's a different matter of course. I get the impression though that "something better" is an extensive amount of work compared to what's in front of us; am I misunderstanding?

In D54141#1291509 <https://reviews.llvm.org/D54141#1291509>, @JonasToth wrote:

> My opinion is, that we should put as much of the deduplication into clang-tidy itself and not rely on tools like run-clang-tidy.py if we can.

Strong +1. TBH, I was unaware people used run-clang-tidy.py. ;-)

Repository:
  rCTE Clang Tools Extra

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D54141/new/

https://reviews.llvm.org/D54141