[PATCH] D54141: [clang-tidy] add deduplication support for run-clang-tidy.py

Stephen Kelly via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Nov 6 09:43:46 PST 2018


steveire added a comment.

In https://reviews.llvm.org/D54141#1288851, @JonasToth wrote:

> > So, running `clang-apply-replacements --issue-diags the_new_file.yaml` would issue the warnings/fixit hints by processing the yaml and issuing the diagnostics the way clang-tidy would have done (See in my proposed design that we silence clang-tidy).
> > 
> > Note also that the `--issue-diags` option does not yet exist. I'm proposing adding it.
>
> At the moment `clang-apply-replacements` is called at the end of an clang-tidy run in `run-clang-tidy.py` That means we produce ~GBs of Yaml first, to then emit ~10MBs worth of it.
>  I think just relying on `clang-apply-replacements` is not ok.


I think my proposal is still unclear to you. Sorry about that.

I am proposing on-the-fly de-duplication, but without regex parsing of the diagnostic output of clang-tidy.

My proposal is still the same as I wrote before, but maybe what I write below will be more clear. Sorry if this seems condescendingly detailed. I don't know where the misunderstanding is coming from, so I err on the side of detail:

Imagine you have two cpp files which both include the same header. Imagine also that all 3 files are missing 1 override keyword and you run the modernize-use-override check on the two translation units using run-clang-tidy.py.

Here is what I propose happens:

- First, assume that the two translation units are processed serially, just to simplify the process as described. You will see that parallelizing does not change anything.
- clang-tidy gets run on file1.cpp in a way that it does not write diagnostics (and fixes) to stdout/stderr, but only generates a yaml file representing the diagnostics (warnings and fixes).
- Two diagnostics are created - one for the missing override in file1.cpp and one for the missing override in shared_header.h
- the on-the-fly deduplication cache is empty, so both diagnostics get added to the on-the-fly deduplication cache
- Because both were added to the cache, both diagnostics get written to a temporary file foo.yaml
- `clang-appy-replacements --issue-diags foo.yaml` is run (some other tool could be used for this step, but CAR seems right to me)
- `clang-appy-replacements --issue-diags foo.yaml` causes the two diagnostics to be issued, exactly as they would have been issued by clang-tidy.
- `clang-appy-replacements --issue-diags foo.yaml` DOES NOT actually change the source code. It only emits diagnostics
- Next process file2.cpp
- Processing file2.cpp results in two diagnostics - one for the missing override in file2.cpp and one for the missing override in shared_header..h
- NOTE: The diagnostic for the missing override in shared_header.h is a duplicate
- NOTE: The diagnostic for the missing override in file2.cpp is NOT a duplicate
- Try to add both to the on-the-fly deduplication cache
- Discover that the diagnostic for file2.cpp IS NOT a duplicate. Add it to a tempoary bar.yaml (named not to conflict with any other temporary file! This is built into temporary file APIs).
- Discover that the diagnostic for shared_header.h IS a duplicate. DO NOT write it to the temporary file
- Run `clang-appy-replacements --issue-diags bar.yaml`
- Run `clang-appy-replacements --issue-diags bar.yaml` causes ONLY the diagnostic for file2.cpp to be issued because that is all that is in the file.
- At the end, you have a de-duplicated yaml structure in memory. Write it to the file provided to the --export-fixes parameter of `run-clang-tidy.py`.

Do you understand the proposal now?

This means that you get

- A deduplicated fixes file
- De-duplicated diagnostics issued - which means you can process them in your CI system etc.

> That fits the distributed use-case as well?

If the distributed system processes more than one file at a time on the remote computer.

I'm not aware of any distributed systems that work that way. I think they all process single files at a time.

However, when the resulting yaml files are sent back to your computer, your computer can deduplicate the diagnostics issued (in my design).


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D54141





More information about the cfe-commits mailing list