[PATCH] D66572: [analyzer] NFC: BugReporter Separation Ep.I.
Artem Dergachev via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Sep 5 17:19:16 PDT 2019
NoQ added inline comments.
================
Comment at: clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h:122
+ /// Get the location on which the report should be uniqued.
+ virtual PathDiagnosticLocation getUniqueingLocation() const {
+ return Location;
----------------
gribozavr wrote:
> NoQ wrote:
> > gribozavr wrote:
> > > Where can I read about the uniqueing logic? Does it mean that out of two bug reports with the same location only one gets displayed, regardless of other properties?
> > Added some comments ^^
> Thanks for the comments, but sorry, this one is still unclear.
>
> (1)
>
> ```
> // Imagine a ClangTidy checker that finds calls to math functions where it is
>
> double foo() {
> return sqrt(-1); // warning: 'sqrt' will always return NaN
> }
> ```
>
> What should be the "decl with issue" here? 'foo' or 'sqrt'?
>
> (2)
>
> ```
> #include <vector>
> using std::vector; // warning: using names from 'std' is error-prone
> ```
>
> (3)
>
> ```
> void foo(Foo f) {
> bar(std::move(f)); // warning: passing result of 'std::move' as a const reference argument, no move will actually happen
> }
> ```
>
> Is it 'foo', 'bar', 'std::move', or the copy constructor of 'Foo'?
Made another attempt at the decl with issue comment :)
================
Comment at: clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h:126-135
+ /// Get the declaration containing the uniqueing location.
+ virtual const Decl *getUniqueingDecl() const {
+ return DeclWithIssue;
+ }
+
+ /// Return the canonical declaration, be it a method or class, where
+ /// this issue semantically occurred.
----------------
gribozavr wrote:
> NoQ wrote:
> > gribozavr wrote:
> > > I don't think `getUniqueingDecl()` and `getDeclWithIssue()` make sense for most ClangTidy checkers.
> > `getDeclWithIssue()` corresponds to the "Function/Method" column we have in our scan-build index page:
> >
> > {F9881207}
> >
> > This column is not super important but probably kinda nice to have. I don't mind leaving it empty for clang-tidy checks, or possibly auto-detect it post factum when possible (e.g., find the smallest decl in which the warning location is contained).
> >
> > `getUniqueingDecl()` is, as far as i understand, only currently used by the issue hash mechanism which governs deduplication across translation units (e.g., we emit a warning against the same header in multiple translation units - it's vital for the static analyzer because it's not uncommon for an execution path that starts in the main file to end in a header). And even then, it's only present in plist output. I'm not convinced it's useful at all. It's likely that we can remove it.
> >
> > Also clang-tidy checks wouldn't need to specify their `UniqueingDecl` manually as it silently defaults to their `DeclWithIssue`.
> >
> > So basically we have these two methods on the base class for bureaucratic reasons: our existing algorithms handle all kinds of warnings uniformly based on these computed properties of bug reports. I think we should be perfectly fine if some kinds of bug reports return null from them, indicating that "this property doesn't make sense for them". We could instead reinvent protocols on top of our custom RTTI ("does this warning conform to `Uniqueable`? if so, i'll try to unique it"), but i don't think it's worth the complexity.
> > `getDeclWithIssue()` corresponds to the "Function/Method" column we have in our scan-build index page:
>
> Judging from the screenshot, that column seems to play a function of a human-readable summary of the warning location. Is that correct?
>
> Should we determine it automatically from `getLocation()`? Is there a use case for `getDeclWithIssue()` be different from the function/class that `getLocation()` is pointing into?
>
> > (e.g., find the smallest decl in which the warning location is contained).
>
> Oh you are saying the same thing. Why not have it always be auto-detected? Is there a use case for manual curation?
>
> > `getUniqueingDecl()` [...] I'm not convinced it's useful at all. It's likely that we can remove it.
>
> I agree about eliminating it -- actually, replacing it with comparing source locations. In our experience running ClangTidy, uniquing bug reports based on source locations works quite well. Can you give it a go in a separate patch that we can land ahead of time? I see there are only 4 callers of the BugReport constructor that accepts the location.
>
> Also the doc comment is somewhat contradictory. "declarations that ... contains ... the uniqueing location. This is not actively used for uniqueing..."
> Oh you are saying the same thing. Why not have it always be auto-detected? Is there a use case for manual curation?
I don't think anybody needs manual curation. We just literally always had a good decl to put in there, so i guess that's why we never bothered implementing autodetection so far.
P.S. despite http://lists.llvm.org/pipermail/cfe-dev/2019-September/063229.html this discussion is still relevant, i'll just duplicate these docs to `PathDiagnostic`.
> I agree about eliminating it -- actually, replacing it with comparing source locations. In our experience running ClangTidy, uniquing bug reports based on source locations works quite well.
I'll take a closer look at the current messy state of this thing.
This thing is part of the "issue hash" facility that was supposed to be more than just a uniqueing effort: it was supposed to be stable across changes in the source code under analysis, so that people could easily understand how their commits introduce new issues. Source locations are too granular for that because they easily skew when you add unrelated code above the source location. But the "offset into the current decl" is a much more stable metric. Cf. https://github.com/llvm-mirror/clang/commit/97bfb558f69c09b01a5c1510f08dc91eb62329a7
I don't fully understand how much this was made use of, so i'll take a closer look.
================
Comment at: clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h:174
+ /// This location is used by clients rendering diagnostics.
+ virtual PathDiagnosticLocation getLocation(const SourceManager &SM) const {
+ assert(Location.isValid());
----------------
gribozavr wrote:
> gribozavr wrote:
> > Another location-related method... I guess I would appreciate a more abstract writeup about what BugReport's data model is (in its doc comment).
> What I meant by "data model" is a high-level description like this.
>
> A bug report is one instance of a problem found by a checker. It is similar to a warning in Clang.
>
> A bug report has:
>
> - a description which is ...
>
> - a short description (optional), which is ...
>
> - zero or more ranges, which ...
>
> - zero or more fixits, which provide advice about ...
>
> - zero or more notes, which ...
>
> Each fixit has: ...
>
> Each note has...
I guess i'll carry over this comment to the upcoming `PathDiagnostic`-related patch because we're not handing off `BugReporter` anyway.
================
Comment at: clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h:108
+
+ StringRef getDescription() const { return Description; }
+
----------------
Szelethus wrote:
> gribozavr wrote:
> > What's the difference between description and short description?
> >
> > What are the expected grammatical conventions? For example, Clang and ClangTidy warning are sentence fragments that start with a lowercase letter and don't end with a period.
> I just did some work related to this! For 99% of the cases (and for one, I'm pretty sure this estimate is quite accurate) there is no difference. AFAIK CodeChecker doesnt even use the short description field emitted in the plist output.
>
> There is one checker however that uses it, RetainCount, though I dislike that particular case (the short version is something like "the counter here isn't zero", and the full version is "the counter here is +2"). Could we get rid of this?
>
> As for diagnostic messages, we start with uppercase but dont terminate the sentence.
Added comments that capture the current state of things.
================
Comment at: clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h:142
+
+ /// Add new item to the list of additional notes that need to be attached to
+ /// this path-insensitive report. If you want to add extra notes to a
----------------
gribozavr wrote:
> "Add a new note at the end of this path-insensitive report."
Ugh, this whole comment is wrong. We support additional out-of-path notes on path-sensitive reports since basically forever.
================
Comment at: clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h:147
+ /// the extra note should appear.
+ void addNote(StringRef Msg, const PathDiagnosticLocation &Pos,
+ ArrayRef<SourceRange> Ranges = {}) {
----------------
gribozavr wrote:
> I'm concerned that `PathDiagnosticLocation` is too complex of an abstraction for ClangTidy. It might even be too complex for Static Analyzer checkers. It is OK for the Static Analyzer core though.
>
> Take a look at users of `addNote` -- they use helpers like `makeLocation`. I don't think that would scale to the number of checkers that we have. The API should be simpler.
Yup, absolutely.
Even though i'm abandoning the idea of handing off `BugReporter` directly to Clang-Tidy, the problem is still going to be there, so i'll try to come up with something in later patches.
================
Comment at: clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h:210
+ PathDiagnosticLocation getLocation(const SourceManager &SM) const override {
+ return getUniqueingLocation();
+ }
----------------
gribozavr wrote:
> I think the delegation should be done in the opposite way -- `getLocation` should be the source of truth, and `getUniqueingLocation` should call `getLocation`.
Hmm, this annoying `SourceManager` argument can indeed be removed.
================
Comment at: clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h:440
visitor_iterator visitor_end() { return Callbacks.end(); }
visitor_range visitors() { return {visitor_begin(), visitor_end()}; }
----------------
gribozavr wrote:
> Could it just be `ArrayRef<...> visitors()`, without exposing extra typedefs?
Not immediately because when visitors are being actually run the code permits itself to move unique pointers out of this vector and then put it back. So it needs write access (at least in the form of non-const iterators) and it has problems with mutating the container while iterating on top of that (as visitors may add other visitors while they run). So i'd rather not touch it for now.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D66572/new/
https://reviews.llvm.org/D66572
More information about the cfe-commits
mailing list