[PATCH] D41326: [clang-tidy] Added diagnostics about incorrect usage of NOLINT comment

Wed Dec 27 13:00:59 PST 2017

aaron.ballman added inline comments.

================
Comment at: clang-tidy/ClangTidyDiagnosticConsumer.cpp:424
+// This method is not const, because it fills cache on first demand.
+// It is possible to fill cache in constructor or make cache volatile
+// and make this method const, but they seem like worse solutions.
----------------
xgsa wrote:
> aaron.ballman wrote:
> > xgsa wrote:
> > > aaron.ballman wrote:
> > > > Making the cache volatile will have no impact on this.
> > > > 
> > > > Any reason not to make the cache `mutable`, however? That's quite common for implementation details.
> > > Sorry, certainly, instead of "volatile" I meant "mutable".
> > > 
> > > Actually, using of "mutable" violates a constancy contract, as the field is get modified in a const method. Thus I'd tend to avoid using `mutable`, if possible, because e.g. in multi-threaded applications these fields require additional protection/synchronization. Moreover, I see that using of  `mutable` is not very spread in clang-tidy. Thus as, currently, `hasCheck` is called from the non-constant context, I'd prefer leaving it non-const instead of making cache `mutable`. Please, let me know if you insist on the `mutable` option.
> > Use of mutable does not violate constancy; the cache is not exposed via any interface; it is purely an implementation detail. Very little of our code base is concerned with multithreaded scenarios (we use bit-fields *everywhere*, for instance).
> > 
> > I won't insist on using `mutable` if you are set against it, but this is the exact scenario in which it is the correct solution.
> I've just tried the `mutalbe`-approach and discovered one more issue: `ClangTidyASTConsumerFactory` requires non constant `ClangTidyContext`. Thus, if current approach is not suitable, either `ClangTidyASTConsumerFactory` should be refactored on const and nonconst parts or caching should be done in constructor (which makes  fetching the names not lazy). Because of this, currently I am suggesting to leave `hasCheck` nonconstant.
Ah, okay, that makes sense to me -- thank you for looking into it, I think how you have it is reasonable enough then.

================
Comment at: clang-tidy/ClangTidyDiagnosticConsumer.cpp:839-840
+        case NolintCommentType::Nolint:
+          Message = "there is no diagnostics on this line, "
+                    "the NOLINT comment is redundant";
+          break;
----------------
xgsa wrote:
> aaron.ballman wrote:
> > xgsa wrote:
> > > aaron.ballman wrote:
> > > > I don't think the user is going to care about the distinction between no diagnostics being triggered and the expected diagnostic not being triggered. Also, it's dangerous to claim the lint comment is redundant because it's possible the user has NOLINT(foo, bar) and while foo is not triggered, bar still is. The NOLINT comment itself isn't redundant, it's that the check specified doesn't occur.
> > > > 
> > > > I would consolidate those scenarios into a single diagnostic: "expected diagnostic '%0' not generated" and "expected diagnostic '%0' not generated for the following line".
> > > > 
> > > > One concern I have with this functionality is: how should users silence a lint diagnostic that's target sensitive? e.g., a diagnostic that triggers based on the underlying type of size_t or the signedness of plain char. In that case, the diagnostic may trigger for some targets but not others, but on the targets where the diagnostic is not triggered, they now get a diagnostic they cannot silence. There should be a way to silence the "bad NOLINT" diagnostics.
> > > > I don't think the user is going to care about the distinction between no diagnostics being triggered and the expected diagnostic not being triggered. Also, it's dangerous to claim the lint comment is redundant because it's possible the user has NOLINT(foo, bar) and while foo is not triggered, bar still is. The NOLINT comment itself isn't redundant, it's that the check specified doesn't occur.
> > > > 
> > > > I would consolidate those scenarios into a single diagnostic: "expected diagnostic '%0' not generated" and "expected diagnostic '%0' not generated for the following line".
> > > 
> > > This branch of `if (NolintEntry.first.CheckName == NolintCommentsCollector::AnyCheck)` reports only about `NOLINT`/`NOLINTNEXTLINE` comments without check list, so I suppose it's fair to claim that this comment is redundant (we have already checked that no single check reported diagnostics on the line). The `else`-branch reports the diagnostics for the definite check in a list in case of `NOLINT(foo, bar)` (actually, if neither `foo` nor `bar` checks reported diagnostics for the line, there will be a few diagnostics from `nolint-usage` - not sure if it's good, but it seems acceptable). That is why, I suppose, it is necessary to distinct these cases.
> > > 
> > > > One concern I have with this functionality is: how should users silence a lint diagnostic that's target sensitive? e.g., a diagnostic that triggers based on the underlying type of size_t or the signedness of plain char. In that case, the diagnostic may trigger for some targets but not others, but on the targets where the diagnostic is not triggered, they now get a diagnostic they cannot silence. There should be a way to silence the "bad NOLINT" diagnostics.
> > > 
> > > There is such mechanism: it is possible to specify `// NOLINT(nolint-usage)` or `//NOLINT(check1, check2, nolint-usage) to silence the `nolint-usage`-mechanism. Please, see tests for details and more examples. 
> > Can you provide an example where this distinction will make a difference to the user and help clarify a confusing situation? I cannot think of one, and it would be nice to simplify this code.
> > 
> > Thank you for the explanation about nolint-usage. This is not terminology I've seen before -- is this your invention, or is it a documented feature for NOLINT comments?
> >> Can you provide an example where this distinction will make a difference to the user and help clarify a confusing situation? I cannot think of one, and it would be nice to simplify this code.
> 
> Example for the diagnostics emitted in the `if`-branch:
> ```
> class A2 { explicit A2(int i); }; // NOLINT
> => warning: there is no diagnostics on this line, the NOLINT comment is redundant
> ```
> Thus, the whole NOLINT comment should be removed.
> 
> Example for the diagnostics emitted in the `else`-branch:
> ```
> // Case: NO_LINT for the specific check on line with an error on another check.
> class A4 { A4(int i); }; // NOLINT(misc-unused-parameters, google-explicit-constructor)
> => warning: there is no diagnostics for the 'misc-unused-parameters' check on this line, the NOLINT comment is redundant
> ```
> In this case, only misc-unused-parameters check should be removed from the list, but not the whole NOLINT.
> 
> Note also that if in the last example there is only misc-unused-parameters in the NOLINT checks list, the diagnostics will be the same. However, I suppose it's acceptable, because even if a user removed only a check name from the check list without removing NOLINT itself, clang-tidy will suggest removing NOLINT itself on the next run.
> 
> 
> >> Thank you for the explanation about nolint-usage. This is not terminology I've seen before -- is this your invention, or is it a documented feature for NOLINT comments?
> 
> It's my invention, I suppose, from the user point of view it's logical to consider this mechanism as a check.
Thank you for the examples, but I don't think they would add any clarification for the user, either. That said, I think we could collapse all four diagnostics into: "there are no diagnostics %select{|for '%2'}0 on %select{this|the next}1 line" where %0 distinguishes between NOLINT with and without an explicit check name, %1 distinguishes between NOLINT and NOLINTNEXTLINE, and %2 is the name of the check in question (if any).

>> Thank you for the explanation about nolint-usage. This is not terminology I've seen before -- is this your invention, or is it a documented feature for NOLINT comments?
> It's my invention, I suppose, from the user point of view it's logical to consider this mechanism as a check.

I think we should devise a better way of expressing this unless there's industry practice suggesting a specific term or syntax. nolint-usage isn't very discoverable. Whatever we wind up with, it should work naturally for constructs like your first example. In the first example, diagnosing that NOLINT is redundant only makes sense in some circumstances, such as when the construct is in a source file that is not compiled with google-explicit-constructor enabled. However, if the code is in a shared header file where some (but not all) source code is checked with google-explicit-constructor enabled, it serves a purpose and removing it would be incorrect. The same holds for NOLINT(check).

Ultimately, I think that diagnosing NOLINT on a line that issues no diagnostics should not be enabled by default. It's highly likely that the NOLINT directive is due to reasons unknowable to the implementation, and suggesting the user remove the comment seems unhelpful. My example above is one plausible case where NOLINT may serve a purpose. Another example is code checked by clang-tidy as well as other tools; the NOLINT may be silencing the other tools. I think it should take extra work by the user to enable this functionality in a catch-all manner -- either via different syntax or enabled only through an explicit flag.

I think that diagnosing NOLINT(check-name) when that diagnostic is not triggered is useful by default because the user is explicitly expecting a diagnostic by name. However, I think the user needs an almost-trivial way to silence the "not triggered" diagnostic on a case-by-case basis. Target specific diagnostics as well as shared code are both examples of situations where the NOLINT comment may be accurate but for situations unknown to clang-tidy. Telling the user to remove the NOLINT comment would be the wrong thing to do.

================
Comment at: clang-tidy/ClangTidyDiagnosticConsumer.h:295
+  // separately to process NOLINT misuse.
+  std::unordered_map<unsigned, std::vector<unsigned>> DiagIDsToLineNumbers;
 };
----------------
xgsa wrote:
> aaron.ballman wrote:
> > xgsa wrote:
> > > aaron.ballman wrote:
> > > > Perhaps this would make more sense as an `llvm::IndexedMap` or other non-STL datatype?
> > > I think, the map of vectors fits well: it provides efficient adding, storing and iterating through elements. In spite items in vector can be duplicated, it's not a usual case (when there are a lot of diagnostics for the same check on the same line), so I don't think it worth using set. I don't see how `llvm::IndexedMap` could help here. What are the concerns about the current data structure?
> > STL data structures are not particularly well-optimized; they're meant for general usage cases. This map can get very large in real projects, so I worry about performance concerns. That's why we have specialized ADTs -- you should generally prefer them to STL containers.
> I have reviewed llvm guide [1] and haven't found more suitable ADT, except `std::map`, which provides O(log(N)) complexity for lookup and insert against average O(1) for `std::unordered_map`. `llvm::IndexedMap` doesn't look suitable, as there could be a lot of lines in the file, whereas only a few will contain NOLINT comment, so `llvm::IndexedMap` won't store data efficiently.
> 
> Moreover, as far as I understand, this collection stores data for the translation unit and is cleaned up after it is processed, so there shouldn't be millions of diagnostics line numbers stored here.
> 
> Thus, I'd prefer leaving the container type as is to optimize for performance, but I would be also OK with `std::map` to optimize for memory usage.
> 
> [1] - http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task
Being single-TU doesn't really matter much for number of lines processed -- generated code is quite common. That said, we can leave it until performance concerns become a problem in practice.

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D41326