[PATCH] D41326: [clang-tidy] Added diagnostics about incorrect usage of NOLINT comment

Sun Dec 24 08:50:58 PST 2017

aaron.ballman added inline comments.

================
Comment at: clang-tidy/ClangTidyDiagnosticConsumer.cpp:161
+
+  const SmallVector<StringRef, 1>& getCheckNames() const {
+    assert(isNolintFound());
----------------
aaron.ballman wrote:
> The `&` should bind to the right. However, instead of returning the concrete container, why not expose a range interface instead?
This could use a typedef to make the return type more readable. Also, the function should be `checkName()`, dropping the "get".

================
Comment at: clang-tidy/ClangTidyDiagnosticConsumer.cpp:276
+
+  using NolintMap = std::unordered_map<NolintTargetInfo,
+                                       NolintCommentInfo,
----------------
xgsa wrote:
> aaron.ballman wrote:
> > Is there a better LLVM ADT to use here?
> This data structure provides the fast lookup by check name+line number and it's exactly what is necessary. What are the concerns about this data structure?
Same as above.

================
Comment at: clang-tidy/ClangTidyDiagnosticConsumer.cpp:381

+void ClangTidyContext::setPreprocessor(Preprocessor *preprocessor) {
+  if (preprocessor && isCheckEnabled(NolintCheckName)) {
----------------
aaron.ballman wrote:
> Name does not match coding standard.
Please do not name the parameter the same as a type name.

================
Comment at: clang-tidy/ClangTidyDiagnosticConsumer.cpp:424
+// This method is not const, because it fills cache on first demand.
+// It is possible to fill cache in constructor or make cache volatile
+// and make this method const, but they seem like worse solutions.
----------------
xgsa wrote:
> aaron.ballman wrote:
> > Making the cache volatile will have no impact on this.
> > 
> > Any reason not to make the cache `mutable`, however? That's quite common for implementation details.
> Sorry, certainly, instead of "volatile" I meant "mutable".
> 
> Actually, using of "mutable" violates a constancy contract, as the field is get modified in a const method. Thus I'd tend to avoid using `mutable`, if possible, because e.g. in multi-threaded applications these fields require additional protection/synchronization. Moreover, I see that using of  `mutable` is not very spread in clang-tidy. Thus as, currently, `hasCheck` is called from the non-constant context, I'd prefer leaving it non-const instead of making cache `mutable`. Please, let me know if you insist on the `mutable` option.
Use of mutable does not violate constancy; the cache is not exposed via any interface; it is purely an implementation detail. Very little of our code base is concerned with multithreaded scenarios (we use bit-fields *everywhere*, for instance).

I won't insist on using `mutable` if you are set against it, but this is the exact scenario in which it is the correct solution.

================
Comment at: clang-tidy/ClangTidyDiagnosticConsumer.cpp:839-840
+        case NolintCommentType::Nolint:
+          Message = "there is no diagnostics on this line, "
+                    "the NOLINT comment is redundant";
+          break;
----------------
xgsa wrote:
> aaron.ballman wrote:
> > I don't think the user is going to care about the distinction between no diagnostics being triggered and the expected diagnostic not being triggered. Also, it's dangerous to claim the lint comment is redundant because it's possible the user has NOLINT(foo, bar) and while foo is not triggered, bar still is. The NOLINT comment itself isn't redundant, it's that the check specified doesn't occur.
> > 
> > I would consolidate those scenarios into a single diagnostic: "expected diagnostic '%0' not generated" and "expected diagnostic '%0' not generated for the following line".
> > 
> > One concern I have with this functionality is: how should users silence a lint diagnostic that's target sensitive? e.g., a diagnostic that triggers based on the underlying type of size_t or the signedness of plain char. In that case, the diagnostic may trigger for some targets but not others, but on the targets where the diagnostic is not triggered, they now get a diagnostic they cannot silence. There should be a way to silence the "bad NOLINT" diagnostics.
> > I don't think the user is going to care about the distinction between no diagnostics being triggered and the expected diagnostic not being triggered. Also, it's dangerous to claim the lint comment is redundant because it's possible the user has NOLINT(foo, bar) and while foo is not triggered, bar still is. The NOLINT comment itself isn't redundant, it's that the check specified doesn't occur.
> > 
> > I would consolidate those scenarios into a single diagnostic: "expected diagnostic '%0' not generated" and "expected diagnostic '%0' not generated for the following line".
> 
> This branch of `if (NolintEntry.first.CheckName == NolintCommentsCollector::AnyCheck)` reports only about `NOLINT`/`NOLINTNEXTLINE` comments without check list, so I suppose it's fair to claim that this comment is redundant (we have already checked that no single check reported diagnostics on the line). The `else`-branch reports the diagnostics for the definite check in a list in case of `NOLINT(foo, bar)` (actually, if neither `foo` nor `bar` checks reported diagnostics for the line, there will be a few diagnostics from `nolint-usage` - not sure if it's good, but it seems acceptable). That is why, I suppose, it is necessary to distinct these cases.
> 
> > One concern I have with this functionality is: how should users silence a lint diagnostic that's target sensitive? e.g., a diagnostic that triggers based on the underlying type of size_t or the signedness of plain char. In that case, the diagnostic may trigger for some targets but not others, but on the targets where the diagnostic is not triggered, they now get a diagnostic they cannot silence. There should be a way to silence the "bad NOLINT" diagnostics.
> 
> There is such mechanism: it is possible to specify `// NOLINT(nolint-usage)` or `//NOLINT(check1, check2, nolint-usage) to silence the `nolint-usage`-mechanism. Please, see tests for details and more examples. 
Can you provide an example where this distinction will make a difference to the user and help clarify a confusing situation? I cannot think of one, and it would be nice to simplify this code.

Thank you for the explanation about nolint-usage. This is not terminology I've seen before -- is this your invention, or is it a documented feature for NOLINT comments?

================
Comment at: clang-tidy/ClangTidyDiagnosticConsumer.h:295
+  // separately to process NOLINT misuse.
+  std::unordered_map<unsigned, std::vector<unsigned>> DiagIDsToLineNumbers;
 };
----------------
xgsa wrote:
> aaron.ballman wrote:
> > Perhaps this would make more sense as an `llvm::IndexedMap` or other non-STL datatype?
> I think, the map of vectors fits well: it provides efficient adding, storing and iterating through elements. In spite items in vector can be duplicated, it's not a usual case (when there are a lot of diagnostics for the same check on the same line), so I don't think it worth using set. I don't see how `llvm::IndexedMap` could help here. What are the concerns about the current data structure?
STL data structures are not particularly well-optimized; they're meant for general usage cases. This map can get very large in real projects, so I worry about performance concerns. That's why we have specialized ADTs -- you should generally prefer them to STL containers.

https://reviews.llvm.org/D41326