[PATCH] D41326: [clang-tidy] Added diagnostics about incorrect usage of NOLINT comment

Mon Dec 25 06:46:52 PST 2017

xgsa marked 5 inline comments as done.
xgsa added inline comments.

================
Comment at: clang-tidy/ClangTidyDiagnosticConsumer.cpp:276
+
+  using NolintMap = std::unordered_map<NolintTargetInfo,
+                                       NolintCommentInfo,
----------------
aaron.ballman wrote:
> xgsa wrote:
> > aaron.ballman wrote:
> > > Is there a better LLVM ADT to use here?
> > This data structure provides the fast lookup by check name+line number and it's exactly what is necessary. What are the concerns about this data structure?
> Same as above.
I have reviewed llvm guide [1] and found that it recommends using ordered vector in such cases. I have implemented this approach, however I'd like to notice that lookup complexity in `reportRedundantNolintComments()` increased from average O(1) for `std::unordered_map` to O(log(N)) for binary search. However, memory usage become less. As this collection is gathered for each translation unit, I don't expect millions of `NOLINT` comments in it, thus this approach looks suitable.

[1] - http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task

================
Comment at: clang-tidy/ClangTidyDiagnosticConsumer.cpp:424
+// This method is not const, because it fills cache on first demand.
+// It is possible to fill cache in constructor or make cache volatile
+// and make this method const, but they seem like worse solutions.
----------------
aaron.ballman wrote:
> xgsa wrote:
> > aaron.ballman wrote:
> > > Making the cache volatile will have no impact on this.
> > > 
> > > Any reason not to make the cache `mutable`, however? That's quite common for implementation details.
> > Sorry, certainly, instead of "volatile" I meant "mutable".
> > 
> > Actually, using of "mutable" violates a constancy contract, as the field is get modified in a const method. Thus I'd tend to avoid using `mutable`, if possible, because e.g. in multi-threaded applications these fields require additional protection/synchronization. Moreover, I see that using of  `mutable` is not very spread in clang-tidy. Thus as, currently, `hasCheck` is called from the non-constant context, I'd prefer leaving it non-const instead of making cache `mutable`. Please, let me know if you insist on the `mutable` option.
> Use of mutable does not violate constancy; the cache is not exposed via any interface; it is purely an implementation detail. Very little of our code base is concerned with multithreaded scenarios (we use bit-fields *everywhere*, for instance).
> 
> I won't insist on using `mutable` if you are set against it, but this is the exact scenario in which it is the correct solution.
I've just tried the `mutalbe`-approach and discovered one more issue: `ClangTidyASTConsumerFactory` requires non constant `ClangTidyContext`. Thus, if current approach is not suitable, either `ClangTidyASTConsumerFactory` should be refactored on const and nonconst parts or caching should be done in constructor (which makes  fetching the names not lazy). Because of this, currently I am suggesting to leave `hasCheck` nonconstant.

================
Comment at: clang-tidy/ClangTidyDiagnosticConsumer.cpp:839-840
+        case NolintCommentType::Nolint:
+          Message = "there is no diagnostics on this line, "
+                    "the NOLINT comment is redundant";
+          break;
----------------
aaron.ballman wrote:
> xgsa wrote:
> > aaron.ballman wrote:
> > > I don't think the user is going to care about the distinction between no diagnostics being triggered and the expected diagnostic not being triggered. Also, it's dangerous to claim the lint comment is redundant because it's possible the user has NOLINT(foo, bar) and while foo is not triggered, bar still is. The NOLINT comment itself isn't redundant, it's that the check specified doesn't occur.
> > > 
> > > I would consolidate those scenarios into a single diagnostic: "expected diagnostic '%0' not generated" and "expected diagnostic '%0' not generated for the following line".
> > > 
> > > One concern I have with this functionality is: how should users silence a lint diagnostic that's target sensitive? e.g., a diagnostic that triggers based on the underlying type of size_t or the signedness of plain char. In that case, the diagnostic may trigger for some targets but not others, but on the targets where the diagnostic is not triggered, they now get a diagnostic they cannot silence. There should be a way to silence the "bad NOLINT" diagnostics.
> > > I don't think the user is going to care about the distinction between no diagnostics being triggered and the expected diagnostic not being triggered. Also, it's dangerous to claim the lint comment is redundant because it's possible the user has NOLINT(foo, bar) and while foo is not triggered, bar still is. The NOLINT comment itself isn't redundant, it's that the check specified doesn't occur.
> > > 
> > > I would consolidate those scenarios into a single diagnostic: "expected diagnostic '%0' not generated" and "expected diagnostic '%0' not generated for the following line".
> > 
> > This branch of `if (NolintEntry.first.CheckName == NolintCommentsCollector::AnyCheck)` reports only about `NOLINT`/`NOLINTNEXTLINE` comments without check list, so I suppose it's fair to claim that this comment is redundant (we have already checked that no single check reported diagnostics on the line). The `else`-branch reports the diagnostics for the definite check in a list in case of `NOLINT(foo, bar)` (actually, if neither `foo` nor `bar` checks reported diagnostics for the line, there will be a few diagnostics from `nolint-usage` - not sure if it's good, but it seems acceptable). That is why, I suppose, it is necessary to distinct these cases.
> > 
> > > One concern I have with this functionality is: how should users silence a lint diagnostic that's target sensitive? e.g., a diagnostic that triggers based on the underlying type of size_t or the signedness of plain char. In that case, the diagnostic may trigger for some targets but not others, but on the targets where the diagnostic is not triggered, they now get a diagnostic they cannot silence. There should be a way to silence the "bad NOLINT" diagnostics.
> > 
> > There is such mechanism: it is possible to specify `// NOLINT(nolint-usage)` or `//NOLINT(check1, check2, nolint-usage) to silence the `nolint-usage`-mechanism. Please, see tests for details and more examples. 
> Can you provide an example where this distinction will make a difference to the user and help clarify a confusing situation? I cannot think of one, and it would be nice to simplify this code.
> 
> Thank you for the explanation about nolint-usage. This is not terminology I've seen before -- is this your invention, or is it a documented feature for NOLINT comments?
>> Can you provide an example where this distinction will make a difference to the user and help clarify a confusing situation? I cannot think of one, and it would be nice to simplify this code.

Example for the diagnostics emitted in the `if`-branch:
```
class A2 { explicit A2(int i); }; // NOLINT
=> warning: there is no diagnostics on this line, the NOLINT comment is redundant
```
Thus, the whole NOLINT comment should be removed.

Example for the diagnostics emitted in the `else`-branch:
```
// Case: NO_LINT for the specific check on line with an error on another check.
class A4 { A4(int i); }; // NOLINT(misc-unused-parameters, google-explicit-constructor)
=> warning: there is no diagnostics for the 'misc-unused-parameters' check on this line, the NOLINT comment is redundant
```
In this case, only misc-unused-parameters check should be removed from the list, but not the whole NOLINT.

Note also that if in the last example there is only misc-unused-parameters in the NOLINT checks list, the diagnostics will be the same. However, I suppose it's acceptable, because even if a user removed only a check name from the check list without removing NOLINT itself, clang-tidy will suggest removing NOLINT itself on the next run.

>> Thank you for the explanation about nolint-usage. This is not terminology I've seen before -- is this your invention, or is it a documented feature for NOLINT comments?

It's my invention, I suppose, from the user point of view it's logical to consider this mechanism as a check.

================
Comment at: clang-tidy/ClangTidyDiagnosticConsumer.h:295
+  // separately to process NOLINT misuse.
+  std::unordered_map<unsigned, std::vector<unsigned>> DiagIDsToLineNumbers;
 };
----------------
aaron.ballman wrote:
> xgsa wrote:
> > aaron.ballman wrote:
> > > Perhaps this would make more sense as an `llvm::IndexedMap` or other non-STL datatype?
> > I think, the map of vectors fits well: it provides efficient adding, storing and iterating through elements. In spite items in vector can be duplicated, it's not a usual case (when there are a lot of diagnostics for the same check on the same line), so I don't think it worth using set. I don't see how `llvm::IndexedMap` could help here. What are the concerns about the current data structure?
> STL data structures are not particularly well-optimized; they're meant for general usage cases. This map can get very large in real projects, so I worry about performance concerns. That's why we have specialized ADTs -- you should generally prefer them to STL containers.
I have reviewed llvm guide [1] and haven't found more suitable ADT, except `std::map`, which provides O(log(N)) complexity for lookup and insert against average O(1) for `std::unordered_map`. `llvm::IndexedMap` doesn't look suitable, as there could be a lot of lines in the file, whereas only a few will contain NOLINT comment, so `llvm::IndexedMap` won't store data efficiently.

Moreover, as far as I understand, this collection stores data for the translation unit and is cleaned up after it is processed, so there shouldn't be millions of diagnostics line numbers stored here.

Thus, I'd prefer leaving the container type as is to optimize for performance, but I would be also OK with `std::map` to optimize for memory usage.

[1] - http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task

https://reviews.llvm.org/D41326