[PATCH] D29151: [clang-tidy] Add misc-invalidated-iterators check.

Sat Feb 4 06:40:20 PST 2017

Prazek added a comment.

In https://reviews.llvm.org/D29151#665948, @alexfh wrote:

> In https://reviews.llvm.org/D29151#662504, @zaks.anna wrote:
>
> > Before clang-tidy came into existence the guidelines were very clear. One should write a clang warning if the diagnostic:
> >
> > - can be implemented without path-insensitive analysis (including flow-sensitive)
> > - is checking for language rules (not specific API misuse)
> >
> >   In my view, this should still be the rule going forward because compiler warnings are most effective in reaching users.
> >
> >   The Clang Static Analyzer used to be the place for all other diagnostics. Most of the checkers it contains rely on path-sensitive analysis. Note that one might catch the same bug with flow-sensitive as well as path-sensitive analysis. So in some of those cases, we have both warnings as well as analyzer checkers finding the same type of user error. However, the checkers can catch more bugs since they are path-sensitive. The analyzer also has several flow-sensitive/ AST matching checkers. Those checks could not have been written as warnings mainly because they check for APIs, which are not part of the language.
> >
> >   My understanding is that clang-tidy supports fixits, which the clang static analyzer currently does not support. However, there could be other benefits to placing not path-sensitive checks there as well. I am not sure. I've also heard opinions that it's more of a linter tool, so the user expectations could be different.
> >
> > > Even right now there are clang-tidy checks that finds subset of alpha checks, but probably having lower false positive rate.
> >
> > Again, alpha checks are unfinished work, so we should not use them as examples of checkers that have false positives. Some of them are research projects and should probably be deleted.
>
>
> I tried to come up with some advice on where checks should go in http://clang.llvm.org/extra/clang-tidy/#choosing-the-right-place-for-your-check:
>
> | Clang diagnostic: if the check is generic enough, targets code patterns that most probably are bugs (rather than style or readability issues), can be implemented effectively and with extremely low false positive rate, it may make a good Clang diagnostic. |
> | Clang static analyzer check: if the check requires some sort of control flow analysis, it should probably be implemented as a static analyzer check.                                                                                                           |
> | clang-tidy check is a good choice for linter-style checks, checks that are related to a certain coding style, checks that address code readability, etc.                                                                                                       |
>
> There's no doubt path-sensitive checks should go to Static Analyzer, since it provides all the infrastructure for path-sensitive analyses. Whatever meets the criteria for a Clang diagnostic should be a diagnostic. Whatever needs automated fixes (and can be implemented on AST or preprocessor level) should go to clang-tidy.
>
> But there's still a large set of analyses that don't clearly fall into one of the categories above and can be implemented both in Clang Static Analyzer and in clang-tidy. Currently there are no firm rules about those, only recommendations on the clang-tidy page (quoted above). We might need to agree upon some reasonable guidelines, though.
>
> For example, IMO, AST-based analyses make more sense in clang-tidy for a few reasons:
>
> 1. They usually are easier expressible in terms of AST matchers, and clang-tidy allows to use AST matchers with less boilerplate.
> 2. Clang Static Analyzer is linked into clang, where AST matchers are undesired, since they tend to blow the size of binary significantly.
> 3. It's more consistent to keep all similar analyses together, it simplifies search for already implemented similar functionality and code reviews.
>
>   Flow-sensitive analyses (that don't need any path-sensitivity) seem to be equally suitable for SA and clang-tidy (unless I'm missing something), so I don't feel strongly as to where they should go.
>
>   WDYT?

This sounds resonable. I think there are also other factors like:

- heuristics: clang-tidy tends to look for bugs based on some heuristics. E.g use-after-move consider methods as reinitializing based on method names (like clear).

This is approach works great if the heuristic is good enough to not have many false positives, but generally it will have some false positives and false negatives.

- flexibility: The other problem is that check can be flexible, for example to work on every vector-like container. I think static analyzer tends to be more conservative about bugs it wants to find. E.g we know how std::vector works, so the check will find only bugs associated with it trying to not introduce false positives

I will try to think about other factors, but firstly let me give a couple of examples. Please tell me where would you put these checks and exactly why:

use-after-move:

- does complex walk on CFG similar to path analysis
- has heuristics what is reinitialization

this check:

- uses the same utilities as use-after-move
- will works for different types of containers like std::unordered_map or llvm::DenseMap, which indicates heuristics

check null after dereference, looks for code like:

  int *p = foo();
  *p = 42;
  if (p) {;;;}
  //of 
  if (!p) {;;;}

Some checks from static analyzer will find the body of if (!p) as dead code, and could probably find if (p) as expression always true.
I am not sure how it would handle cases like

  int *p = foo();
  *p = 42;
  if (b) {
    p = otherFoo();
  }

  if (!p) {;;;} 

Should static analyzer flag use like this? 
If we want to have low false positive rate, then probably not.
On the other hand I would love if clang-tidy would flag this, because it seems like a buggy code (if (!p) {;;;} could be hoisted to the other if)

Another example, fiding out of range:

  int glob[10];

  int sum = 0;
  for (int i = 0; i <= 10; i++) {
    sum += glob[i];
  }

So this sounds like it a good diagnostic for clang, and even gcc finds it, but only with -O2.
Of course it won't flag cases like:

  for (int i = 0; i <= 10; i++) {
    if (glob[i] == v)
      return true;
  }

Because the loop might end before going to element pass array. I am not sure how well we can model 
this with static analyzer, and if it is even possible to write check that would warn about cases like this, that are bugs (we can optimize it to single return true).

Repository:
  rL LLVM

https://reviews.llvm.org/D29151