[PATCH] D41077: [analyser] different.CallArgsOrder checker implementation

Tue Dec 19 07:41:32 PST 2017

alexfh added a comment.

In https://reviews.llvm.org/D41077#958819, @szepet wrote:

> 4. FYI: There is a similar check under review which uses only the AST provided information and implemented as a tidy-checker: https://reviews.llvm.org/D20689 (As I see your checker does not uses symbolic execution provided features. So, it would probably worth thinking about if the analyzer is the project where the checker should be implemented. However, @dcoughlin and @alexfh have more insight on the answer to this question. What do you think? )

The main problem with the check proposed in https://reviews.llvm.org/D20689 (which uses, IIUC, very similar algorithm to this one) is that the false positive rate on real projects is too high. The comment I made on the other review thread applies here as well:

| There's definitely potential in using parameter name and argument spelling to detect possibly swapped arguments, and there's a recent research on this topic, where authors claim to have reached the true positive rate of 85% with decent recall. See https://research.google.com/pubs/pub46317.html. If you're interested in working on this check, I would suggest at least looking at the techniques described in that paper. |
|

As for whether this analysis should be implemented in the static analyzer or as a clang-tidy check: I think, clang-tidy would be more suitable, since

1. this analysis doesn't need the path analysis machinery provided by the static analyzer
2. this analysis may benefit from suggesting automated fixes
3. the false positive rate of the state-of-the-art algorithm implementing similar analysis have a false positive rate of at least 15-30%, and a large fraction of false positives could  potentially be treated as "code smells" which can be fixed by giving function parameters or local variables proper names. I don't know whether this kind of analysis meets the bar the static analyzer sets for its checkers. Anna or Devin can tell more here.

> 5. In overall, please add comments to the function which describes its purpose. Mainly on heuristic functions, it can help to understand it more easily. Also, could you provide some results of the current heuristics, what are the false positive rates on real projects like LLVM, FFmpeg, etc? I am quite interested in that.

You can take the list of projects from https://reviews.llvm.org/D20689#878803, run the analysis on them, and see whether the results are any better.

Repository:
  rC Clang

https://reviews.llvm.org/D41077