[PATCH] D20689: [clang-tidy] Add 'readability-suspicious-call-argument' check

Thu Mar 18 05:40:55 PDT 2021

aaron.ballman added inline comments.

================
Comment at: clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp:51
+        {true, 40, 50}, // Substring.
+        {true, 50, 66}, // Levenshtein.
+        {true, 75, 85}, // Jaro-Winkler.
----------------
varjujan wrote:
> whisperity wrote:
> > aaron.ballman wrote:
> > > We should probably document where all these numbers come from, but `66` definitely jumps out at me as being a bit strange. :-D
> > Unfortunately, I have absolutely no idea. All these values are percentages between 0 and 100 (`-1` is just saying that //"This heuristic doesn't accept percentages"//), and this is written in the documentation now. However, the answer to //"**why** 66%?"//, unless @varjujan can say something, I think is lost to history...
> > 
> > I'll read over his thesis once again, maybe I can find anything with regards to this.
> > 
> > Either way, I've detailed from both the code and the thesis how the percentages are meant. In some cases, the % is calculated as //"% of the longer string's length"//. In the Leventhstein's case, it's actually inverted:
> > 
> > ```
> > Dist = (1 - Dist / LongerLength) * 100;
> > ```
> > 
> > So what this says is that if the current arg1-param1 arg2-param2 pairing has less than the inverse of 50% (which is more than 50%) of the longer string's edit distance, but the arg2-param1 and arg1-param2 (the suggested swapped order) has more than the inverse of 66% (which is less than 33%), then the swap will be suggested.
> > 
> > Originally these values were called `LowerBound` and `UpperBound`, respectively, which was saying **even less** about what they mean...
> Sadly, I think there isn't any scientific reason behind these numbers. They just looked ok after a couple of test runs. (Maybe they make sense for shorter arg names, like the 66 for 3 char long names.)
Thank you for the explanations! I'm fine with the values as they are (they're defaults that can be changed anyway).

================
Comment at: clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp:164
+static bool applyLevenshteinHeuristic(StringRef Arg, StringRef Param,
+                                      std::size_t Threshold) {
+  std::size_t LongerLength = std::max(Arg.size(), Param.size());
----------------

================
Comment at: clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp:347
+// Checks whether ArgType converts implicitly to ParamType.
+static bool areTypesCompatible(QualType ArgType, QualType ParamType,
+                               const ASTContext &Ctx) {
----------------
whisperity wrote:
> whisperity wrote:
> > aaron.ballman wrote:
> > > It seems like we're doing an awful lot of the same work as `ASTContext::typesAreCompatible()` and type compatibility rules are pretty complex, so I worry about this implementation being different than the `ASTContext` implementation. Have you explored whether we can reuse more of the logic from `ASTContext` here, or are they doing fundamentally different kinds of type compatibility checks?
> > No, I didn't know that function even existed. This check must be older than that function.
> Actually, no, that function is pretty old... However, that function, and all the function it subsequently calls, require a **non-const** `ASTContext`. I have changed `ASTContext`:
> 
> ```
> ╰─ git diff --cached --stat
>  clang/include/clang/AST/ASTContext.h                                     | 67 ++++++++++++++++++++++++++++++++++++-------------------------------
>  clang/lib/AST/ASTContext.cpp                                             | 72 +++++++++++++++++++++++++++++++++++++-----------------------------------
>  2 files changed, 73 insertions(+), 66 deletions(-)
> ```
> 
> making related member functions and internal static functions take `const ASTContext &`/`*`.
> 
> This, by itself, did not break any of the tests of `check-clang check-clang-unit check-clang-tools check-clang-extra-unit`!
Doubtful -- `typesAreCompatible()` is critical for checking the semantics of assignment in C, overloading in C++, etc. It may have simply been overlooked when writing this check. Given the complexities of type checking, my intuition is that we should be leaning on `ASTContext` for as much of this functionality as we can get away with. That will also get us nice "extras" like caring about address spaces, ARC, etc which are what got me worried when I started looking at this implementation.

================
Comment at: clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp:624
+      diag(MatchedCallExpr->getExprLoc(),
+           "%ordinal0 argument '%1' (passed to '%2') looks like it might be "
+           "swapped with the %ordinal3, '%4' (passed to '%5')")
----------------
TIL about `%ordinal` in diagnostics, thanks for that! :-D

================
Comment at: clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp:632-634
+      diag(CalleeFuncDecl->getLocation(), "in the call to '%0', declared here",
+           DiagnosticIDs::Note)
+          << CalleeFuncDecl->getNameInfo().getName().getAsString()
----------------

================
Comment at: clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp:644
+  // Reset vectors, and fill them with the currently checked function's
+  // attributes.
+  ParamNames.clear();
----------------

================
Comment at: clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp:652-656
+      if (IdentifierInfo *II = Param->getIdentifier()) {
+        ParamNames.push_back(II->getName());
+      } else {
+        ParamNames.push_back(StringRef());
+      }
----------------

================
Comment at: clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp:657-660
+    } else {
+      ParamTypes.push_back(QualType());
+      ParamNames.push_back(StringRef());
+    }
----------------
Can this case happen?

================
Comment at: clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp:707-708
+  // Do not report for too short strings.
+  if (ArgNames[Position1].size() < 3 || ArgNames[Position2].size() < 3 ||
+      ParamNames[Position1].size() < 3 || ParamNames[Position2].size() < 3)
+    return false;
----------------
Should the length of what's considered "too short" be a configuration option? I think 3 is a good default.

================
Comment at: clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp:739
+  for (Heuristic Heur : AppliedHeuristics) {
+    unsigned char Threshold;
+    if (Optional<char> GotBound = getBound(Heur, Bound))
----------------
aaron.ballman wrote:
> There's some type confusion going on here between `unsigned char` and `char` -- I think all of these uses should switch to `uint8_t` or `int8_t` (or possibly just `int` given that these values all wind up being promoted to `int` anyway).
Otherwise it looks like we could read an uninitalized value later (if `GotBound` was `None`).

================
Comment at: clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp:739-741
+    unsigned char Threshold;
+    if (Optional<char> GotBound = getBound(Heur, Bound))
+      Threshold = GotBound.getValue();
----------------
There's some type confusion going on here between `unsigned char` and `char` -- I think all of these uses should switch to `uint8_t` or `int8_t` (or possibly just `int` given that these values all wind up being promoted to `int` anyway).

================
Comment at: clang-tools-extra/clang-tidy/readability/SuspiciousCallArgumentCheck.cpp:743-744
+
+    if (AreNamesSimilar)
+      break;
+
----------------
Any reason not to move this below the `switch` and use `=` instead of `|=` within the cases? (Or return from the cases directly?)

================
Comment at: clang-tools-extra/docs/clang-tidy/checks/readability-suspicious-call-argument.rst:61-68
+The *prefix* heuristic reports if one of the strings is a sufficiently long
+prefix of the other string, e.g. ``target`` to ``targetPtr``.
+The similarity percentage is the length ratio of the prefix to the longer
+string, in the previous example, it would be `6 / 9 = 66.66...`\%.
+
+This heuristic can be configured with :ref:`bounds<opt_Bounds>`.
+The default bounds are: below `25`\% dissimilar and above `30`\% similar.
----------------
I wonder how Hungarian notation impacts this heuristic -- I would imagine a lot of similar prefixes in such a code base, and things like `lpstr` as a prefix could be a pretty large chunk of some identifiers.

================
Comment at: clang-tools-extra/docs/clang-tidy/checks/readability-suspicious-call-argument.rst:75
+In the case of ``oldValue`` and ``value`` compared, the similarity percentage
+is `8 / 5 = 62.5`\%.
+
----------------
Similar to above, I wonder how numeric digits impact this heuristic -- do the defaults consider this to be a swap?
```
void foo(int frobble1, int frobble2);
foo(frobble2, frobble1); // Hopefully identified as a swap
foo(bar2, bar1); // How about this?
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D20689/new/

https://reviews.llvm.org/D20689