[PATCH] D69560: [clang-tidy] Add 'bugprone-easily-swappable-parameters' check

Aaron Ballman via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Feb 2 06:42:39 PST 2021


aaron.ballman added inline comments.


================
Comment at: clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp:58
+
+namespace {
+
----------------
whisperity wrote:
> aaron.ballman wrote:
> > Is there a need for the anonymous namespace? (We typically only use them when defining a class and try to make them as narrow in scope as possible, and use `static` for function declarations instead.)
> There will be two major parts, the modelling (and which is extended down the line with new models like implicit conversions) and the filtering (which throws away or breaks up results). Should I cut out the anonymous namespace? How about the inner namespaces, may they stay?
The inner namespaces are totally fine. As for the anon namespace, I personally don't have a strong opinion against it, I brought it up mostly as a coding style guide nit, but I think removing it is helpful so that it's more obvious when a function has internal vs external linkage from its signature.


================
Comment at: clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp:67
+/// The language features involved in allowing the mix between two parameters.
+enum MixFlags : unsigned char {
+  MIX_Invalid = 0, //< Sentinel bit pattern. DO NOT USE!
----------------
whisperity wrote:
> aaron.ballman wrote:
> > Does using the fixed underlying type buy us something? As best I can tell, this will introduce a fair amount of implicit promotions to `int` anyway.
> Will it? I thought if I use the proper LLVM bitmask/flag enum thing it will work properly. For now, it is good for the bit flag debug print, because it only prints 8 numbers, not 32.
Ah, perhaps that's the case -- I'm not super familiar with the LLVM bitmask/flag macros. That said, if you think there's good utility for it, I'm happy to leave it as-is.


================
Comment at: clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp:90-91
+
+  void sanitise() {
+    assert(Flags != MIX_Invalid && "sanitise() called on invalid bitvec");
+    // TODO: There will be statements here in further extensions of the check.
----------------
whisperity wrote:
> aaron.ballman wrote:
> > Heh, I don't know if we have any guidance on US English vs British English spellings. ;-)
> Reflexes, my bad. (Similarly how I put the pointer `*` to the left side, where Stroustrup intended. At least my local pre-commit hooks take care of that.) Will try to remember this.
Heh, no worries, I'm betting if we did some code searches, we'd find plenty of British English spellings as well. :-D


================
Comment at: clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp:255
+  std::string NodeTypeName =
+      Node->getType().getAsString(Node->getASTContext().getPrintingPolicy());
+  StringRef CaptureName{NodeTypeName};
----------------
whisperity wrote:
> aaron.ballman wrote:
> > Hmm, one downside to using the printing policy to get the node name is that there can be alternative spellings for the same type that the user might reasonably expect to be applied. e.g., if the user expects to ignore `bool` datatypes and the printing policy wants to spell the type as `_Bool`, this won't behave as expected.
> But then the diagnostic is inconsistent with what the compiler is configured to output as diagnostics, isn't it? Can I stringify through Clang with some "default" printing policy?
The situation I'm worried about is something like this in C code:
```
#include <stdbool.h>

void func(bool b, bool c) {}
```
because the `bool` in the function signature will expand to `_Bool` after the preprocessor gets to it (because of the contents of `<stdbool.h>`). If the user writes `bool` in their configuration file because that's what they've written in their function signatures, will this fail because the printing policy says it should be `_Bool`?

Along similar lines, I wonder about things like `struct S` vs `S` (in C), `signed char` vs `char`, types within anonymous namespaces (where we print `<anonymous namespace>`), etc. Basically, any time when the printing policy may differ from what the user lexically writes in code.


================
Comment at: clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp:302
+                         "FowardIt", "bidirit", "BidirIt", "constiterator",
+                         "const_iterator", "Const_Iterator", "Constiterator",
+                         "ConstIterator"});
----------------
whisperity wrote:
> aaron.ballman wrote:
> > `reverse_iterator` and `reverse_const_iterator` too?
> > 
> > How about ranges?
> > How about ranges?
> 
> I would like to say that having an `f(RangeTy, RangeTy)` is //exactly// as problematic (especially if both are non-const) as having `f(int, int)` or `f(void*, void*)`, and should be taken care of the same way (relatedness heuristic, name heuristic).
> The idea behind ignoring iterator-ly named things was that //"Iterators more often than not comes in pairs and you can't(*) do anything about it"//.
> 
> (*): Use ranges. Similarly how `draw(int x, int y)` is `draw(Point2D)` if we are enhancing type safety.
I was thinking about ranges in situations where you also need them to come in pairs, like range union, range intersection, range difference, etc. Those cases feel like they suffer from the same problem as `swap()` where the interface is intentionally operating on two of the same types.

That said, I think paired range operations are *way* less common than paired iterator operations, so I think ranges should probably be left out.


================
Comment at: clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.cpp:402-403
+
+    // Unfortunately, undersquiggly highlights are for some reason chewed up
+    // and not respected by diagnostics from Clang-Tidy. :(
+    diag(First->getLocation(), "the first parameter in the range is '%0'",
----------------
whisperity wrote:
> aaron.ballman wrote:
> > Can you post a screenshot of what you mean?
> Given
> 
> ```
> Diag << SourceRange{First->getSourceRange().getBegin(), Last->getSourceRange().getEnd()};
> ```
> 
> The diagnostic is still produced with only a `^` at the original `diag(Location)`, ignoring the fed `SourceRange`:
> 
> ```
> /home/whisperity/LLVM/Build/../llvm-project/clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp:21:18: warning: 3 adjacent parameters of 'redeclChain' of similar type ('int') are easily swapped by mistake [bugprone-easily-swappable-parameters]
> void redeclChain(int I, int J, int K) {}
>                  ^
> ```
> 
> Instead of putting all the squigglies in as the range-location highlight, like how `Sema` can diagnose:
> 
> ```
> /home/whisperity/LLVM/Build/../llvm-project/clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp:21:18: warning: 3 adjacent parameters of 'redeclChain' of similar type ('int') are easily swapped by mistake [bugprone-easily-swappable-parameters]
> void redeclChain(int I, int J, int K) {}
>                  ^~~~~~~~~~~~~~~~~~~
> ```
> 
> I would not want to put additional note tags in an otherwise already verbose output.
> 
> Back in 2019 I was investigating this issue specifically for another checker I was working on, but hit the wall... Somewhere deep inside where Tidy diagnostic stuff is translated and consumed to Clang stuff, it goes away. It shouldn't, because there are statements that seemingly copy the `Diag.Ranges` array over, but it still does not come out.
> 
> ... EDIT: I found [[ http://lists.llvm.org/pipermail/cfe-dev/2019-October/063676.html | the relevant mailing list post ]].
> ... EDIT: I found the relevant mailing list post.

Oh wow, I had no idea! That's a rather unfortunate bug. :-(


================
Comment at: clang-tools-extra/clang-tidy/bugprone/EasilySwappableParametersCheck.h:30
+  void storeOptions(ClangTidyOptions::OptionMap &Opts) override;
+
+  /// The minimum length of an adjacent swappable parameter range required for
----------------
whisperity wrote:
> aaron.ballman wrote:
> > Do these data members need to be public?
> Yes. The modelling heuristics will have a bunch of recursive functions which all need the config variables (okay, not the `MinimumLength` one, but the other 4 or 5 that will be introduced) and giving them every time by copy would introduce its own adjacent swappable `bool, bool` parameter sequence issue.
> 
> The previous full version of the check which I'm incrementally reworking has signatures like:
> 
> ```
> static MixupData HowPossibleToMixUpAtCallSite(const QualType LType,
>                                               const QualType RType,
>                                               const ASTContext &Ctx,
>                                               const bool CVRMixPossible,
>                                               const bool ImplicitConversion,
>                                               const bool IsUserDefinedConvertersOtherwisePossible);
> 
> static MixupData RefBindsToSameType(const LValueReferenceType *LRef,
>                                     const Type *T, const ASTContext &Ctx,
>                                     const bool IsRefRightType,
>                                     const bool CVRMixPossible,
>                                     const bool ImplicitConversionEnabled);
> ```
That's understandable logic, but then we're exposing essentially what amount to be implementation details. So it sounds like we're introducing a new code maintenance issue in order to solve another code maintenance issue. :-(

That said, I don't think leaving these as `public` is a deal-breaker.


================
Comment at: clang-tools-extra/test/clang-tidy/checkers/bugprone-easily-swappable-parameters-len2.cpp:10
+
+void declaration(int Param, int Other); // NO-WARN: No chance to change this function.
+
----------------
whisperity wrote:
> aaron.ballman wrote:
> > I think this is a case where we could warn when the declaration is outside of a system header (perhaps as an option).
> > 
> > Thinking about it a bit more, declarations and definitions provide a novel way to get another kind of swap:
> > ```
> > void func(int x, int y);
> > void func(int y, int x) {} // Oops, the swap was probably not intentional
> > ```
> > which may or may not be interesting for a check (or its heuristics).
> I gave this some thought. It is a very good idea, but I believe not for //this// check, but D20689. What do you think of that? Maybe simply saying "call site v. function node that was called" could be extended with a completely analogous, string distance function based "function definition node v. redecl chain" case.
Hmmm, my impression is that this check is fundamentally about *parameter* ordering whereas D20689 is about *argument* ordering. Given that the example is about the ordering of parameters and doesn't have anything to do with call sites, I kind of think the functionality would be better in a parameter-oriented check.

That said, it doesn't feel entirely natural to add it to this check because this check is about parameters that are easily swappable *at call sites*, and this situation is not exactly that. However, it could probably be argued that it is appropriate to add it to this check given that swapped parameters between the declaration and the definition are likely to result in swapped arguments at call sites.

Don't feel obligated to add this functionality to this check (or invent a new check), though.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69560/new/

https://reviews.llvm.org/D69560



More information about the cfe-commits mailing list