[PATCH] D69560: [clang-tidy] Add 'experimental-cppcoreguidelines-avoid-adjacent-parameters-of-the-same-type' check

Thu Oct 8 06:58:53 PDT 2020

aaron.ballman added a comment.

In D69560#2292505 <https://reviews.llvm.org/D69560#2292505>, @whisperity wrote:

> I'd like to resurrect the discussion about this. Unfortunately, the concept of `experimental-` group (in D76545 <https://reviews.llvm.org/D76545>) has fallen silent, too. We will present the results of this analysis soon at IEEE SCAM (Source Code Analysis and Manipulation) 2020 <http://ieee-scam.org/2020>. While a previous submission about this topic was rejected on the grounds of not being in line with the conference's usual, hyper-formalised nature, with one reviewer //especially// praising the paper for its nice touch with the empirical world and the fact that neither argument swaps (another checker worked on by a colleague) nor the interactions of this issue with the type system (this checker) was really investigated in the literature for C++ before, we've tailored both the paper and the implementation based on reviewer comments from the scientific world, and the comments here.
> The reviews were mostly favourable, excl. comments about the lack of formalisation and the unfortunate lack of modelling for template instantiations. Such a cold cut, however, //only// introduces false negatives into the result set. Which is fine, as the users will never see those non-existent reports!

Congrats on the SCAM acceptance, I hope the presentation goes well!

> In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman wrote:
>
>> This is a check that is almost impossible to comply with when developing new code because it is insufficiently clear on what constitutes "related" parameters.
>
> I agree the CppCG is rather vague about this. From an angle, it seems intentional... "relatedness" is hard to pin down formally, it might vary from one "module" to the next even in the same project. In a subsequent patch to this, I've given a few good examples as to what can be reasonably culled by relatedness heuristics. They filter a good chunk of the results, letting go of most of the trivially "valid (but maybe nonsense) if swapped" functions (min, max, find, etc.)

I agree that the vagueness seems intentional. That's usually reasonable when you want coding guidelines for a human to check, but doesn't help for production-quality automated checking. Do you happen to know how many false positives the check currently issues without that heuristic?

> In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman wrote:
>
>> Two adjacent types in a function parameter list is an incredibly common situation that arises naturally when writing code [...] and designing around that from scratch is not always possible.
>
> I see the reasoning behind this. However, this feels more like a motivation //against// the rule itself, and not the checker.

Correct, it's definitely a motivation against the rule itself.

> We can debate the justification for the rule's existence, but (and correct me if I'm wrong) so far I have not seen any tool (that's publicly available, and is for C(++), etc...) that attempts to check your code satisfying this particular rule.

I'm not aware of any either. However, an indictment of the rule being too ambiguous is also an indictment of any checkers implementing that rule because they're going to have a higher failure rate than if the rule was not ambiguous.

> In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman wrote:
>
>> (especially for constructors, where your parameter order has to match the declaration order within the class)
>
> CMIIW, but I believe this statement is not true. Aggregate initialisation does not care about your constructors, and yes, takes arguments in the order of fields. However, once you have a user-defined constructor, you should be able to define your parameters in any order you like. The only thing you //should// match is that the //member-init-exprs// of the constructor are evaluated in order of field declarations, not in the order you wrote them.

My point is that it's far more common to match all three orders even though you don't have to. e.g.,

  // Common, correct
  class C {
    int a, b;
    float c;

  public:
    C(int a_, int b_, float c_) : a(a_), b(b_), c(c_) {}
  };

  // Usually held as a code smell, but correct and not commonly diagnosed
  class C {
    int a, b;
    float c;

  public:
    C(int a_, float c_, int b_) : a(a_), b(b_), c(c_) {}
  };

  // Incorrect, commonly diagnosed
  class C {
    int a, b;
    float c;

  public:
    C(int a_, float c_, int b_) : a(a_), c(c_), b(b_) {}
  };

> But trespassing that rule already emits a compiler warning, and in reality, the trespass only causes an issue if there is a data dependency between your fields. You //should// ensure your //member-init-exprs// are in the right order (to guard that a later change introducing a data dependency won't blow you up), but this has zero effect on the order of parameters of the constructor.

I don't fully agree that it has zero effect on the parameter order of the constructors, but more importantly, what do you do about this sort of thing as a user?

  class Point {
    int x, y;

  public:
    Point(int x, int y);

    friend bool operator==(const Point &LHS, const Point &RHS);
  }

  class Rect {
    int x, y, width, height;

  public:
    Rect(int x, int y, int width, int height);
  };

These situations are not ones we should ever be issuing a diagnostic for IMO, but I would argue that at least in the case of the constructors, they violate the spirit of the C++ guideline rule (where swaps are plausible and problematic). This is why I think the *rule* is bogus and I'm not certain we should have a checker for it -- the rule itself is too broad and doesn't account for realistic code.

> In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman wrote:
>
>> Retroactively applying this rule to an existing code base would be nearly impossible for a project of any size.
>
> It is hard in the general case (although there are some approaches and foundations that we could build upon with refactoring, I've taken up the mantle of spending some time with this research), but there are cases where "modernisation" efforts, especially tool-aided ones are already possible. I have been recently suggested reading this paper: //H. K. Wright: Incremental Type Migration Using Type Algebra <http://research.google/pubs/pub49386/>// While I agree that the case study in the paper is rather domain-specific if we consider the example there: once a tool has identified a variable/expression to not be a `double` but instead `abseil::Duration`, from a function call where such a refactored argument is passed, you can (or, hopefully, need to) replace the type of the parameter. And one such "adjacent parameters of the same type" has been refactored.
>
> My position with the paper, and this implementation, is that it is more useful for new projects than existing ones.

Agreed.

> It might definitely not be useful //for LLVM// (a humongous project!), and it might not be possible to fix //every// report in an afternoon. But thanks to the vast amount of IDEs integrating Clang-Tidy, if someone wishes to adhere to this rule, they can have a (reasonably) immediate report the moment they trespassed the rule. You can also consider it in an incremental mode (see my comment above with incremental report gating via CodeChecker, but surely it's equally trivial to implement in other pipelines) or only in new or new-ish TUs/modules inside the project.

I understand this point, but it doesn't address the fact that users writing real code to solve problems will *often* run into situations where this rule diagnoses code that cannot be modified. Overloaded operators are a great example of this -- how do you avoid adjacent parameters of the same type in an overloaded comparison operator? How does someone writing a `Point` or `Rect` class avoid adjacent parameters of the same type in constructors? How does someone avoid writing adjacent parameters in a callback function whose signature is defined by a standard (like `bsearch()` or `qsort()` from the C standard)?

I think the answer to some of these questions is that we can introduce heuristics -- exempt overloaded operators from the check, exempt functions named `equal`, `compare`, `comp`, etc, exempt constructors (perhaps only when all data members have compatible types), etc. But without those sort of heuristics, I'm not certain the check has a lot of value for real code.

> In addition, there are plenty of guidelines other than CppCG, which have at least some of their rules mapped to Tidy checks (and surely other analysers' routines). If a project (be it of any size and any domain) does not wish to adhere to a guideline (or a rule of a guideline), they can configure their invocation to not run those checks.

We do have other modules for other coding guidelines like the CERT guidelines, etc. And sometimes those guidelines have similar questionable behavior (for instance, CERT bans any use of calling the `system()` function -- so users are left without recourse there). That may be a reason to adopt this check, but we often ask that when check authors find an onerous behavior in a rule, they go back to the rule authors to alert them of the problems and see if we can improve the rule at the same time as the check. If the rule authors come back and say "no, we really do want it to diagnose in those cases", then so be it.

> I agree the fact that Tidy not supporting "opt-in" checkers by default is a bit of a drawback here, however, consider the "execution paths" one project might take, w.r.t. to this analysis:
>
> //The checker is enabled, and it produces between one and a bazillion reports.//
>
> 1. They realise their project does not conform to CppCG (in which case why is the checker group enabled in the first place?) or specifically CppCG#I.24 and decide to disable the check. All is well in the world, and nothing happened excluding one new line being added to a configuration file.
> 2. Multiple projects big and famous enough see that their software are "non-compliant" (at least from the PoV of conformance to CppCG#I.24). They together start to question the necessity of such rule... maybe it shouldn't be a rule in the first place, because it's woefully stupid and impossible to adhere to. This is actionable data one can talk about at conferences and go to the guideline maintainers(?) with, and, if so, the rule gets deprecated/abolished/move to an "intellectual nitpicking" section. (In this case, implementing this check was a nice learning opportunity, and eventually, it will also be removed from Tidy.)
> 3. A //new// projects start developing, and they start seeing this rule. They can also decide whether or not conformance to the rule is something they want to live with. If not, previous cases apply. If so, we start to grow projects that are //somewhat// more type-safe, or less error-prone.
>
> So far, we do not know how painful actual conformance to this rule will be. And, as with most static analysis, we might never know if there ever was "one life saved" by someone deciding to heed this warning. This won't be a panacea overnight, but I still believe it, first, does not have to be, and second, is a step in the right direction.

I agree the check doesn't have to be perfect, but given the triviality of finding a large number of false positives, it's worth our effort to see if we can find a solution that is more practical.

I think the concrete steps I'd like to see before moving forward with this check is:

- Pick 1-3 projects with whatever qualifications make sense (size, adherence to the core guidelines, etc) and measure how many diagnostics this check produces as-is and see if we can get an idea of what % of diagnostics are ones the user would struggle to do something with (where struggle is: can't change the order because of <reasons>, changing the order would make the code less readable, etc).
- See if we can come up with some heuristics to reduce the number of poor-quality diagnostics and measure the effectiveness against those projects.
- Once we have some ideas of ways to reduce the "noise" that we're happy with, circle back with the C++ Core Guideline authors to see if they'd be willing to tighten the rule up for those situations. If they *don't* want the changes, that's fine, we can hide them behind a configuration flag (and we can decide whether that flag should be on or off by default at that point).

WDYT?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69560/new/

https://reviews.llvm.org/D69560