[PATCH] D69560: [clang-tidy] Add 'experimental-cppcoreguidelines-avoid-adjacent-arguments-of-the-same-type' check

Thu Sep 24 06:22:51 PDT 2020

whisperity added a comment.

I'd like to resurrect the discussion about this. Unfortunately, the concept of `experimental-` group (in D76545 <https://reviews.llvm.org/D76545>) has fallen silent, too. We will present the results of this analysis soon at IEEE SCAM (Source Code Analysis and Manipulation) 2020 <http://ieee-scam.org/2020>. While a previous submission about this topic was rejected on the grounds of not being in line with the conference's usual, hyper-formalised nature, with one reviewer //especially// praising the paper for its nice touch with the empirical world and the fact that neither argument swaps (another checker worked on by a colleague) nor the interactions of this issue with the type system (this checker) was really investigated in the literature for C++ before, we've tailored both the paper and the implementation based on reviewer comments from the scientific world, and the comments here.
The reviews were mostly favourable, excl. comments about the lack of formalisation and the unfortunate lack of modelling for template instantiations. Such a cold cut, however, //only// introduces false negatives into the result set. Which is fine, as the users will never see those non-existent reports!

In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman wrote:

> This is a check that is almost impossible to comply with when developing new code because it is insufficiently clear on what constitutes "related" parameters.

I agree the CppCG is rather vague about this. From an angle, it seems intentional... "relatedness" is hard to pin down formally, it might vary from one "module" to the next even in the same project. In a subsequent patch to this, I've given a few good examples as to what can be reasonably culled by relatedness heuristics. They filter a good chunk of the results, letting go of most of the trivially "valid (but maybe nonsense) if swapped" functions (min, max, find, etc.)

In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman wrote:

> Two adjacent types in a function parameter list is an incredibly common situation that arises naturally when writing code [...] and designing around that from scratch is not always possible.

I see the reasoning behind this. However, this feels more like a motivation //against// the rule itself, and not the checker. We can debate the justification for the rule's existence, but (and correct me if I'm wrong) so far I have not seen any tool (that's publicly available, and is for C(++), etc...) that attempts to check your code satisfying this particular rule.

In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman wrote:

> (especially for constructors, where your parameter order has to match the declaration order within the class)

CMIIW, but I believe this statement is not true. Aggregate initialisation does not care about your constructors, and yes, takes arguments in the order of fields. However, once you have a user-defined constructor, you should be able to define your parameters in any order you like. The only thing you //should// match is that the //member-init-exprs// of the constructor are evaluated in order of field declarations, not in the order you wrote them. But trespassing that rule already emits a compiler warning, and in reality, the trespass only causes an issue if there is a data dependency between your fields. You //should// ensure your //member-init-exprs// are in the right order (to guard that a later change introducing a data dependency won't blow you up), but this has zero effect on the order of parameters of the constructor.

In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman wrote:

> Retroactively applying this rule to an existing code base would be nearly impossible for a project of any size.

It is hard in the general case (although there are some approaches and foundations that we could build upon with refactoring, I've taken up the mantle of spending some time with this research), but there are cases where "modernisation" efforts, especially tool-aided ones are already possible. I have been recently suggested reading this paper: //H. K. Wright: Incremental Type Migration Using Type Algebra <http://research.google/pubs/pub49386/>// While I agree that the case study in the paper is rather domain-specific if we consider the example there: once a tool has identified a variable/expression to not be a `double` but instead `abseil::Duration`, from a function call where such a refactored argument is passed, you can (or, hopefully, need to) replace the type of the parameter. And one such "adjacent parameters of the same type" has been refactored.

My position with the paper, and this implementation, is that it is more useful for new projects than existing ones. It might definitely not be useful //for LLVM// (a humongous project!), and it might not be possible to fix //every// report in an afternoon. But thanks to the vast amount of IDEs integrating Clang-Tidy, if someone wishes to adhere to this rule, they can have a (reasonably) immediate report the moment they trespassed the rule. You can also consider it in an incremental mode (see my comment above with incremental report gating via CodeChecker, but surely it's equally trivial to implement in other pipelines) or only in new or new-ish TUs/modules inside the project.

In addition, there are plenty of guidelines other than CppCG, which have at least some of their rules mapped to Tidy checks (and surely other analysers' routines). If a project (be it of any size and any domain) does not wish to adhere to a guideline (or a rule of a guideline), they can configure their invocation to not run those checks. I agree the fact that Tidy not supporting "opt-in" checkers by default is a bit of a drawback here, however, consider the "execution paths" one project might take, w.r.t. to this analysis:

//The checker is enabled, and it produces between one and a bazillion reports.//

1. They realise their project does not conform to CppCG (in which case why is the checker group enabled in the first place?) or specifically CppCG#I.24 and decide to disable the check. All is well in the world, and nothing happened excluding one new line being added to a configuration file.
2. Multiple projects big and famous enough see that their software are "non-compliant" (at least from the PoV of conformance to CppCG#I.24). They together start to question the necessity of such rule... maybe it shouldn't be a rule in the first place, because it's woefully stupid and impossible to adhere to. This is actionable data one can talk about at conferences and go to the guideline maintainers(?) with, and, if so, the rule gets deprecated/abolished/move to an "intellectual nitpicking" section. (In this case, implementing this check was a nice learning opportunity, and eventually, it will also be removed from Tidy.)
3. A //new// projects start developing, and they start seeing this rule. They can also decide whether or not conformance to the rule is something they want to live with. If not, previous cases apply. If so, we start to grow projects that are //somewhat// more type-safe, or less error-prone.

So far, we do not know how painful actual conformance to this rule will be. And, as with most static analysis, we might never know if there ever was "one life saved" by someone deciding to heed this warning. This won't be a panacea overnight, but I still believe it, first, does not have to be, and second, is a step in the right direction.

In D69560#1890767 <https://reviews.llvm.org/D69560#1890767>, @vingeldal wrote:

> I'm all for investigating the performance of this check and basing any decisions on data. Until we have that data I just want to challenge the claim that there are no suitable ways to turn the check of for select portions of the code base,

Computational performance is in line with Tidy checks in general. LLVM took 33 minutes to run, generally smaller projects are less than a minute. (`-j24`)
As for the performance in terms of the results... I have a virtual image (a previous conference we submitted the paper to wanted us to submit reproduction images, but because the paper was rejected, I believe they have not archived the image). I've found a CodeChecker database with 15 projects analysed (incl. LLVM) but it's ~150 MiB in size... 😦

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69560/new/

https://reviews.llvm.org/D69560