[cfe-dev] [clang-tidy][RFC] Run each check only once

Aaron Ballman via cfe-dev cfe-dev at lists.llvm.org
Thu Dec 16 05:38:53 PST 2021


On Thu, Dec 16, 2021 at 8:29 AM Carlos Galvez <carlosgalvezp at gmail.com> wrote:
>
> Aaron,
>
> Did you have a chance to look at the patch? I have tried to reach you via the comments section but haven't got any reply yet.

Thank you for the ping, and sorry about the lack of response! I'll try
to get to it today.

~Aaron

>
> Best regards,
> Carlos
>
> On Sat, Nov 20, 2021 at 3:07 PM Carlos Galvez <carlosgalvezp at gmail.com> wrote:
>>
>> Here it goes! https://reviews.llvm.org/D114317
>>
>> As I wrote in the commit message, please take it with a grain of salt - the code is not pretty and many things are missing - I'll take care of that if/when we are happy with the general direction. Looking forward to your feedback! If you have suggestions for the missing bits I'd be very happy to hear them.
>>
>> /Carlos
>>
>> On Thu, Nov 18, 2021 at 1:46 PM Aaron Ballman <aaron at aaronballman.com> wrote:
>>>
>>> On Wed, Nov 17, 2021 at 7:36 AM Carlos Galvez <carlosgalvezp at gmail.com> wrote:
>>> >
>>> > Hi again!
>>> >
>>> > I've been looking into this for a while now and have some very crude proof-of-concept that seems to work without any impact for check developers and ensuring only "perfect aliases" are not run. The only things I can't get around are diagnostics and profiling, so I'd need some help to figure that out (or drop the requirements). Should I push this proof-of-concept for review and continue the discussion there, or do you prefer to discuss here in the mailing list?
>>>
>>> If you've got a proof of concept, I think it's probably worth showing
>>> at this point. (Just as an FYI for setting expectations, my review
>>> time is pretty limited for the next while.)
>>>
>>> ~Aaron
>>>
>>> >
>>> > Best regards,
>>> > /Carlos
>>> >
>>> >
>>> > On Mon, Oct 4, 2021 at 9:03 PM Aaron Ballman <aaron at aaronballman.com> wrote:
>>> >>
>>> >> On Mon, Oct 4, 2021 at 2:30 PM Carlos Galvez <carlosgalvezp at gmail.com> wrote:
>>> >> >
>>> >> > I think we agree in the goals of this change, perhaps it gets confusing when using the term "alias" with different meanings, so let me summarize what I believe is the requirements of what we want to have, without entering in technicalities:
>>> >> >
>>> >> > * All publicly documented checks should be "first-class checks", meaning users can be enable/disable them independently of each other. Not the scope of this change to deprecate the concept/name of "alias", perhaps in the future.
>>> >>
>>> >> +1. To be super clear, I consider checks which redirect their
>>> >> documentation to still count as "publicly documented checks".
>>> >>
>>> >> > * Users don't need to know/care how these checks are implemented. If they reuse code or not, that's not their concern. They just need to care that their code will get checked properly when the check is enabled, so that they can fix bugs, improve code quality or be compliant with a given guideline.
>>> >>
>>> >> +1
>>> >>
>>> >> > * Check developers should keep their exact same development flow, and be able to reuse code if possible as they do today.
>>> >>
>>> >> +1
>>> >>
>>> >> > * Identical checks should run only once. The definition of "identical" is: a) they have the same implementation and b) they have the same configuration. Same implementation but different configuration is not identical checks, so they would all still be run. So the scope is smaller than what it might have looked like in my first post, I hope it's clear now.
>>> >>
>>> >> +1
>>> >>
>>> >> > * Two additional points of discussion: diagnostics and profiling. IMO implementing this change would be easier if we just skipped diagnostics and profiling for the not-run "identical" checks. We have a list of all checks, then remove the identical ones and run the updated list of checks. Keeping diagnostics and profiling info would probably complicate the design of the tool, but maybe someone has clever ideas. From a user perspective, I think it's neat to avoid diagnostics from identical checks, since it's less noisy to NOLINT them in code.
>>> >>
>>> >> Regarding diagnostics, what I'm hoping we can do is keep the
>>> >> diagnostic behavior we already have. e.g.,
>>> >> https://godbolt.org/z/ss4eKsadb where the diagnostic says:
>>> >>
>>> >> warning: found assert() that could be replaced by static_assert()
>>> >> [cert-dcl03-c,misc-static-assert]
>>> >>
>>> >> and lists both of the ways in which the diagnostic can be triggered
>>> >> but does not show hicpp-static-assert which is a third way to trigger
>>> >> the diagnostic (but was never enabled). I think this is useful because
>>> >> not listing the identical checks can potentially cause some pain for
>>> >> users depending on how they fix the diagnostic or migrate their code.
>>> >> While this one would be most sensible to just change the code, it's
>>> >> not unthinkable that a developer will add a `// NOLINT` comment to
>>> >> disable the diagnostic. If they're only shown one check name in the
>>> >> diagnostic, the user may elect to write `// NOLINT(cert-dcl03-c)` to
>>> >> disable *only* the check name they see. They'll be rather surprised if
>>> >> they decide to drop checking the CERT module and they start getting
>>> >> diagnostics on that line about misc-static-assert when the code was
>>> >> previously suppressing the warning. Seeing multiple checks that
>>> >> trigger will hopefully help the user to know to write `// NOLINT`
>>> >> instead of a specific check, or may help gently guide them towards
>>> >> fixing their code instead of suppressing the diagnostic.
>>> >>
>>> >> That said, if this turns out to be an implementation burden in
>>> >> practice, I think the behavior is fine *so long as* the user who adds
>>> >> `// NOLINT(cert-dcl03-c)` doesn't have the experience of silencing the
>>> >> cert diagnostic while still emitting the misc diagnostic (which, if
>>> >> we're not running the duplicate check in the first place, shouldn't
>>> >> happen anyway).
>>> >>
>>> >> > Would you agree on the above?
>>> >>
>>> >> I think we're in agreement, yes!
>>> >>
>>> >> ~Aaron
>>> >>
>>> >> > > So the primary check gets all the options that users can tweak, and then coding style checks set those tweakable options as they see fit
>>> >> > Thanks, this helps me understand the design. Makes sense to have a "core" implementation of the check with knobs that can be tweaked via config for each guideline. I have some thoughts about build system and where to put such "common" checks that are re-usable, but let's leave that for another time :)
>>> >> >
>>> >> > On Mon, Oct 4, 2021 at 4:35 PM Aaron Ballman <aaron at aaronballman.com> wrote:
>>> >> >>
>>> >> >> On Sun, Oct 3, 2021 at 8:48 PM Carlos Galvez <carlosgalvezp at gmail.com> wrote:
>>> >> >> >
>>> >> >> > > Making it easier
>>> >> >> > for users to enable all of the checks at once requires check authors
>>> >> >> > to consider *all* interactions between modules, and to date that isn't
>>> >> >> > a price we've been willing to make check authors pay. Instead, we
>>> >> >> > typically view modules as being orthogonal sets of functionality that
>>> >> >> > may or may not play well together.
>>> >> >> >
>>> >> >> > Fully agree to that, checks should absolutely be independent, granular and focused on doing one single thing without depending on one another. And they may conflict with each other. My point is that it should be the user making the decision as to how to handle conflicts, enabling and disabling whichever checks they want. Why is that a problem?
>>> >> >>
>>> >> >> I am not convinced that it's the right granularity. We've developed
>>> >> >> aliases with the idea that as far as the user is concerned, which is
>>> >> >> primary and which is aliased should not matter. Now we want to change
>>> >> >> that with a blanket switch that impacts all aliases. This isn't a
>>> >> >> problem per se, but I'm not convinced we're ready to add a switch to
>>> >> >> disable all aliases yet.
>>> >> >>
>>> >> >> > We are not requiring extra work on LLVM developers.
>>> >> >>
>>> >> >> We certainly are requiring extra work for check developers because now
>>> >> >> they have to consider whether to make a check an alias or not because
>>> >> >> that now matters to how their check is surfaced to users.
>>> >> >>
>>> >> >> > Why should they have a say on how the user configures the tool? If developers have exposed this public CLI/config interface, users are free to use it as they see fit (and pay the price of their choices, of course).
>>> >> >> >
>>> >> >> > > You're proposing to make aliases be sort of second-class checks
>>> >> >> >
>>> >> >> > Hmm, not exactly, I agree with you that all publicly documented checks should be regarded as "first-class" checks. What I'm proposing here could be considered more as an "optimization" - don't run the same check twice, just re-use previous results. If you know that you are going to run 2 checks that are identical, just run one of them.
>>> >> >>
>>> >> >> I think that functionality is a good first step. If it's limited to
>>> >> >> only checks that are *identical*, then my concerns largely go away.
>>> >> >> But I had the impression that the desired scope was larger -- to avoid
>>> >> >> running checks that are identical when ignoring the default
>>> >> >> configurations of the checks -- and that difference is the bit that
>>> >> >> worries me.
>>> >> >>
>>> >> >> > > we've never really cared which module has the primary check and which one has the alias check
>>> >> >> > I actually think this is one of the problems that you were concerned about above. Today, not all checks are first-class. There are "primary" checks and "alias" checks, and a decision has to be made as to where to put which, which causes the very issues you are describing. Not all checks from cppcoreguidelines are "primary", some come from "misc", so what? They aren't any less valuable.
>>> >> >>
>>> >> >> Today, all checks are first-class as far as the user running the tool
>>> >> >> is concerned. We mention aliases in public so users are aware that
>>> >> >> they exist and to ease documentation burdens (maybe that was a
>>> >> >> mistake), but users don't have to care which one is primary and which
>>> >> >> ones are secondary when they run the tool. So long as we keep that
>>> >> >> property, I'm content.
>>> >> >>
>>> >> >> > I refer to my question above - why do we even need this distinction between "primary" and "alias", i.e. what problem is this solving? There's 3 checks checking for C style arrays? That's great, why not just link them to the same documentation, and internally (not leaked to the user) use the same implementation?
>>> >> >>
>>> >> >> That's the whole goal of aliases. You bring up a good point on whether
>>> >> >> we need users to know about them at all.
>>> >> >>
>>> >> >> > The only use case I can think of to publicly document that there are "alias" checks is to let the user know that they can "skip redundant checks" by manually disabling them. If there's any other use case I'd be very happy to know!
>>> >> >>
>>> >> >> They're a convenience for developing a check once but varying its
>>> >> >> behavior across modules, as this use case comes up somewhat frequently
>>> >> >> with coding standards. So the primary check gets all the options that
>>> >> >> users can tweak, and then coding style checks set those tweakable
>>> >> >> options as they see fit. However, this possibly doesn't require the
>>> >> >> user to know about the check being an alias. And there's no benefit to
>>> >> >> running a check twice when it's configuration is identical to a
>>> >> >> previous check that I can tell.
>>> >> >>
>>> >> >> ~Aaron
>>> >> >>
>>> >> >> >
>>> >> >> > /Carlos
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > On Sun, Oct 3, 2021 at 6:19 PM Aaron Ballman <aaron at aaronballman.com> wrote:
>>> >> >> >>
>>> >> >> >> On Sun, Oct 3, 2021 at 11:24 AM Aaron Ballman <aaron at aaronballman.com> wrote:
>>> >> >> >> > To date, aliases have been surfaced to the user as a first-class check
>>> >> >> >> > (while the documentation typically redirects to the primary check for
>>> >> >> >> > simplicity of writing check aliases, there's no requirement that they
>>> >> >> >> > do so; other than that, aliases should behave the same as any other
>>> >> >> >> > check as far as the user is concerned.) You're proposing to make
>>> >> >> >> > aliases be sort of second-class checks that are more closely tied to
>>> >> >> >> > the primary check by providing an option that says aliases are
>>> >> >> >> > disabled as a blanket operation rather than a case-by-case basis. I am
>>> >> >> >> > not convinced this is a good approach. We may already have checks for
>>> >> >> >> > which this is a problem (where the aliased check and the primary check
>>> >> >> >> > are checking different things), and this closes off the design space
>>> >> >> >> > for such checks in the future (and possibly encourages less code reuse
>>> >> >> >> > in the process).
>>> >> >> >> >
>>> >> >> >> > Btw, I'm not saying "no way", I'm saying "let's make sure we're not
>>> >> >> >> > regressing functionality or painting ourselves into a corner."
>>> >> >> >>
>>> >> >> >> Also, to date I believe we've never really cared which module has the
>>> >> >> >> primary check and which one has the alias check (it's fine for the
>>> >> >> >> primary to be in cert- and the alias to be in bugprone- or vice
>>> >> >> >> versa), but with this sort of an option, that implementation detail is
>>> >> >> >> leaked out to the user in a more obvious way. If we go this route, we
>>> >> >> >> might have to move some checks around so there's some consistent rule
>>> >> >> >> to aid users (like, the general-purpose module holds the primary and
>>> >> >> >> the coding style guidelines get the alias), but it may not work in
>>> >> >> >> every situation (I can imagine a check living in two style guides but
>>> >> >> >> not a general purpose module, or two general purpose modules but not a
>>> >> >> >> style guide).
>>> >> >> >>
>>> >> >> >> ~Aaron


More information about the cfe-dev mailing list