[cfe-dev] [clang-tidy][RFC] Run each check only once

Carlos Galvez via cfe-dev cfe-dev at lists.llvm.org
Thu Dec 16 05:28:58 PST 2021


Aaron,

Did you have a chance to look at the patch? I have tried to reach you via
the comments section but haven't got any reply yet.

Best regards,
Carlos

On Sat, Nov 20, 2021 at 3:07 PM Carlos Galvez <carlosgalvezp at gmail.com>
wrote:

> Here it goes! https://reviews.llvm.org/D114317
>
> As I wrote in the commit message, please take it with a grain of salt -
> the code is not pretty and many things are missing - I'll take care of that
> if/when we are happy with the general direction. Looking forward to your
> feedback! If you have suggestions for the missing bits I'd be very happy to
> hear them.
>
> /Carlos
>
> On Thu, Nov 18, 2021 at 1:46 PM Aaron Ballman <aaron at aaronballman.com>
> wrote:
>
>> On Wed, Nov 17, 2021 at 7:36 AM Carlos Galvez <carlosgalvezp at gmail.com>
>> wrote:
>> >
>> > Hi again!
>> >
>> > I've been looking into this for a while now and have some very crude
>> proof-of-concept that seems to work without any impact for check developers
>> and ensuring only "perfect aliases" are not run. The only things I can't
>> get around are diagnostics and profiling, so I'd need some help to figure
>> that out (or drop the requirements). Should I push this proof-of-concept
>> for review and continue the discussion there, or do you prefer to discuss
>> here in the mailing list?
>>
>> If you've got a proof of concept, I think it's probably worth showing
>> at this point. (Just as an FYI for setting expectations, my review
>> time is pretty limited for the next while.)
>>
>> ~Aaron
>>
>> >
>> > Best regards,
>> > /Carlos
>> >
>> >
>> > On Mon, Oct 4, 2021 at 9:03 PM Aaron Ballman <aaron at aaronballman.com>
>> wrote:
>> >>
>> >> On Mon, Oct 4, 2021 at 2:30 PM Carlos Galvez <carlosgalvezp at gmail.com>
>> wrote:
>> >> >
>> >> > I think we agree in the goals of this change, perhaps it gets
>> confusing when using the term "alias" with different meanings, so let me
>> summarize what I believe is the requirements of what we want to have,
>> without entering in technicalities:
>> >> >
>> >> > * All publicly documented checks should be "first-class checks",
>> meaning users can be enable/disable them independently of each other. Not
>> the scope of this change to deprecate the concept/name of "alias", perhaps
>> in the future.
>> >>
>> >> +1. To be super clear, I consider checks which redirect their
>> >> documentation to still count as "publicly documented checks".
>> >>
>> >> > * Users don't need to know/care how these checks are implemented. If
>> they reuse code or not, that's not their concern. They just need to care
>> that their code will get checked properly when the check is enabled, so
>> that they can fix bugs, improve code quality or be compliant with a given
>> guideline.
>> >>
>> >> +1
>> >>
>> >> > * Check developers should keep their exact same development flow,
>> and be able to reuse code if possible as they do today.
>> >>
>> >> +1
>> >>
>> >> > * Identical checks should run only once. The definition of
>> "identical" is: a) they have the same implementation and b) they have the
>> same configuration. Same implementation but different configuration is not
>> identical checks, so they would all still be run. So the scope is smaller
>> than what it might have looked like in my first post, I hope it's clear now.
>> >>
>> >> +1
>> >>
>> >> > * Two additional points of discussion: diagnostics and profiling.
>> IMO implementing this change would be easier if we just skipped diagnostics
>> and profiling for the not-run "identical" checks. We have a list of all
>> checks, then remove the identical ones and run the updated list of checks.
>> Keeping diagnostics and profiling info would probably complicate the design
>> of the tool, but maybe someone has clever ideas. From a user perspective, I
>> think it's neat to avoid diagnostics from identical checks, since it's less
>> noisy to NOLINT them in code.
>> >>
>> >> Regarding diagnostics, what I'm hoping we can do is keep the
>> >> diagnostic behavior we already have. e.g.,
>> >> https://godbolt.org/z/ss4eKsadb where the diagnostic says:
>> >>
>> >> warning: found assert() that could be replaced by static_assert()
>> >> [cert-dcl03-c,misc-static-assert]
>> >>
>> >> and lists both of the ways in which the diagnostic can be triggered
>> >> but does not show hicpp-static-assert which is a third way to trigger
>> >> the diagnostic (but was never enabled). I think this is useful because
>> >> not listing the identical checks can potentially cause some pain for
>> >> users depending on how they fix the diagnostic or migrate their code.
>> >> While this one would be most sensible to just change the code, it's
>> >> not unthinkable that a developer will add a `// NOLINT` comment to
>> >> disable the diagnostic. If they're only shown one check name in the
>> >> diagnostic, the user may elect to write `// NOLINT(cert-dcl03-c)` to
>> >> disable *only* the check name they see. They'll be rather surprised if
>> >> they decide to drop checking the CERT module and they start getting
>> >> diagnostics on that line about misc-static-assert when the code was
>> >> previously suppressing the warning. Seeing multiple checks that
>> >> trigger will hopefully help the user to know to write `// NOLINT`
>> >> instead of a specific check, or may help gently guide them towards
>> >> fixing their code instead of suppressing the diagnostic.
>> >>
>> >> That said, if this turns out to be an implementation burden in
>> >> practice, I think the behavior is fine *so long as* the user who adds
>> >> `// NOLINT(cert-dcl03-c)` doesn't have the experience of silencing the
>> >> cert diagnostic while still emitting the misc diagnostic (which, if
>> >> we're not running the duplicate check in the first place, shouldn't
>> >> happen anyway).
>> >>
>> >> > Would you agree on the above?
>> >>
>> >> I think we're in agreement, yes!
>> >>
>> >> ~Aaron
>> >>
>> >> > > So the primary check gets all the options that users can tweak,
>> and then coding style checks set those tweakable options as they see fit
>> >> > Thanks, this helps me understand the design. Makes sense to have a
>> "core" implementation of the check with knobs that can be tweaked via
>> config for each guideline. I have some thoughts about build system and
>> where to put such "common" checks that are re-usable, but let's leave that
>> for another time :)
>> >> >
>> >> > On Mon, Oct 4, 2021 at 4:35 PM Aaron Ballman <aaron at aaronballman.com>
>> wrote:
>> >> >>
>> >> >> On Sun, Oct 3, 2021 at 8:48 PM Carlos Galvez <
>> carlosgalvezp at gmail.com> wrote:
>> >> >> >
>> >> >> > > Making it easier
>> >> >> > for users to enable all of the checks at once requires check
>> authors
>> >> >> > to consider *all* interactions between modules, and to date that
>> isn't
>> >> >> > a price we've been willing to make check authors pay. Instead, we
>> >> >> > typically view modules as being orthogonal sets of functionality
>> that
>> >> >> > may or may not play well together.
>> >> >> >
>> >> >> > Fully agree to that, checks should absolutely be independent,
>> granular and focused on doing one single thing without depending on one
>> another. And they may conflict with each other. My point is that it should
>> be the user making the decision as to how to handle conflicts, enabling and
>> disabling whichever checks they want. Why is that a problem?
>> >> >>
>> >> >> I am not convinced that it's the right granularity. We've developed
>> >> >> aliases with the idea that as far as the user is concerned, which is
>> >> >> primary and which is aliased should not matter. Now we want to
>> change
>> >> >> that with a blanket switch that impacts all aliases. This isn't a
>> >> >> problem per se, but I'm not convinced we're ready to add a switch to
>> >> >> disable all aliases yet.
>> >> >>
>> >> >> > We are not requiring extra work on LLVM developers.
>> >> >>
>> >> >> We certainly are requiring extra work for check developers because
>> now
>> >> >> they have to consider whether to make a check an alias or not
>> because
>> >> >> that now matters to how their check is surfaced to users.
>> >> >>
>> >> >> > Why should they have a say on how the user configures the tool?
>> If developers have exposed this public CLI/config interface, users are free
>> to use it as they see fit (and pay the price of their choices, of course).
>> >> >> >
>> >> >> > > You're proposing to make aliases be sort of second-class checks
>> >> >> >
>> >> >> > Hmm, not exactly, I agree with you that all publicly documented
>> checks should be regarded as "first-class" checks. What I'm proposing here
>> could be considered more as an "optimization" - don't run the same check
>> twice, just re-use previous results. If you know that you are going to run
>> 2 checks that are identical, just run one of them.
>> >> >>
>> >> >> I think that functionality is a good first step. If it's limited to
>> >> >> only checks that are *identical*, then my concerns largely go away.
>> >> >> But I had the impression that the desired scope was larger -- to
>> avoid
>> >> >> running checks that are identical when ignoring the default
>> >> >> configurations of the checks -- and that difference is the bit that
>> >> >> worries me.
>> >> >>
>> >> >> > > we've never really cared which module has the primary check and
>> which one has the alias check
>> >> >> > I actually think this is one of the problems that you were
>> concerned about above. Today, not all checks are first-class. There are
>> "primary" checks and "alias" checks, and a decision has to be made as to
>> where to put which, which causes the very issues you are describing. Not
>> all checks from cppcoreguidelines are "primary", some come from "misc", so
>> what? They aren't any less valuable.
>> >> >>
>> >> >> Today, all checks are first-class as far as the user running the
>> tool
>> >> >> is concerned. We mention aliases in public so users are aware that
>> >> >> they exist and to ease documentation burdens (maybe that was a
>> >> >> mistake), but users don't have to care which one is primary and
>> which
>> >> >> ones are secondary when they run the tool. So long as we keep that
>> >> >> property, I'm content.
>> >> >>
>> >> >> > I refer to my question above - why do we even need this
>> distinction between "primary" and "alias", i.e. what problem is this
>> solving? There's 3 checks checking for C style arrays? That's great, why
>> not just link them to the same documentation, and internally (not leaked to
>> the user) use the same implementation?
>> >> >>
>> >> >> That's the whole goal of aliases. You bring up a good point on
>> whether
>> >> >> we need users to know about them at all.
>> >> >>
>> >> >> > The only use case I can think of to publicly document that there
>> are "alias" checks is to let the user know that they can "skip redundant
>> checks" by manually disabling them. If there's any other use case I'd be
>> very happy to know!
>> >> >>
>> >> >> They're a convenience for developing a check once but varying its
>> >> >> behavior across modules, as this use case comes up somewhat
>> frequently
>> >> >> with coding standards. So the primary check gets all the options
>> that
>> >> >> users can tweak, and then coding style checks set those tweakable
>> >> >> options as they see fit. However, this possibly doesn't require the
>> >> >> user to know about the check being an alias. And there's no benefit
>> to
>> >> >> running a check twice when it's configuration is identical to a
>> >> >> previous check that I can tell.
>> >> >>
>> >> >> ~Aaron
>> >> >>
>> >> >> >
>> >> >> > /Carlos
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Sun, Oct 3, 2021 at 6:19 PM Aaron Ballman <
>> aaron at aaronballman.com> wrote:
>> >> >> >>
>> >> >> >> On Sun, Oct 3, 2021 at 11:24 AM Aaron Ballman <
>> aaron at aaronballman.com> wrote:
>> >> >> >> > To date, aliases have been surfaced to the user as a
>> first-class check
>> >> >> >> > (while the documentation typically redirects to the primary
>> check for
>> >> >> >> > simplicity of writing check aliases, there's no requirement
>> that they
>> >> >> >> > do so; other than that, aliases should behave the same as any
>> other
>> >> >> >> > check as far as the user is concerned.) You're proposing to
>> make
>> >> >> >> > aliases be sort of second-class checks that are more closely
>> tied to
>> >> >> >> > the primary check by providing an option that says aliases are
>> >> >> >> > disabled as a blanket operation rather than a case-by-case
>> basis. I am
>> >> >> >> > not convinced this is a good approach. We may already have
>> checks for
>> >> >> >> > which this is a problem (where the aliased check and the
>> primary check
>> >> >> >> > are checking different things), and this closes off the design
>> space
>> >> >> >> > for such checks in the future (and possibly encourages less
>> code reuse
>> >> >> >> > in the process).
>> >> >> >> >
>> >> >> >> > Btw, I'm not saying "no way", I'm saying "let's make sure
>> we're not
>> >> >> >> > regressing functionality or painting ourselves into a corner."
>> >> >> >>
>> >> >> >> Also, to date I believe we've never really cared which module
>> has the
>> >> >> >> primary check and which one has the alias check (it's fine for
>> the
>> >> >> >> primary to be in cert- and the alias to be in bugprone- or vice
>> >> >> >> versa), but with this sort of an option, that implementation
>> detail is
>> >> >> >> leaked out to the user in a more obvious way. If we go this
>> route, we
>> >> >> >> might have to move some checks around so there's some consistent
>> rule
>> >> >> >> to aid users (like, the general-purpose module holds the primary
>> and
>> >> >> >> the coding style guidelines get the alias), but it may not work
>> in
>> >> >> >> every situation (I can imagine a check living in two style
>> guides but
>> >> >> >> not a general purpose module, or two general purpose modules but
>> not a
>> >> >> >> style guide).
>> >> >> >>
>> >> >> >> ~Aaron
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20211216/db22140d/attachment-0001.html>


More information about the cfe-dev mailing list