[cfe-dev] [analyzer][RFC] Attribute(s) to enhance/configure the analysis

Fri Oct 23 16:49:40 PDT 2020

> On Oct 23, 2020, at 5:34 AM, Aaron Ballman <aaron at aaronballman.com> wrote:
> 
> On Thu, Oct 22, 2020 at 10:31 PM Jan Korous <jkorous at apple.com <mailto:jkorous at apple.com>> wrote:
>> 
>> Hi all,
>> 
>> I'm sorry I'm a little late to the party!
>> 
>> I have couple thoughts first and a proposal at the end.
>> 
>> On Oct 20, 2020, at 1:06 PM, Gábor Horváth via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>> 
>> I think an ideal suppress annotation would:
>> * be applicable to a wide range of scopes, not just function scope
>> * have clear documentation indicating this is the last resort and refer to some guidelines how to suppress false positives by improving the code
>> * come with an easy mechanism to check whether the annotation makes any difference, so we can easily get rid of redundant ones
>> * come with an easy mechanism to check how many issues are suppressed by the same annotation (or even expect the user to specify this number as an argument and warn if that does not match the reality?)
>> 
>> 
>> We must consider projects that need to support multiple versions of clang and by proxy multiple versions of Static Analyzer. Since there can be both false positives and false negatives in Analyzer we need to have a good story for situations when an attribute doesn't make sense for a particular Analyzer version (for example older one) but the maintainers still want to keep it in source. Maybe we could produce just lower severity diagnostics than warning?
>> (In case someone asks - I would strongly prefer to not introduce versioning of these attributes.)
>> 
>> On Oct 20, 2020, at 11:24 AM, Valeriy Savchenko via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>> However, the hardest choice here is the string argument for the attribute.  Again, it would be perfect if we can avoid weird identifiers like “java.com.warnings.alpha.whatever.2020”, it would be good if that string will make it clear for people who don’t use the analyzer what’s going on here.
>> Checker names that we use in command line are a bit too verbose, error messages are prone to changes, bug categories are too broad and abstract (the intention is pretty vague if you suppress “Logic error”).
>> 
>> My suggestion would be using so-called “Bug types” (we already show them in the summary table in index.html).  Each diagnostic has one, it’s unique and more-or-less descriptive.  If we revise all the existing bug types and try to make them concise and clear, I think it can work!
>> 
>> 
>> Just to make sure we're on the same page - I'll use one of the WebKit checkers that I need to have a suppression for as an example:
>> 
>> checker name:
>> webkit.UncountedLambdaCapturesChecker
>> error messages (depend on the context):
>> warning: Captured raw-pointer 'ref_countable' to uncounted type is unsafe [webkit.UncountedLambdaCapturesChecker]
>> warning: Implicitly captured raw-pointer 'ref_countable' to uncounted type is unsafe [webkit.UncountedLambdaCapturesChecker]
>> bug category:
>> WebKit coding guidelines
>> bug type:
>> Uncounted call argument for a raw pointer/reference parameter
>> 
>> Assuming I got it right from all the above I have a clear preference which is the checker name. In a way it would be analogous to pragmas for cc1 diagnostics suppression - the same identifier is used in CLI and in the source for suppression.
>> My second choice would be to introduce another string identifier as I have a hard time imagining any other of the existing strings being introduce to codebases.
>> 
>> On Oct 20, 2020, at 1:11 PM, Valeriy Savchenko via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>> 
>> I just want to add something that I think was not very clear from my previous email.  The existing “suppress” attribute is a statement attribute, which gives it a very fine level of granularity.
>> 
>> 
>> I have couple use-cases for declaration attributes (like ParmVarDecl, FiledDecl, VarDecl) but I assume that no matter what mechanism we devise it should be applicable for declaration attributes too.
>> One thing I feel we need is "kind safety". While it's trivial for single-purpose attribute to define what kinds of AST nodes is it applicable to it becomes a bit more work with different arguments of a generic attribute.
>> 
>> On Oct 20, 2020, at 1:12 PM, Gábor Horváth via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>> 
>> Sorry for the duplicate emails, I just realized that there is one more dimension of suppression mechanisms. There are separate tools such as CodeChecker that already support suppressing individual reports (either using comments or bug hashes). So adding an annotation could make it unclear who is responsible for the suppression and which feature should the user use.
>> 
>> 
>> I assume that the status quo is that there's no common standard for in-source suppression for bug-finding tools. That means that arguably we won't make it significantly worse by adding n+1.
>> Hypothetically if we design the annotation system well we might convince other projects to use it. And if I had to pick one tool to take into the account I would maybe think of GCC analyzer?
>> 
>> https://developers.redhat.com/blog/2020/03/26/static-analysis-in-gcc-10/
>> 
>> One thing to consider here is that if we want to have some cross-tool story we probably shouldn't use any vendor-specific prefix like clang::suppress or csa::suppress.
> 
> There's not a standardized identifier for this attribute and so I
> believe we should in fact use a vendor-specific scope. Putting
> something in the standard's attribute namespace would not be the right
> approach, at least. My preference is to use the clang vendor namespace
> because it's already a vendor namespace our users are used to seeing
> and because the name covers the broadest range of diagnostics that can
> be disabled.

I see, thanks for explaining this - I'm actually not familiar with constraints imposed by the C++ standard. In general the clang:: prefix should work for us.
The only concern I have is - we need to make sure we support also C. Is there a notion of namespace in __attribute__ (())?
(Grepping clang/test for __attribute__ didn't make me much wiser.)

> 
>> On Oct 21, 2020, at 11:57 AM, Aaron Ballman via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>> 
>> Thank you for exploring the options in this space -- I think having a
>> uniform way for users to suppress diagnostics using attributes is a
>> really interesting idea. I have personally never really liked using
>> pragmas to do this work because that approach is usually quite verbose
>> when applied to a single line of code.
>> 
>> IIUC we have use-cases for suppression of "diagnostics produced by Analyzer" only.
>> While attribute-based suppression of cc1 diagnostics sounds interesting I'm worried that the necessary broad discussion would effectively halt this.
>> Would you mind if we keep the focus on Analyzer for now or do you believe we'd not converge to the right design unless we consider cc1 too?
> 
> I don't think that's the case, though. We have at least three
> different tools that are all tightly integrated together and none of
> them share a common mechanism for suppressing diagnostics. In Clang,
> you either have to use command line arguments or a pragma. In
> clang-tidy, we support // NOLINT comments. Now we're talking about
> adding an attribute for the static analyzer. I think it's fine for
> there to be more than one way to suppress a diagnostic, but we need a
> common way for users that doesn't require them to understand what part
> of the toolchain issued a diagnostic in order to suppress it properly.

I hear you although I don't agree that users shouldn't need to know what tool produces a warning. There are multiple different tools in the toolchain and there's no unified abstraction over their diagnostics.
Clang-tidy and Static Analyzer are opt-in - users need to explicitly use them (at least by proxy of an IDE or some other tool).

BTW are we reasonably sure that there won't be any weird corner-cases for attribute-based diagnostics suppression in Frontend? There are couple hundred attribute-related warnings just in DiagnosticSemaKinds.td.

> I'm fine focusing on the static analyzer's needs, but I personally
> think it's a requirement that whatever we come up with is a general
> solution that covers at least the frontend, clang-tidy, and the static
> analyzer with buy-in from the people heavily involved in those tools,
> even if there's not a commitment to get that work done immediately for
> all three tools.

The situation I specifically want to avoid is to be held back by Frontend. I am not aware of any motivation for attribute-based warning suppression coming from that side of things and don't think that Analyzer need is the right justification for that.
That being said - ultimately we are more or less talking about an attribute with a textual identifier. I consider such specification broad enough to support pretty much any possible use-case. The only thing we might want to do is to keep the specification as trivial as possible. I don't think we need to come up with a plan for Frontend as long as we do our best to avoid potential blockers in the design.

>> Re: "stable identifiers". I feel that no matter what identifiers we decide to use we still might want some escape hatch in case we end up having to rename a "stable identifier".
>> I am not aware of any solution for such situation with pragmas.
>> https://clang.llvm.org/docs/UsersManual.html#controlling-diagnostics-via-pragmas <https://clang.llvm.org/docs/UsersManual.html#controlling-diagnostics-via-pragmas>
> 
> I'm not overly worried about the stability of identifiers -- we change
> diagnostics on occasion, but I think we tweak wording or remove/add
> diagnostics far more than we change a diagnostic's meaning
> sufficiently to warrant giving it a new stable identifier.
> 
> That said, it does bring up a different kind of diagnostic we should
> keep in mind: [[clang::suppress("I no longer exist")]] where the
> string literal doesn't refer to a diagnostic name. e.g., the user was
> suppressing a diagnostic we removed, or the user has a typo in the
> string literal, etc. We should perhaps diagnose this case with typo
> correction to help alert users of the confusion rather than silently
> accepting an attribute that has no impact.

I think that while this is important it is orthogonal to the discussion of how to name the attribute and what kind of identifier to use and we can deal with this separately later.

Note that this would still be a problem for projects that need to support multiple toolchains & versions - e. g. LLVM still supports clang-3.5:
https://llvm.org/docs/GettingStarted.html#host-c-toolchain-both-compiler-and-standard-library

If we ever remove an identifier then I don't see what a maintainer of such project should do:
- When they build and analyze the project with older toolchain they support the identifier would work as expected.
- When they build with a newer toolchain they support they get a warning which is not actionable for them - assuming we don't want to encourage a suppression suppression mechanism...

Maybe the bottom line is that we simply shouldn't ever remove an identifier?

> 
>> But I am thinking that basically all we need is a notion of identifier alias - if we can have two names for an attribute we're fine.
>> Imagine we add a warning identifier "foo" to Analyzer in clang-12 and add an alias "bar" for it in clang-13. Projects using the "bar" identifier are fine and any project that is ok with minimal supported version being clang-13 can change the spelling in their sources to "bar".
> 
> I think this is an interesting idea and is worth exploring. For
> instance, in clang-tidy we have checks that are aliased into multiple
> groupings and I could imagine the user saying they want to suppress a
> readability-based diagnostic but still allow the diagnostic if another
> module is turned on even if it's the same information (e.g., maybe
> they enable the CERT module and suddenly the issue is no longer about
> readability but security as far as the user is concerned).
> 
>> Overall I suggest we:
>> - Adopt the suppress attribute Valeriy mentioned and extend it to declarations.
> 
> Agreed, though we should also consider whether we have any type-based
> warnings where we want the attribute on a type (I can't think of any
> off the top of my head)

Agreed. I feel that we can just make sure this works for attributes on a type but don't have to necessarily implement this until there's a use-case.

> 
>> - Don't use prefix like clang:: or csa:: to make it an obvious interface for suppression to be used by other tools too.
> 
> -1, this steps on design space that's not ours to step on.

I put my concern about C above. If that's ok I'm fine with clang:: prefix.

> - MVP can support just a single string identifier, later extend to comma-separated list of string identifiers.
> 
> I can't imagine supporting a list of identifiers is actually more work
> than supporting a single identifier. That said, I don't think it's
> critical for the MVP, but if someone wants to implement it in the
> initial offerings, I say go for it.

Agreed.

> - Use checker name as an identifier.
> 
> This approach seems to work fine if the checker only issues a few
> diagnostics and they generally aren't on the same statement or
> declaration, but it would be a problem if we had a checker which
> issues multiple unrelated (or semi-related) warnings on the same
> statement. e.g.,
> 
> // note, this is all one statement with newlines for clarity
> foo() + // say Checker A issues a "don't call foo" warning here
> bar() / baz(); // and Checker A also issues a "potential divide by
> zero" warning here

You're right that having a per-checker suppression would either constitute a constraint on implementation details of the checkers or force users to work around it by refactoring their code.

Also, there's a problem with using checker name as the identifier - since we have alpha checkers it's somewhat expected that such checker will at some point get renamed:
alpha.PackageFoo.CheckerBar -> PackageFoo.CheckerBar

The above makes me lean towards the direction that we should devise a new identifier then.

> If the user wanted to use an attribute to suppress the potential
> divide by zero warning, using the checker name would also mean the
> "don't call foo" warning is silenced because the attribute is at the
> statement level. My impression is that we don't have many checkers
> that would be in this situation, so maybe it's fine, but I do wonder a
> bit about the frontend diagnostics. I'd imagine we could use the
> warning group name as the "checker" name in that situation, but I
> could imagine code like this:
> 
> [[gnu::not_supported_in_clang]] // want to suppress the ignored
> unknown attribute diagnostic here
> [[clang::supproted_in_clang]] // don't want to suppress the ignored
> unknown attribute diagnostic here (from the typo)
> void func();

Just to make sure we're on the same page - Analyzer warnings aren't defined in a tablegen file like Frontend warnings are and they also don't have associated -Wfoo/-Wno-foo command line flags.
The warning text is produced ad-hoc in each checker's implementation. The identifier that's printed as part of the warning is the checker name.

I basically don't see a connection between checker name and Frontend warning names which makes me think that it doesn't really matter what identifier we decide to use for suppression in Analyzer as that's going to be an Analyzer-only concept anyway and Frontend is welcome to use any suitable identifier.

> I think it's worth intentionally deciding whether we want to support
> cases like this or not up front. We also probably should talk about
> some edge cases like whether aliases are treated the same as the
> original checker or not, does it support families of checkers, etc.

I would strongly prefer to keep the design as simple as possible (at least initially).

Cheers,

Jan

> ~Aaron
> 
>> 
>> 
>> Cheers,
>> 
>> Jan

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20201023/4071fdea/attachment-0001.html>