[cfe-dev] More plans on Static Analyzer + Clang-Tidy interoperation.

Wed Oct 14 13:28:45 PDT 2020

On 10/13/20 5:47 AM, Aaron Ballman wrote:
> On Mon, Oct 12, 2020 at 5:26 PM Artem Dergachev via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>> After a bit of hiatus i'm reviving this work. The previous discussion is
>> at http://lists.llvm.org/pipermail/cfe-dev/2019-August/063092.html, also
>> http://lists.llvm.org/pipermail/cfe-dev/2019-September/063229.html. The
>> plan is not to turn the two Clang bug-finding tools into a single tool
>> entirely but rather to make them more interchangeable, so that users who
>> have existing integration of, say, static analyzer, could also have
>> clang-tidy integration for as-free-as-possible. In particular,
>> checks/checkers could be shared, which would resolve the constant
>> struggle of "where to put the checker?".
> Thank you for looking into this topic again!
>
>> One thing i realized since the last discussion is that the more tools we
>> integrate, the more redundant compilation work do we perform. If we are
>> to compile the code + run static analyzer + clang-tidy separately over
>> it, that's 3 rebuilds of the AST from scratch. Whatever solution we
>> provide to run both tools, I'd much rather keep it at 2 (compilation +
>> *all* analysis) because static analysis is already expensive in terms of
>> build-time, no need to make it worse.
> +1
>
>> One core component of this plan is to teach clang-tidy how to emit
>> reports in various human-readable and machine-readable formats that the
>> static analyzer already supports. At this point i'm pretty much ready to
>> publish a clang::DiagnosticConsumer that'd produce all kinds of static
>> analyzer-style reports. In particular, such consumer would enable
>> clang-tidy to be used from inside scan-build instead of clang --analyze;
>> both clang-tidy checks and static analyzer checkers would be ran from
>> inside clang-tidy binary but produce html reports consumable by
>> scan-build; a common index.html report would be generated for all
>> checkers. I'm very much in favor of teaching scan-build how to run
>> arbitrary clang tools, not just clang-tidy (i.e., "scan-build
>> --tool=/path/to/my-clang-tool" or something like that) which would allow
>> users who don't have CMake compilation databases to take advantage of
>> clang tools more easily (and we have a few users who are interested in
>> that).
> I think this sounds really useful.
>
>> On the other hand, we've also made up our mind that for ourselves we do
>> in fact want produce a clang binary with clang-tidy checkers integrated.
> I continue to disagree with that stance, FWIW. I want to see clang
> able to run clang-tidy checks the same as clang static analyzer checks
> because having a single tool to run which produces diagnostic results
> is easier than running multiple tools in build systems or CI
> pipelines. I realize that the community may not share this position,
> but I hope we're willing to revisit the discussion.

Just to double-check, sounds like we two actually *agree* on this(?) 
like, we both want clang with clang-tidy checks?

>> Apart from free integration into a number of our CI systems, that'll
>> allow us to avoid shipping and supporting the clang-tidy command line
>> interface as a separate feature. That's about 7MB increase in clang
>> binary size which we're fine with. I plan to make it an off-by-default
>> cmake flag unless there are strong opinions to do the opposite.
> I don't have a strong opinion to do the opposite, but I'd like to at
> least explore it as an option.
>
>> The
>> alternative approach to move ourselves into a separate binary that's
>> integrated at the Driver level would also technically work but that's
>> too disruptive to commit to for us at the moment - even just the Driver
>> work alone would require a lot of testing, let alone making sure that
>> static analyzer works exactly as before from within the tool (it already
>> works from inside clang-tidy but we don't really know if it actually
>> works *the same* in all the potential cornercases).
> I think this approach would add more complexity to the question "what
> tool do I reach for".
>
>> So, like, we want to support multiple workflows.
>>
>> Also i'll be making occasional commits to some clang-tidy checks that
>> we're interested in - mostly to address a few false positives (say, some
>> checkers aren't aware of some Objective-C constructs), but also
>> sometimes tweak the warning text a little bit (for example,
>> bugprone-assert-side-effect is an awesome check but the warning text
>> "found assert() with side effect" sounds fairly un-compiler-ish - we're
>> not playing hide-and-seek!). Hope i'm welcome with these changes ^.^
> Most certainly! One other thing that I think we should think about is:
> now that we're looking to more tightly couple the static analyzer and
> clang-tidy, should we unify their diagnostic message styles?
> clang-tidy follows the Clang conventions and does not use
> grammatically correct diagnostic messages (no capitalization or
> punctuation) while the static analyzer has its own convention
> (capitalization but no punctuation).

This is an important thing to do for any UI that displays both static 
analyzer warnings and clang-tidy warnings; different capitalization is 
indeed a bummer.

It's not uncommon for UIs/IDEs to re-capitalize all warnings into a 
single style. If we suddenly change static analyzer warnings to be 
lowercase now, i think most UIs will simply re-capitalize them back, so 
that to preserve the existing experience.

I could also teach my intermediate DiagnosticConsumer to do the same.

Or i could teach individual PathDiagnosticConsumers (the ones that are 
responsible for static analyzer-style outputs) to do that for themselves 
specifically - that actually sounds like a very good plan, probably 
*the* way forward.

> Relatedly, do you expect to
> expose some of the handy diagnostic utilities from the static analyzer
> to clang-tidy checks, such as the ability to report a path that led to
> a diagnostic (for the few checks that track that sort of information)?

These facilities will be exposed once https://reviews.llvm.org/D67422 
actually lands.

That said, i don't think path notes specifically will be useful for any 
of the existing clang-tidy checks. IIUC the only checker so far that 
could potentially make use of these notes is the use-after-move checker 
- but it doesn't actually find the path, it only finds two points on the 
CFG with a happens-before relation. Static analyzer path notes make 
sense because every single point in the path potentially contributes to 
the warning. Which implies that behind the warning there's a *huge* 
facility that understands potential implications of *every* statement. 
When it's just two points that contribute to the path, two notes are enough.

One thing i'm excited about, though, is eventually producing a good bug 
visualization for flow-sensitive warnings. As i mentioned a few times, 
flow-sensitive warnings should be displayed together rather than 
separately in order to be understandable: "This code is dead *because* 
this condition is always false *because* this assignment to the variable 
that participates in the condition is a dead store". I believe static 
analyzer's rich diagnostics system could contribute to that dream.

> Thanks!
>
> ~Aaron
>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev