[cfe-dev] Proposal: Integrate static analysis test suites

Mon Feb 1 12:26:20 PST 2016

> On Jan 30, 2016, at 6:55 AM, Aaron Ballman <aaron at aaronballman.com> wrote:
> 
> On Fri, Jan 29, 2016 at 11:32 PM, Anna Zaks <ganna at apple.com> wrote:
>> 
>> By calling "clang --analyze" you are not calling the compiler and asking it
>> to work harder. You are calling another tool that is not going to compile
>> for you but rather provide deep static code analysis.
> 
> This is not unlike the way clang-tidy works (which also runs the
> analyzer!), but clang-tidy still shows compiler diagnostics.

"clang —analyze" is the entry point that calls the static analyzer. Having it call all the other tools that produce diagnostics (such as compiler and clang-tidy) is not the right design. On the other hand, having another overarching tool that collects diagnostics from different sources should be the goal.

> 
>> Calling "clang
>> --analyze" could call the compiler behind the scenes and report the compiler
>> warnings in addition to the static analyzer issues. However, when warnings
>> from both tools are merged in a straightforward way on the command line, the
>> user experience could be confusing. For example, both tools report some
>> issues such as warning on code like this:
>>  int j = 5/0; // warning: Division by zero
>>               // warning: division by zero is undefined [-Wdivision-by-zero]
> 
> This is unfortunate, but to me it shows that we have duplication of
> efforts in our tools. We run into the same general issue with
> clang-tidy checks and the compiler, but the goal is to find one home
> for that diagnostic functionality and only enable it there. If we have
> diagnostics that live in both the compiler and the analyzer, we're
> duplicating effort and we should strive to rectify that where
> possible. There's likely to be cases where this is harder (such as
> division by zero) because you want the diagnostic enabled by default
> without requiring the overhead of running path-sensitive checks, but I
> think there are ways we can manage that.
> 

The problem that makes this hard is that the compiler and the analyzer use different technology to produce their warnings. For example, the analyzer warns in many more division-by-zero cases than the compiler does; a literal zero divisor is not a special case for it. To avoid warning in this case, we'd need to teach the analyzer about the compiler warnings and how capable they are. An alternative approach would be to have the overarching tool, which aggregates the results from the different analysis tools, identify duplicate warnings, possibly with help from each of the analysis tools.
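
As a hypothetical illustration (not from the thread) of a division by zero that is not a literal: below, the divisor is zero only on the path where `use_divisor` is false. Because `divisor` is not a constant expression, clang's -Wdivision-by-zero typically does not fire, while the analyzer's path-sensitive `core.DivideZero` checker can report the division on that path.

```c
/* Hypothetical example: the divisor is zero only on one path. */
int scale(int x, int use_divisor) {
    int divisor = 0;
    if (use_divisor)
        divisor = 4;
    /* On the use_divisor == 0 path, divisor is still 0 here. */
    return x / divisor;
}
```

This is why the analyzer cannot simply skip "the cases the compiler already handles" without knowing exactly what the compiler's check covers.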

>> Most importantly, end users should never invoke the analyzer by calling
>> "clang --analyze" since "clang --analyze" is an implementation detail of the
>> static analyzer. The only documented user facing clang static analysis tool
>> is scan-build (see http://clang-analyzer.llvm.org). Here are some reasons
>> for that. For one, it is almost impossible to understand why the static
>> analyzer warns without examining the error paths. Second, the analyzer could
>> be extended to perform whole-project analysis in the future, whereas "clang
>> --analyze" works with a single TU at a time.
> 
> As a counter-example to requiring examining the code paths, the
> compiler has thread role analysis diagnostics (among others) that are
> also flow-sensitive and it's never been an issue that users must
> examine the error paths, so I'm not certain that's a particularly
> compelling reason to require a separate tool. Even templates and
> macros require a lot of "path" archaeology, and we've found some
> excellent ways to surface that from the compiler.
> 

The static analyzer is very special in this respect: other tools do not produce path-sensitive warnings. For example, the analyzer might report a bug that only happens when you take the true branch of one if-statement followed by the false branch of a second if-statement, and only if you go through the loop exactly once. The paths we report are quite long.
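
A deliberately contrived, hypothetical function (not from the thread) matching the kind of bug just described: the division by zero happens only when the first `if` is taken, the second is not, and the loop body runs exactly once. A path-sensitive tool can in principle find it, but a useful report has to show all three of those decisions, which is why the path matters.

```c
/* Hypothetical example: divides by zero only on one specific path. */
int adjusted_mean(const int *v, int n, int drop_one, int add_one) {
    if (n <= 0)
        return 0;
    int adjust = 0;
    if (drop_one)                  /* bug path: true branch */
        adjust -= 1;
    if (add_one)                   /* bug path: false branch */
        adjust += 1;
    int sum = 0, count = 0;
    for (int i = 0; i < n; i++) {  /* bug path: exactly one iteration */
        sum += v[i];
        count++;
    }
    /* count + adjust is zero only when drop_one && !add_one && n == 1 */
    return sum / (count + adjust);
}
```

Looking only at the final line, the warning seems unjustified; it makes sense only once the three branch decisions leading to it are visible.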

In most real (non-reduced) cases, I personally cannot tell why the analyzer reports a bug unless I can see the path. Therefore, I would be surprised if we had many (or any) users who find the tool's warnings useful without looking at the paths.

> Whole-program analysis *is* a reasonably compelling reason for a
> separate tool, however I don't think it should drive the design for
> the user interface. For instance, Visual Studio does not require
> execution of a separate tool to enable their static analysis (which I
> believe does whole-program analysis). So, for instance, how do we
> intend for clang-cl to support the /analyze option? Since we don't
> have whole-program analysis currently, it seems like such a feature
> could be designed to operate from a compiler flag (for instance, in
> conjunction with compilation databases) that is responsible for
> spawning off that secondary tool when required. (Note, I think a
> similar approach could be used to support running clang-tidy from the
> compiler via a command line flag.)
> 

Because of the nature of the reports, providing a great user experience for reporting bugs on the command line will be challenging. We do have a text output, but it is only used and supported for testing. One issue might have dozens of path notes associated with it. I think users will get a much better experience if the results are viewed through a UI. (User experience is very important. We do not want people to run the tool without understanding the results, conclude that they are all false positives, and decide not to use the tool because of that.)

>> I agree that the best user experience is to report all warnings in one
>> place, while still differentiating which warning was reported by which tool.
>> It would be awesome if the results from all bug finding tools such as the
>> clang static analyzer, the compiler, and clang-tidy would be reported
>> through the same interface.
> 
> I think we are in agreement, but to verify what I think we're agreeing
> on: users don't particularly care about the *tool* used nearly so much
> as they care about getting the diagnostics themselves. (For instance,
> users don't care if it's a parser error, a semantic error, a
> path-sensitive error, etc.) When it comes to diagnostics, the easier
> we can make it on the user to enable the functionality, the greater
> the chance of users actually using it. Based on that, having a single
> mechanism the user can invoke to give them diagnostics (such as the
> clang driver itself) is something we should strive towards, even if
> that means executing different libraries or executables under the hood
> (like we do with cc1). Obviously, *reporting* all the diagnostics in a
> single place falls naturally out of invocation of a single tool. Does
> that agree with what you were saying, or am I misinterpreting?
> 

I agree that unifying code analysis in a single tool is very important! However, I do not think that we can achieve a reasonably good user experience for the clang static analyzer through a command-line-only solution.

>> The CodeChecker team is working on a solution for that and I hope we can
>> incorporate their technology in LLVM/clang.
> 
> That's fantastic! Thank you for the explanations, as well as all the
> hard work on this tool.
> 
> ~Aaron
