[cfe-dev] Clang Static Analyzer Round Table

Gábor Horváth via cfe-dev cfe-dev at lists.llvm.org
Wed Nov 17 14:26:04 PST 2021


Hi!

Here are some notes from the CSA round table. The meeting was pretty long,
and we had a chance to get a glimpse into GCC's analyzer as well; thanks a
lot to David for sharing all that info! The notes might be a bit spotty, so
feel free to add more info in replies or correct anything that I got wrong.

People introduced themselves.
Question from the chat: Is the analyzer for LLVM IR? No, it works on source code.
We discussed the best ways to invoke the analyzer: the command line,
scan-build, and CodeChecker.
Should we adopt -fanalyzer from GCC? It would be good for compatibility, but
our output might differ, so we are undecided. We want to keep this option
open.
David, the author of GCC's analyzer, is interested in implementing SARIF so
that we have a common output format across GCC and Clang.
We discussed the best ways to get started with CSA development.
Short status report on Deep's GSoC project
Can we use the static analyzer to transform source code? Technically, it
can emit fix-its, but we would not recommend doing it from the analyzer. For
fix-its you need to understand all execution paths, and the analyzer does
not do that. Also, even on a single path it is often ambiguous which step
introduced the problem.
Max mentioned using CSA in education. One of the problems he runs into
concerns the dependencies between checks, and he would like a way to
disable some of the sinks. There was some discussion of whether it is
possible to declare dependencies among checks when using plugins.
Mohannad shared more context on why he is interested in source-to-source
transformation.
Tim asked how CSA compares to other analyzers like Cppcheck, and the
participants shared some experiences, but it is really hard to compare
analyzers objectively.
Anton asked if clang-tidy would be a good place to start working on a tool
that can rewrite #pragma directives into builtin calls. We believe clang-tidy
would be a better place for this than CSA.
We discussed why CSA is not the right tool for reasoning about
reachability. We can use ExprInspectionChecker to see which parts of the
code are explored by the analyzer.
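As a rough illustration of using ExprInspection to see what the analyzer explores, here is a small C sketch. `clang_analyzer_warnIfReached()` is one of the debug functions handled by the debug.ExprInspection checker; the function `classify` is hypothetical, and the no-op definition guarded by `__clang_analyzer__` is an assumption added only so the file also compiles and links as ordinary C. Something like `clang --analyze -Xclang -analyzer-checker=debug.ExprInspection example.c` should report the reachable call but not the one on the infeasible path. Keep in mind the caveat above: the absence of a report is not a proof of unreachability.

```c
/* Handled by the debug.ExprInspection checker during analysis. */
void clang_analyzer_warnIfReached(void);

#ifndef __clang_analyzer__
/* Assumption: a no-op body so the file also builds as normal C. */
void clang_analyzer_warnIfReached(void) {}
#endif

/* Hypothetical function used only to show which paths get explored. */
int classify(int x) {
  if (x > 10) {
    if (x < 5) {
      /* Infeasible path (x > 10 && x < 5): expected to stay silent. */
      clang_analyzer_warnIfReached();
      return -1;
    }
    /* Feasible path: expected to be reported as reached. */
    clang_analyzer_warnIfReached();
    return 1;
  }
  return 0;
}
```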
Max mentioned that students sometimes hit false positives when they
implement non-trivial data structures like doubly linked lists. The CSA
team suggests reporting all of these false positives.
Max was also interested in learning more about the "mark interesting"
feature. Artem explained how it is used to construct additional notes when
building bug reports.
David shared that GCC's analyzer works on the GIMPLE layer and can
piggy-back on LTO, but the representation is already somewhat lowered,
which can introduce problems by obscuring what the user wrote.
David color-codes the exploded graph nodes in the dumps to make it easier
to identify points of interest.
We also discussed how loops are modeled (heuristics to completely unroll
certain loops, and something akin to widening in abstract interpretation).
After that we briefly discussed supporting inline assembly and doing taint
analysis.
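To show what "unroll, then widen" means in abstract-interpretation terms, here is a minimal C sketch of textbook interval widening. It is not the actual implementation in CSA or GCC; the names `interval`, `widen` and `loop_head_invariant` are hypothetical. The loop `for (i = 0; i < n; i++)` with unknown `n` is iterated exactly for a few steps (like unrolling), after which any bound that is still growing jumps to infinity so the fixpoint is reached quickly.

```c
#include <limits.h>

/* Interval [lo, hi]; LONG_MIN / LONG_MAX stand for -inf / +inf. */
typedef struct { long lo, hi; } interval;

/* Join: the smallest interval containing both inputs. */
static interval join(interval a, interval b) {
  interval r = { a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
  return r;
}

/* Increment a bound; infinite bounds stay infinite instead of overflowing. */
static long inc(long v) { return (v == LONG_MIN || v == LONG_MAX) ? v : v + 1; }

/* Abstract transfer function for the loop body "i = i + 1". */
static interval add_one(interval a) {
  interval r = { inc(a.lo), inc(a.hi) };
  return r;
}

/* Widening: a bound that moved since the previous iterate goes to infinity. */
static interval widen(interval prev, interval next) {
  interval r = { next.lo < prev.lo ? LONG_MIN : prev.lo,
                 next.hi > prev.hi ? LONG_MAX : prev.hi };
  return r;
}

/* Loop-head invariant for i in "for (i = 0; i < n; i++)" with unknown n.
   The guard is ignored because n is unknown. The first unroll_steps
   iterations are joined exactly; after that, widening is applied. */
interval loop_head_invariant(int unroll_steps) {
  interval i = { 0, 0 };
  for (int step = 0;; step++) {
    interval next = join(i, add_one(i));
    if (next.lo == i.lo && next.hi == i.hi)
      return i; /* fixpoint reached */
    i = step < unroll_steps ? next : widen(i, next);
  }
}
```

Without the widening step, the exact iteration would need on the order of LONG_MAX rounds to stabilize; with it, `loop_head_invariant(3)` converges to [0, +inf] after a handful of steps.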
Are there benchmarks for the static analyzer? We have a Docker-based
solution in the LLVM repo to easily run the analyzer over projects:
`utils/analyzer/SATest.py docker`.
One can also use csa-testbench from GitHub.
Max brought up the conservative evaluation that invalidates a large part of
the execution state. He advocated emulating certain system calls more
precisely to avoid excessive invalidation. Artem mentioned there are
already facilities for doing that, but there is always some room for
improvement.
David described how GCC's analyzer works at a high level. The transfer
functions can generate both mergeable and unmergeable symbolic values to
control which states can be merged and which cannot.
Exceptions are a pain point for both analyzers at the moment. Both
analyzers use context-sensitive program points for inter-procedural
analysis.
We discussed the analyzer's inlined defensive check heuristics for
suppressing false positives.
David explained how he got into creating a static analyzer in GCC (after
being the CPython maintainer, he was fed up with certain reference-counting
bugs in badly written CPython extensions and wanted an automated way to
find those errors).
In GCC, the states are mutated in place, as opposed to Clang, where most
data structures used by CSA are immutable. Bifurcating state was not
trivial with in-place mutation.
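A minimal C sketch of why immutable states make bifurcation cheap: each state only adds a binding on top of a shared, never-modified parent, so forking at a branch creates two small nodes instead of copying everything. This is a deliberately simple persistent list, not CSA's real data structures (which are balanced immutable trees), and `state`, `state_bind` and `state_lookup` are hypothetical names; nodes are never freed, as if they lived in an arena.

```c
#include <stdlib.h>
#include <string.h>

/* One immutable binding layered on top of a shared parent state. */
typedef struct state {
  const char *var;
  long value;
  const struct state *parent;
} state;

/* Returns a new state; the old one is untouched and stays usable by
   other branches (sketch: nodes are never freed). */
const state *state_bind(const state *s, const char *var, long value) {
  state *n = malloc(sizeof *n);
  if (!n)
    abort();
  n->var = var;
  n->value = value;
  n->parent = s;
  return n;
}

/* Finds the most recent binding of var; sets *found to 0 if there is none. */
long state_lookup(const state *s, const char *var, int *found) {
  for (; s; s = s->parent) {
    if (strcmp(s->var, var) == 0) {
      *found = 1;
      return s->value;
    }
  }
  *found = 0;
  return 0;
}
```

Forking on a condition is then just two `state_bind` calls on the same parent; with in-place mutation, the whole state would have to be cloned before either branch could modify it.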
By using GIMPLE, David feels he side-stepped many problems; e.g., the way
constructors/destructors are represented is more uniform with regular
function calls. But many aggressive optimizations, like some early
inlining, happen early. Unfortunately, there is currently no easy way to
get a nice, completely unoptimized IR in GCC. GCC has a mark-and-sweep GC
that is used in various places, but the analyzer rarely uses the GC for
cleaning up memory; it mostly relies on RAII.
In CSA many of the objects are reference counted. The symbolic values in
CSA are passed by value.
In GCC, symbolic values are interned in a manager, so there is only one
instance of every symbolic expression; they are deduplicated.
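Interning (hash-consing) can be sketched in a few lines of C. The names `sym`, `sym_manager`, `get_const` and `get_add` are hypothetical and not GCC's API, and a linear search stands in for the hash table a real manager would use. Because child expressions are interned before their parents, structural equality reduces to pointer equality, which is also why comparing pointers instead of ids works.

```c
#include <stdlib.h>

typedef enum { SYM_CONST, SYM_ADD } sym_kind;

/* A symbolic expression: either a constant or the sum of two
   already-interned subexpressions. */
typedef struct sym {
  sym_kind kind;
  long value;                  /* used by SYM_CONST */
  const struct sym *lhs, *rhs; /* used by SYM_ADD */
} sym;

/* Owns every unique expression. */
typedef struct {
  sym **items;
  size_t count, cap;
} sym_manager;

/* Returns the existing instance equal to key, or stores a new one.
   Children are compared by pointer since they are interned too. */
static const sym *sym_intern(sym_manager *m, sym key) {
  for (size_t i = 0; i < m->count; i++) {
    const sym *s = m->items[i];
    if (s->kind == key.kind && s->value == key.value &&
        s->lhs == key.lhs && s->rhs == key.rhs)
      return s;
  }
  if (m->count == m->cap) {
    size_t cap = m->cap ? m->cap * 2 : 8;
    sym **items = realloc(m->items, cap * sizeof *items);
    if (!items)
      abort();
    m->items = items;
    m->cap = cap;
  }
  sym *s = malloc(sizeof *s);
  if (!s)
    abort();
  *s = key;
  m->items[m->count++] = s;
  return s;
}

const sym *get_const(sym_manager *m, long v) {
  sym key = { SYM_CONST, v, NULL, NULL };
  return sym_intern(m, key);
}

const sym *get_add(sym_manager *m, const sym *a, const sym *b) {
  sym key = { SYM_ADD, 0, a, b };
  return sym_intern(m, key);
}

void sym_manager_free(sym_manager *m) {
  for (size_t i = 0; i < m->count; i++)
    free(m->items[i]);
  free(m->items);
}
```

Building `1 + 2` twice yields the same pointer, so equality checks are a single pointer comparison.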
GCC's analyzer looks at the memory spaces of two regions, and if they are
different, it assumes the regions do not alias.
We discussed how CSA's model is bad at recovering from wrong assumptions
about aliasing.
GCC's analyzer has a way to trigger a break when reaching a certain program
point marked with analyzer_break().
Ideas to deal with aliasing:
* a prepass to look for equality checks
* or add new nodes with the equality assumption to the worklist (a new
entry to the function), and do not analyze the branch with the wrong
assumption
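To make the aliasing problem concrete, here is a small C example; `write_both` is a hypothetical function made up for illustration. Two pointer parameters have no known memory space, so an analyzer that assumes they never alias would conclude the function always returns 1. A caller that later compares the pointers with `==`, the kind of equality check a prepass could look for, is a hint that this assumption may be wrong.

```c
/* Writes through two pointers, then reads the first one back.
   If a and b point to distinct objects the result is 1;
   if they alias (same object) the second write wins and it is 2. */
int write_both(int *a, int *b) {
  *a = 1;
  *b = 2;
  return *a;
}
```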
GCC's static analyzer also has plugins, and there is an example plugin
about holding the global interpreter lock for CPython modules.
There are many attributes that can describe the behavior of certain
functions. It would be nice to coordinate on them so that both GCC and
Clang understand the same set of attributes.
We discussed Microsoft SAL, attributes, contracts to describe function
behavior.
Clang analyzer supports noreturn, nullability, and some other
check-specific attributes.
GCC has some access attributes to describe buffer accesses. Microsoft uses
annotations extensively internally to annotate buffer operations.
The Clang analyzer does not really ask for buffer sizes, since it does not
do really deep analysis of loops. It does not attempt buffer overflow
detection (by default).
An important attribute to support is the cleanup attribute to avoid false
memory leak warnings.
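For reference, the cleanup attribute is a GNU extension accepted by both GCC and Clang: the named function is called with a pointer to the variable when the variable goes out of scope. In the hypothetical `copy_len` below, the `malloc` has no explicit `free`; an analyzer that did not model the attribute could report a leak here even though the buffer is released on every return path.

```c
#include <stdlib.h>
#include <string.h>

/* Cleanup handler: receives a pointer to the annotated variable. */
static void free_charp(char **p) { free(*p); }

/* Hypothetical example: copies src into a heap buffer and returns its length.
   buf is freed automatically when it goes out of scope, on every path
   (free(NULL) is a no-op, so the early return is fine too). */
size_t copy_len(const char *src) {
  char *buf __attribute__((cleanup(free_charp))) = malloc(strlen(src) + 1);
  if (!buf)
    return 0;
  strcpy(buf, src);
  return strlen(buf);
}
```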
We described the facilities we have for explaining warnings to users and
how we construct bug reports. David suggests looking into the paper
"Explaining Static Analysis with Rule Graphs".
Both GCC and Clang deduplicate similar warnings and try to present the one
with the shortest feasible path. GCC also tracks the object of interest
during path diagnostic creation while walking backwards.
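The deduplication idea can be sketched in C as follows. This is only an illustration, not how either analyzer actually groups reports: the `report` struct and `dedup_reports` are hypothetical, and the (checker, line) key is a simplification of a real equivalence key. For each key, only the report with the shortest path survives.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bug report: a dedup key (checker, line) plus path length. */
typedef struct {
  const char *checker;
  int line;
  size_t path_len;
} report;

/* Keeps one report per (checker, line) key, preferring the shortest path.
   Compacts the array in place and returns the number of survivors.
   Quadratic for brevity; a real implementation would hash the keys. */
size_t dedup_reports(report *r, size_t n) {
  size_t out = 0;
  for (size_t i = 0; i < n; i++) {
    size_t j;
    for (j = 0; j < out; j++)
      if (r[j].line == r[i].line && strcmp(r[j].checker, r[i].checker) == 0)
        break;
    if (j == out)
      r[out++] = r[i];           /* first report with this key */
    else if (r[i].path_len < r[j].path_len)
      r[j] = r[i];               /* shorter path replaces the kept one */
  }
  return out;
}
```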
It would be great to have a dynamic UI for warnings where the user can
explore details on demand rather than being overwhelmed with information
up front. Artem thinks a static analyzer should have an interface similar
to a time-travel debugger. David mentioned it would be fun to have a GDB
target so we could use GDB to step through a trace.
David worked alone on GCC's analyzer and has only been working on it for
2.5 years, but he had a GSoC student this summer to help him out.
Artem mentioned that we do a lot of tuning to eliminate false positives,
and that tuning depends on which projects the developers used to test the
analyzer.
In GCC, originally every entity like regions or symbolic values had ids,
which were used to make comparisons easier. Later the ids were replaced
with pointers, the code got simpler, and a category of bugs disappeared.
The meaning of regions also changed over time. In GCC 10 things were stored
in the region. In GCC 11 the region is just a description of how to
access memory. Two regions can be different descriptions of the same thing
and there is a separate mapping from regions to values. That made regions
immutable.
We covered what kinds of symbolic values exist in GCC and Clang.
GCC has different regions for labels and function pointers. It is fairly
similar to Clang's model; GCC's is maybe a bit more fine-grained.
David also maintained the diagnostic subsystem in GCC and implemented
the caret-style diagnostics.


On Mon, 15 Nov 2021 at 13:46, Artem Dergachev <noqnoqneo at gmail.com> wrote:

> It's 9:30 - 10:30 Wednesday (PST / UTC-8), already up on
> https://llvm.swoogo.com/2021devmtg/agenda
>
> On 11/15/21 3:40 AM, Deep Majumder wrote:
>
> Are we having this? If so, when?
>
> On Thu, Oct 28, 2021 at 9:24 PM Gábor Márton <martongabesz at gmail.com>
> wrote:
>
>> + Randell
>>
>> Hi Randell,
>>
>> Thank you for your email, I am forwarding it to the list.
>>
>> >  Is the list set up to block new subscribers from posting until
>> moderators review?
>> I really don't know, maybe someone in the list will know.
>>
>>
>> On Thu, Oct 28, 2021 at 3:41 PM Randell Jesup <rjesup at mozilla.com> wrote:
>>
>>> On 10/26/2021 4:09 AM, Gábor Márton via cfe-dev wrote:
>>>
>>> Hi CSA developers,
>>>
>>> I've submitted a round table request for the upcoming Dev Meeting (Nov
>>> 17-19). Would be great to have a discussion.
>>> Please also invite colleagues who you think might be interested.
>>>
>>>
>>> Hi Gabor.  I recently joined the cfe-dev mailing list, but have been
>>> unable to post to it.  Is the list set up to block new subscribers from
>>> posting until moderators review?   Thanks
>>>
>>>
>>> This is what I was trying to send:
>>>
>>> Subject: Thread-safety analysis rough edges
>>>
>>> I've been working with -Wthread-safety, and have run into a few rough
>>> edges.
>>>
>>> One is RAII unlockers.   As stated in the known limitations, it doesn't
>>> handle an RAII unlocker and gets very confused, leading to follow-on false
>>> positives.   Is the only reasonable solution to simply not annotate the
>>> RAII unlocker class, and live with the analysis being wrong during the
>>> unlocked section?
>>>
>>> Is there any ongoing work to resolve this issue?
>>>
>>> I notice the work done by WebKit to use this functionality to do static
>>> thread assertions (
>>> https://webkit-search.igalia.com/webkit/source/Source/WTF/wtf/ThreadAssertions.h).
>>> There seems to be some value here (witness the shift from lock-centric
>>> names), but examples on how to use it would be good, similar to the mutex.h
>>> in the docs.
>>>
>>> Related: There are a number of usage patterns for Mutexes that don't
>>> lend themselves easily to thread-safety annotations.  An example would be an
>>> item written to from only a single thread, but read from other threads.
>>> The lock must be held to write it, and all off-writer-thread accesses must
>>> lock to access it.  However, on-writer-thread accesses *don't* need to
>>> lock, and will generate false positives.  (There are other Mutex patterns,
>>> like free access until the item is made available to other threads, or
>>> after all other threads are known to have exited, and more, which aren't
>>> easily covered.)
>>>
>>> What's the best way to handle this, other than not adding GUARDED_BY()
>>> or using NO_THREAD_SAFETY_ANALYSIS?  Could we mark an item as requiring one
>>> of a set of capabilities?  (i.e. on the correct thread OR holds the mutex?)
>>>
>>> Thanks,
>>> Randell Jesup, Mozilla
>>>
>>
>> On Thu, Oct 28, 2021 at 12:23 AM Artem Dergachev <noqnoqneo at gmail.com>
>> wrote:
>>
>>> +Andrew and Bruno who attended our tiny cozy static analyzer round table
>>> at the bay area meetup!
>>>
>>> Andrew, you have some notes from that round table, do you think it makes
>>> sense to share them in this mailing list thread?
>>>
>>> On 10/26/21 12:26 PM, Artem Dergachev wrote:
>>> > +Deep because he expressed interest.
>>> >
>>> > Yay! Yes, absolutely, let's have that.
>>> >
>>> > On 10/26/21 1:09 AM, Gábor Márton wrote:
>>> >> Hi CSA developers,
>>> >>
>>> >> I've submitted a round table request for the upcoming Dev Meeting
>>> >> (Nov 17-19). Would be great to have a discussion.
>>> >> Please also invite colleagues who you think might be interested.
>>> >>
>>> >> Thanks,
>>> >> Gabor
>>> >
>>>
>>>
>