[cfe-dev] [analyzer] [GSoC 2019] Apply the Clang Static Analyzer to LLVM-based projects drafts
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Thu Apr 4 14:49:02 PDT 2019
Before I jump into this: +Ravi, because he was also actively expressing
interest in this project :)
On 4/4/19 2:26 PM, Csaba Dabis wrote:
> Hey Clang developers!
> I would like to participate in Google Summer of Code this year. I am
> a fourth-semester BSc student of Computer Science at Eotvos Lorand
> University, Hungary. I started learning C++ in parallel with Clang a
> year and a half ago. That was also my first time using Linux, Git,
> Vim…. I love automation, so I love this engine and the tools based on
> Clang, like scan-build, CodeChecker, and CodeCompass.
> I have picked the following project:
> Here is a copy of the problems and their solutions from my
> nearly finished proposal:
> Eliminate 90% of the false positive findings in LLVM by teaching the
> Static Analyzer more about C++. Improve the existing debugging
> facilities so that errors are easier to investigate. Report and fix
> the easy-to-fix true positives in LLVM. Report the difficult-to-fix
> true positives in LLVM so that developers with more experience in the
> affected area can solve them. Swift, another large LLVM-related
> project, serves as an example for seeing how the reports of such a
> project change. Measure the quality of the changes in Swift, where no
> direct false positive elimination happens. With these improvements,
> let LLVM and related-project contributors use the Static Analyzer
> without any overhead in a continuous integration workflow.
> Overview of the debugging facilities
> The Clang Static Analyzer builds the exploded graph, which consists of
> program states as nodes. During the symbolic execution each node
> represents everything we know about the program at a certain point.
> ExplodedGraph: We can investigate the graph with graphviz as an .SVG
> file, viewed in Google Chrome. The graph can be so enormous that
> Chrome crashes or cannot load it at all. If you are able to load it,
> there is too much information and it is very difficult to use.
> Alternatively you could use the LLDB debugger, but because of the
> complex internals it is even more difficult to find out which
> function causes the false positive.
> Debug checkers: The debug.DumpCalls checker writes out every single
> function call, which is too much output and too difficult to use.
> Expression inspection checks are useful for getting a feeling for what
> could go wrong by writing out a certain program state, but they cannot
> be used to compare states due to the graph structure.
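
For readers unfamiliar with the expression inspection checks mentioned above, here is a minimal sketch of how they are typically used. `clang_analyzer_eval` is the function recognised by the `debug.ExprInspection` checker; the function `clampToTen` and the empty stub body are assumptions of this example (the analyzer's own regression tests only declare the function, the body is added here so the file also links as an ordinary program).

```cpp
#include <cassert>

// Recognised by name by the debug.ExprInspection checker. The empty body is
// an assumption of this sketch so that the file links outside the analyzer.
void clang_analyzer_eval(bool) {}

// Hypothetical function under inspection.
int clampToTen(int x) {
  if (x > 10)
    return 10;
  // On this path the analyzer has already assumed !(x > 10).
  clang_analyzer_eval(x <= 10); // the analyzer should print TRUE here
  clang_analyzer_eval(x > 0);   // nothing is known about this: UNKNOWN
  return x;
}
```

Something like `clang++ --analyze -Xclang -analyzer-checker=debug.ExprInspection example.cpp` should print the results as warnings at the two call sites. Each call only reports on the state at that one point, which is why it is hard to compare states with it.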
> Proposed solutions for the debugging facilities
> ExplodedGraph: Create an HTML frontend for the .SVG graph
> representation. It could reduce the full graph to show only the
> differences between states, and it would recolour the current
> representation for better readability.
> Debug checkers: Create an option for the debug.DumpCalls checker to
> show only a certain variable and, if its value is unknown at the
> location of an error, point out where it became unknown.
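
The idea of showing only the differences between states can be sketched as follows. This is not analyzer code: `State` is a hypothetical, heavily simplified stand-in for a program state (variable name mapped to a printed value), and `diffStates` is a name invented for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified program state: variable name -> printed value.
using State = std::map<std::string, std::string>;

// Returns one line per binding that was removed ("-"), changed ("~"),
// or added ("+") when going from `before` to `after`.
std::vector<std::string> diffStates(const State &before, const State &after) {
  std::vector<std::string> out;
  for (const auto &kv : before) {
    auto it = after.find(kv.first);
    if (it == after.end())
      out.push_back("- " + kv.first + " = " + kv.second);
    else if (it->second != kv.second)
      out.push_back("~ " + kv.first + ": " + kv.second + " -> " + it->second);
  }
  for (const auto &kv : after)
    if (before.find(kv.first) == before.end())
      out.push_back("+ " + kv.first + " = " + kv.second);
  return out;
}
```

A frontend built along these lines would print a handful of changed bindings per node instead of the complete state, and a binding that changes to an unknown value would stand out as a single "~" line.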
> Overview of the false positives
> My playground was the LLVM 8.0.0 bug-free release (20 March 2019).
> The basic scan-build command found 828 bug reports. Because of our
> precise review system they are most likely false positive findings,
> about half of which are ‘Memory leak’ (229) and ‘Called C++ object
> pointer is null’ (217) errors:
> - ‘Memory leak’: Half of the reports (118/229) appear in Error.h on
> the same function call in different variations.
> - ‘Called C++ object pointer is null’: A third of the reports happen
> on placement new operations.
> Proposed solutions for the false positives
> One could say that adding more assertions would remove the errors and
> document the code better. Let us think about the opposite: removing
> every assertion like ‘assert()’ and ‘LLVM_DEBUG()’ could show the
> weaknesses of the Static Analyzer. We cannot force our users to double
> or triple the number of assertions (even if it would be very useful).
> With that, and the new debugging facilities, the door will be open to
> mitigating the false positives.
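
To illustrate how an assertion removes a report, here is a small sketch. `Entry`, `find`, and `valueOf` are hypothetical names, not LLVM code. Because `find` has a path that returns `nullptr`, the analyzer would typically warn about the dereference in `valueOf` if the `assert` were missing; with it, the null path ends at the failed assertion and the invariant is documented as well.

```cpp
#include <cassert>

struct Entry {
  int key;
  int value;
};

// Returns the matching entry, or nullptr when the key is absent.
const Entry *find(const Entry *table, int size, int key) {
  for (int i = 0; i < size; ++i)
    if (table[i].key == key)
      return &table[i];
  return nullptr;
}

// Callers guarantee that the key exists; the assertion states that
// guarantee, so the analyzer stops following the null path here.
int valueOf(const Entry *table, int size, int key) {
  const Entry *e = find(table, size, key);
  assert(e && "key must be present in the table");
  return e->value;
}
```

Note that this only helps when assertions are enabled in the analyzed build; with `NDEBUG` defined the `assert` expands to nothing and the report would be expected to come back.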
> It is impossible to estimate how long it takes to eliminate a false
> positive. If we group the false positives into sets, of which the two
> most common are already known, we could define more sets. We have to
> start the work from the largest set. The workflow is the following:
> pick the most common false positive, improve the debugging facilities
> if necessary, mitigate the error, document it in LLVM Bugzilla,
> inject assertions into the problematic code, repeat.
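
The "start from the largest set" step of the workflow above can be sketched like this. It assumes the report types have already been extracted as plain strings; `rankByFrequency` is a name made up for this example and is not part of scan-build.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Ranked = std::vector<std::pair<std::string, int>>;

// Counts how often each report type occurs and returns the types sorted
// from most to least frequent (ties keep alphabetical order).
Ranked rankByFrequency(const std::vector<std::string> &reportTypes) {
  std::map<std::string, int> counts;
  for (const auto &type : reportTypes)
    ++counts[type];

  Ranked ranked(counts.begin(), counts.end());
  std::stable_sort(ranked.begin(), ranked.end(),
                   [](const std::pair<std::string, int> &a,
                      const std::pair<std::string, int> &b) {
                     return a.second > b.second;
                   });
  return ranked;
}
```

The first element of the result is the set to work on next; after a fix, the analysis is rerun and the ranking recomputed.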
>  ExprInspection checks:
>  LLVM_DEBUG():
> Any feedback would be really appreciated.
> Thank you,