[cfe-dev] [analyzer] [GSoC 2019] Apply the Clang Static Analyzer to LLVM-based projects drafts

Thu Apr 4 14:26:30 PDT 2019

Hey Clang developers!

I would like to participate in Google Summer of Code this year. I am a
fourth-semester BSc student of Computer Science at Eotvos Lorand
University, Hungary. I started learning C++ in parallel with Clang a year
and a half ago. That was also the first time I used Linux, Git, VIM…. I
love automation, so I enjoy this engine and the tools based on Clang, such
as scan-build, CodeChecker and CodeCompass.

I have picked the following project:
http://llvm.org/OpenProjects.html#analyze-llvm
Here is a copy of the problems and their proposed solutions from my nearly
finished proposal:

Goals
Eliminate 90% of the false positive findings in LLVM by teaching C++ to the
Static Analyzer. Improve the existing debugging facilities so that it is
easier to investigate errors. Report and fix the easy-to-fix true positives
in LLVM. Report the difficult-to-fix true positives in LLVM so that other
developers with more experience in the given area can fix them. Use Swift,
another large LLVM-related project, as an example to see how the reports of
a related project change. Measure the quality of the changes on Swift,
where no direct false positive elimination happens. With these
improvements, let the contributors of LLVM and related projects use the
Static Analyzer sub-project in a continuous integration workflow without
any overhead.

Overview of the debugging facilities
The Clang Static Analyzer builds an exploded graph whose nodes are program
states. During symbolic execution each node represents everything we know
about the program at a certain location.

ExplodedGraph: We could investigate the graph with graphviz as an .SVG file
opened in Google Chrome. The graph can be so enormous that Chrome crashes
or cannot load it at all. Even if you are able to load it, there is too
much information and it is very difficult to use. Alternatively, you could
use the LLDB debugger, but because of the complex machinery in the
background it is even more difficult to find out which function causes the
false positive.

Debug checkers: The debug.DumpCalls checker writes out literally every
function call, which is too much output and too difficult to use. The
expression inspection checks[1] are useful for getting a feeling for what
could go wrong by writing out a certain program state, but they cannot be
used to compare states due to the graph structure.
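
To illustrate what the expression inspection checks look like in practice,
here is a minimal sketch. The function clampToByte is hypothetical; the
clang_analyzer_* names are the ones described in [1]. The empty definitions
are my own addition so that the file also builds as a normal program, and
they rely on the assumption that the analyzer predefines __clang_analyzer__.

```cpp
// Recognized by debug.ExprInspection during analysis; see [1].
void clang_analyzer_eval(bool);
void clang_analyzer_printState();

#ifndef __clang_analyzer__
// Stubs for a normal (non-analyzer) build only.
void clang_analyzer_eval(bool) {}
void clang_analyzer_printState() {}
#endif

// Hypothetical function under investigation.
int clampToByte(int Value) {
  if (Value < 0)
    return 0;
  // On this path the analyzer should already know that Value >= 0.
  clang_analyzer_eval(Value >= 0);  // analyzer should report TRUE
  clang_analyzer_eval(Value > 255); // analyzer should report UNKNOWN
  // Writes out the program state at this single location.
  clang_analyzer_printState();
  return Value > 255 ? 255 : Value;
}
```

Each call answers a question about one location only, which is why these
checks help with spot checks but not with comparing two states.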

Proposed solutions for the debugging facilities
ExplodedGraph: Create an .HTML frontend for the .SVG graph representation.
It could reduce the full graph to show only the differences between states,
and it would recolour the current representation for better readability.
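
As a rough sketch of the "show only differences" idea, assume a heavily
simplified state that maps a variable name to a textual value. The State
alias, the Change struct and diffStates are hypothetical names of mine; the
analyzer's real program state is much richer than this.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified state: variable name -> textual value.
using State = std::map<std::string, std::string>;

struct Change {
  std::string Name;
  std::string Before; // empty if the binding is new
  std::string After;  // empty if the binding was removed
};

// Returns only the bindings that differ between a node and its successor.
std::vector<Change> diffStates(const State &Pred, const State &Succ) {
  std::vector<Change> Changes;
  for (const auto &Entry : Pred) {
    auto It = Succ.find(Entry.first);
    if (It == Succ.end())
      Changes.push_back({Entry.first, Entry.second, ""});
    else if (It->second != Entry.second)
      Changes.push_back({Entry.first, Entry.second, It->second});
  }
  for (const auto &Entry : Succ)
    if (Pred.find(Entry.first) == Pred.end())
      Changes.push_back({Entry.first, "", Entry.second});
  return Changes;
}
```

A frontend built on such a diff would print a handful of changed bindings
per node instead of the whole state.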

Debug checkers: Create an option for the debug.DumpCalls checker to show
only a certain variable and, if its value is unknown at the location of an
error, point out where it became unknown.

Overview of the false positives
My playground was the LLVM 8.0.0 bug-free release (20 March 2019). With the
basic scan-build command, 828 bug reports were found. Because of our
precise review system they are most likely false positive findings. About
half of them are ‘Memory leak’ (229) and ‘Called C++ object pointer is
null’ (217) errors:
- ‘Memory leak’: About half of the reports (118/229) appear in Error.h on
the same function call in different variations.
- ‘Called C++ object pointer is null’: A third of the reports occur on
placement new operations.

Proposed solutions for the false positives
One could say that adding more assertions would remove the errors and
document the code better. Let us think about the opposite: removing every
assertion like ‘assert()’ and ‘LLVM_DEBUG()’[2] could show the weaknesses
of the Static Analyzer. We cannot force our users to double or triple the
number of assertions (even though it would be very useful). With that and
the new debugging facilities, the door will be open to mitigating the
false positives.
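
To make the role of assertions concrete, here is a small sketch with
hypothetical names (Node, findNode, valueOfExisting). Without the assert,
the analyzer may follow the path on which findNode returns null and report
a null dereference. With the assert, that path is cut off as long as
assertions are enabled in the analyzed configuration.

```cpp
#include <cassert>

struct Node {
  int Value;
};

// Hypothetical lookup that returns null when nothing matches.
Node *findNode(Node *Nodes, int Count, int Wanted) {
  for (int I = 0; I < Count; ++I)
    if (Nodes[I].Value == Wanted)
      return &Nodes[I];
  return nullptr;
}

int valueOfExisting(Node *Nodes, int Count, int Wanted) {
  Node *N = findNode(Nodes, Count, Wanted);
  // Documents the invariant and tells the analyzer that N is non-null here.
  assert(N && "caller guarantees that Wanted exists");
  return N->Value;
}
```

The assertion makes the report go away, but only because the author wrote
the invariant down by hand, and that is the effort we cannot demand from
every user.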

It is impossible to estimate in advance how long it takes to eliminate a
false positive. If we group the false positives into sets, of which the two
most common are already known, we could define more sets. We have to start
the work from the largest set. The workflow is the following: pick the most
common false positive, improve the debugging facilities if necessary,
mitigate the error, document it in LLVM Bugzilla, add assertions to the
problematic code, repeat.
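
The "start from the largest set" step could be sketched as follows. This
assumes the bug type of every report is already available as a string; how
those strings are extracted from the scan-build output is left open, and
rankBugTypes is a hypothetical helper name.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

using TypeCount = std::pair<std::string, unsigned>;

// Counts the reports per bug type and orders the types by frequency,
// most common first.
std::vector<TypeCount>
rankBugTypes(const std::vector<std::string> &ReportTypes) {
  std::map<std::string, unsigned> Histogram;
  for (const std::string &Type : ReportTypes)
    ++Histogram[Type];

  std::vector<TypeCount> Ranked(Histogram.begin(), Histogram.end());
  std::stable_sort(Ranked.begin(), Ranked.end(),
                   [](const TypeCount &A, const TypeCount &B) {
                     return A.second > B.second;
                   });
  return Ranked;
}
```

The first element of the result is the set to work on next, and the
ranking is recomputed after each mitigation.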

-------------
[1] ExprInspection checks:
https://clang.llvm.org/docs/analyzer/developer-docs/DebugChecks.html#exprinspection-checks
[2] LLVM_DEBUG():
http://llvm.org/docs/ProgrammersManual.html#the-llvm-debug-macro-and-debug-option
-------------

Any feedback would be really appreciated.

Thank you,
Csaba.