[cfe-dev] [analyzer] [GSoC 2019] Apply the Clang Static Analyzer to LLVM-based projects drafts

Thu Apr 4 14:49:02 PDT 2019

Before I jump into this: +Ravi, because he has also been actively
expressing interest in this project :)

On 4/4/19 2:26 PM, Csaba Dabis wrote:
> Hey Clang developers!
>
> I would like to participate in Google Summer of Code this year. I am a
> fourth-semester BSc student of Computer Science at Eotvos Lorand
> University, Hungary. I started to learn C++ in parallel with Clang a
> year and a half ago. That was also my first time using Linux, Git,
> VIM…. I love automation, so I am fond of this engine and of the tools
> based on Clang, such as scan-build, CodeChecker, and CodeCompass.
>
> I have picked the following project: 
> http://llvm.org/OpenProjects.html#analyze-llvm
> Here is a copy of the problems and their proposed solutions from my
> nearly finished proposal:
>
> Goals
> Eliminate 90% of the false positive findings in LLVM by teaching C++
> to the Static Analyzer. Improve the existing debugging facilities so
> that errors are easier to investigate. Report and fix the
> easy-to-fix true positives in LLVM. Report the difficult-to-fix true
> positives in LLVM so that developers with more experience in the
> relevant area can solve them. Use Swift, another heavyweight project,
> as an example to see how the reports of an LLVM-related project
> change. Measure the quality of the changes on Swift, where no direct
> false positive elimination happens. With these improvements, let the
> contributors of LLVM and related projects use the Static Analyzer
> in a continuous integration workflow without any overhead.
>
> Overview of the debugging facilities
> The Clang Static Analyzer builds the exploded graph, which consists of
> program states as nodes. During symbolic execution each node
> represents everything that we know about the program at a certain
> location.
>
> ExplodedGraph: We can inspect the graph by rendering it with Graphviz
> to an .SVG file and opening it in Google Chrome. The graph can be so
> enormous that Chrome crashes or cannot load it at all. Even if you
> manage to load it, there is too much information and it is very
> difficult to use. Alternatively you could use the LLDB debugger, but
> because of the complex machinery behind the analyzer it is even more
> difficult to find out which function causes the false positive.
>
> Debug checkers: The debug.DumpCalls checker writes out every single
> function call, which is too much output and too difficult to use.
> ExprInspection checks[1] are useful for getting a feeling for what
> could go wrong by writing out a certain program state, but they
> cannot be used to compare states because of the graph structure.
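
For readers who have not used the ExprInspection checks mentioned above,
here is a minimal sketch of how they are typically used. clamp_to_byte is
a made-up function. The empty body of clang_analyzer_eval is my own
assumption so that the file also builds as an ordinary program; the
analyzer's own tests only declare it. The TRUE/UNKNOWN comments describe
what I would expect the checker to print, not captured output.

```cpp
// Recognized by name by the debug.ExprInspection checker.
// Assumption: the empty body only exists so the file links outside the analyzer.
void clang_analyzer_eval(bool) {}

// Hypothetical function under inspection: clamps x into [0, 255].
int clamp_to_byte(int x) {
  if (x < 0) {
    clang_analyzer_eval(x < 0);   // expected: TRUE on this path
    return 0;
  }
  if (x > 255) {
    clang_analyzer_eval(x > 255); // expected: TRUE on this path
    return 255;
  }
  clang_analyzer_eval(x >= 0);    // expected: TRUE, x is known to be in [0, 255]
  clang_analyzer_eval(x == 7);    // expected: UNKNOWN, both outcomes are feasible
  return x;
}
```

Something like "clang++ --analyze -Xclang
-analyzer-checker=debug.ExprInspection file.cpp" should print one
diagnostic per clang_analyzer_eval call. Each diagnostic describes a
single path, which is why these checks do not help with comparing two
states.
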
>
> Proposed solutions for the debugging facilities
> ExplodedGraph: Create an .HTML frontend for the .SVG graph
> representation. It could reduce the full graph to show only the
> differences between states, and it would recolour the current
> representation for better readability.
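
To make "only show differences between states" concrete, here is a rough
sketch of the idea, not of the analyzer's real data structures. State and
diffStates are hypothetical names; a real program state holds an
environment, a store, and constraints rather than a flat map of strings.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified program state: variable name -> printed symbolic value.
using State = std::map<std::string, std::string>;

// Returns one line per binding that was removed ("-"), changed ("~"),
// or added ("+") when going from `before` to `after`.
std::vector<std::string> diffStates(const State &before, const State &after) {
  std::vector<std::string> out;
  for (const auto &kv : before) {
    auto it = after.find(kv.first);
    if (it == after.end())
      out.push_back("- " + kv.first + " = " + kv.second);
    else if (it->second != kv.second)
      out.push_back("~ " + kv.first + ": " + kv.second + " -> " + it->second);
  }
  for (const auto &kv : after)
    if (before.find(kv.first) == before.end())
      out.push_back("+ " + kv.first + " = " + kv.second);
  return out;
}
```

A node in the frontend would then display only these lines instead of the
whole state, so that unchanged bindings no longer clutter the graph.
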
>
> Debug checkers: Create an option for the debug.DumpCalls checker to
> show only a certain variable and, if its value is unknown at the
> location of an error, to point out where it became unknown.
>
> Overview of the false positives
> My playground was the LLVM 8.0.0 release (20 March 2019), treated as
> bug-free. With the basic scan-build command, 828 bug reports were
> found. Because of our precise review system they are most likely
> false positive findings. About half of them are ‘Memory leak’ (229)
> and ‘Called C++ object pointer is null’ (217) errors:
> - ‘Memory leak’: Half of the reports (118/229) appear in Error.h on
> the same function call in different variations.
> - ‘Called C++ object pointer is null’: A third of the reports happen
> on placement new operations.
>
> Proposed solutions for the false positives
> One could say that adding more assertions would remove the errors and
> document the code better. Let us think about the opposite: removing
> every assertion such as ‘assert()’ and ‘LLVM_DEBUG()’[2] would expose
> the weaknesses of the Static Analyzer. We cannot force our users to
> double or triple the number of assertions (even though it would be
> very useful). With that approach and the new debugging facilities, the
> door will be open to mitigating the false positives.
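
A small sketch of what an assertion buys the analyzer, under my own
assumptions: Node, find, and valueOf are made-up names, not LLVM code.
Without the assert, the analyzer may follow the path on which find
returns nullptr and report the member call; with it, that path is cut
off. Compiling with -DNDEBUG removes the assert again, which is one way
to see the analyzer's behaviour on assertion-free code.

```cpp
#include <cassert>

struct Node {
  int value;
  Node *next;
  int get() const { return value; }
};

// Hypothetical lookup: returns nullptr when the value is not in the list.
Node *find(Node *head, int value) {
  for (Node *n = head; n; n = n->next)
    if (n->value == value)
      return n;
  return nullptr;
}

// The caller promises that `value` is present. The assertion documents that
// invariant and lets the analyzer discard the n == nullptr path.
int valueOf(Node *head, int value) {
  Node *n = find(head, value);
  assert(n && "value must be present in the list");
  return n->get(); // without the assert: a candidate for a null-object report
}
```
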
>
> It is impossible to estimate how long it takes to eliminate a false
> positive. If we think of the false positives as sets, the two most
> common ones are already known, and we could define more sets. We have
> to start the work from the largest set. The workflow is the following:
> pick the most common false positive, improve the debugging facilities
> if necessary, mitigate the error, document it in LLVM Bugzilla,
> inject assertions into the problematic code, repeat.
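
The "start from the largest set" step can be sketched with the numbers
quoted above (828 reports in total, 229 and 217 in the two known sets).
ReportSet, byFrequency, sharePercent, and knownSets are hypothetical
helper names; the remaining categories are left out because their counts
are not given here.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct ReportSet {
  std::string category;
  int count;
};

// Orders the sets so that the most common category comes first.
std::vector<ReportSet> byFrequency(std::vector<ReportSet> sets) {
  std::stable_sort(sets.begin(), sets.end(),
                   [](const ReportSet &a, const ReportSet &b) {
                     return a.count > b.count;
                   });
  return sets;
}

// Share of `count` in `total`, in percent, rounded to the nearest integer.
int sharePercent(int count, int total) {
  return (count * 100 + total / 2) / total;
}

// The two sets whose sizes are stated in the proposal.
std::vector<ReportSet> knownSets() {
  return {{"Called C++ object pointer is null", 217}, {"Memory leak", 229}};
}
```

With these numbers the two known sets cover roughly 54% of the 828
reports, and the Error.h reports alone make up about 52% of the memory
leak set.
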
>
> -------------
> [1] ExprInspection checks: 
> https://clang.llvm.org/docs/analyzer/developer-docs/DebugChecks.html#exprinspection-checks
> [2] LLVM_DEBUG(): 
> http://llvm.org/docs/ProgrammersManual.html#the-llvm-debug-macro-and-debug-option 
>
> -------------
>
> Any feedback would be really appreciated.
>
> Thank you,
> Csaba.
>
