[cfe-dev] [analyzer] [GSoC 2019] Apply the Clang Static Analyzer to LLVM-based projects drafts

Fri Apr 5 12:37:17 PDT 2019

Mmm, interesting. I definitely wouldn't mind allocating a week for that :)

You might still want to do this in dot files rather than in svg files, 
they are a bit more structured.
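
To make the structure point concrete, here is a minimal sketch of the two ideas quoted below: hiding entries that repeat the parent node, and making symbol names stand out. It assumes the dump has already been parsed into a plain mapping from node id to text entries, and that every node has at most one parent (real exploded graph nodes may have several predecessors). The names `tag_duplicates`, `mark_symbols` and the `[[...]]` markers are hypothetical; none of this is an existing analyzer or graphviz API.

```python
import re

# Symbol prefixes named in the quoted mail; the list is an assumption
# and would probably grow "on the fly".
SYMBOL = re.compile(r"\b(?:reg|derived|extent|meta)_\$\d+")


def tag_duplicates(entries, parent):
    """For each node, return the set of entries also present in its parent.

    entries: dict mapping node id -> list of text entries (hypothetical shape)
    parent:  dict mapping node id -> parent node id (roots are absent)
    """
    tagged = {}
    for node, texts in entries.items():
        # A root has no parent, so nothing is inherited.
        inherited = set(entries.get(parent.get(node), ()))
        tagged[node] = {t for t in texts if t in inherited}
    return tagged


def mark_symbols(text, open_mark="[[", close_mark="]]"):
    """Wrap every symbol name in markers that a frontend could colour."""
    return SYMBOL.sub(lambda m: open_mark + m.group(0) + close_mark, text)


# Tiny hand-written example, not taken from a real dump.
entries = {
    "node1": ["Block Entrance: B3", "reg_$0 : { [1, 10] }"],
    "node2": ["PreStmtPurgeDeadSymbols", "reg_$0 : { [1, 10] }"],
}
parent = {"node2": "node1"}
tagged = tag_duplicates(entries, parent)
marked = mark_symbols("reg_$0 : { [1, 10] }")
```

In this example only the range constraint in `node2` is tagged as repeated, so a viewer could hide it and leave the program point visible.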

On 4/5/19 6:29 AM, Csaba Dabis wrote:
> Hey!
>
> The ExplodedGraph modification is not that difficult. It would be 
> really cool to get rid of the entire .SVG file and convert it to pure 
> .HTML, and yes, it would take months. My solution is simpler and 
> could be done within a week.
>
> Here is the task copied:
>
>     It could modify the full graph to only show differences between
>     states and it would recolour the current representation for better
>     readability.
>
>
> Here is the skeleton of the internal representation of the .SVG:
> <svg>
>   <g id="node1"></g>
>   <g id="node2"></g>
>   <g id="edge1"></g>
>   <g id="node3"></g>
>   <g id="edge2"></g>
>   ...
>   <g id="node667"></g>
>   <g id="edge666"></g>
> </svg>
>
> If `g` is a node, it looks like this:
> <g>
>   <text>Block Entrance: B3</text>
>   <text>PreStmtPurgeDeadSymbols</text>
>   ...
>   <text>reg_$0<const class clang::ento::MemRegion * this> : { 
> [1, 18446744073709551615] }</text>
> </g>
>
> -------------
>
> First idea explained:
> The idea is to traverse this tree backwards and check whether a 
> `text` in the current node equals one in its parent node; if so, tag 
> the `text` of the child node as a duplicate. Put a floating checkbox in 
> the top-left corner; clicking it hides the tagged `texts`, and a 
> second click shows every entry again.
>
> Second idea explained:
> The most common interesting string is `reg_$`, and possibly `derived_$`, 
> `extent_$` and `meta_$`, where the `_$` suffix is not eye-catching 
> enough. I tried to indent them by computing the complexity of the 
> symbolic value and applying that much space, but it did not work, so 
> making them colourful could work better. Doing both would be best. 
> Other important strings would be determined on the fly.
>
> Thank you for your comments,
> Csaba.
>
> On Fri, Apr 5, 2019 at 4:15 AM Artem Dergachev <noqnoqneo at gmail.com> 
> wrote:
>
>     +actual Ravi as he prefers this address. Ravi is this dude -
>     https://github.com/ravikandhadai - he's from the Swift universe
>     and he has a solid academic background in static analysis (unlike
>     me ^_^") and every time i tell him we have a checker for this bug,
>     he gets more and more excited :p
>
>     On 4/4/19 2:26 PM, Csaba Dabis wrote:
>>     Hey Clang developers!
>>
>>     I would like to participate in Google Summer of Code this year. I
>>     am a fourth-semester BSc student of Computer Science at
>>     Eotvos Lorand University, Hungary. I started to learn C++
>>     in parallel with Clang a year and a half ago. That was also the
>>     first time I used Linux, Git, VIM…. I love automation, so I love
>>     this engine and the tools based on Clang like scan-build,
>>     CodeChecker, CodeCompass.
>>
>>     I have picked the following project:
>>     http://llvm.org/OpenProjects.html#analyze-llvm
>>     Here is a copy of the problems and their solutions from my
>>     near-finished proposal:
>>
>>     Goals
>>     Eliminate 90% of the false positive findings in LLVM by teaching
>>     C++ to the Static Analyzer. Improve the existing debugging
>>     facilities so it would be easier to investigate errors. Report
>>     and fix the easy-to-fix true positives in LLVM. Report the
>>     difficult-to-fix true positives in LLVM so other developers with
>>     better experience in that certain area could solve those. Swift,
>>     another heavy project, serves as an example to see how the reports
>>     of an LLVM-related project change. Measure the quality of the
>>     changes in Swift, where no direct false positive elimination
>>     happens. With these improvements, let the contributors of LLVM and
>>     related projects use the Static Analyzer sub-project without any
>>     overhead in a continuous integration workflow.
>
>     *approves the goals*
>
>>     Overview of the debugging facilities
>>     The Clang Static Analyzer builds the exploded graph which
>>     consists of program states as nodes. During the symbolic
>>     execution each node represents everything we know about the
>>     program at a certain location.
>>
>>     ExplodedGraph: We could investigate the graph with graphviz as an
>>     .SVG file using Google Chrome. The graph can be so enormous
>>     that Chrome crashes or cannot even load it. If you are able to
>>     load it, there is too much information and it is very difficult
>>     to use.
>
>     I mean, our exploded graph dumps are horrible, but i've been a
>     happy user of them for like 5 years. Pretty much every single
>     bugfix that i made so far involved an investigation via exploded
>     graph dumps. And on top of that, i'm not seeing much information
>     that can be removed from them. Yes, viewers are choking
>     immediately, browsers through svg conversion are doing better,
>     especially chrome that seems to have the most tolerant svg
>     library. But whenever there isn't a way to view an exploded graph,
>     it becomes 5x harder to debug anything.
>
>     One of the more personal reasons why i've been rooting for this
>     project is that i wanted to popularize this systematic debugging
>     workflow of narrowing down the bug to the Static Analyzer function
>     in which it's happening that consists of binary-searching the
>     exploded graph dump for invalid values and bindings. I guess i
>     should actually document it some day, like, you know, *for once*,
>     'cause
>     http://clang-analyzer.llvm.org/checker_dev_manual.html#visualizing
>     is clearly insufficient.
>
>>     Alternatively you could use the LLDB debugger, but because of
>>     such a complex background it is more difficult to find out
>>     which function causes the false positive.
>>
>>     Debug checkers: debug.DumpCalls checker truly writes out every
>>     function call, which is too much and too difficult to use.
>>     Expression inspection checks[1] are useful to get a feeling for
>>     what could go wrong by writing out a certain program state, but
>>     they cannot be used to compare states due to the graph structure.
>>
>>     Proposed solutions for the debugging facilities
>>     ExplodedGraph: Create an .HTML frontend for the .SVG graph
>>     representation. It could modify the full graph to only show
>>     differences between states and it would recolour the current
>>     representation for better readability.
>
>     It'd be awesome to pull this off, but i suspect this undertaking
>     alone might take a few months of your time. I think you should
>     keep your eyes open for potential smaller improvements but
>     generally try to familiarize yourself with existing tools before
>     building up long-term plans in this direction.
>
>>     Debug checkers: Create an option for debug.DumpCalls checker to
>>     show only a certain variable and if its value is unknown at the
>>     location of an error, point out when it became unknown.
>>
>>     Overview of the false positives
>>     My playground was the LLVM 8.0.0 bug-free release (20 March
>>     2019). With the basic scan-build command, 828 bug reports were
>>     found. Because of our precise review system they are most likely
>>     false positive findings, about half of which are ‘Memory leak’
>>     (229) and ‘Called C++ object pointer is null’ (217) errors:
>>     - ‘Memory leak’: Half of the reports (118/229) appear in Error.h
>>     on the same function call in different variations.
>>     - ‘Called C++ object pointer is null’: A third of the reports
>>     happen on placement new operations.
>
>     Yes! That's what i wanted to hear. Put this on the top :) Great
>     job identifying those top issues!
>
>     I've been noticing placement-new bugs before, but i didn't ever
>     notice leaks in Error.h being a popular FP, nice catch!
>
>>     Proposed solutions for the false positives
>>     One could say creating more assertions could remove the errors
>>     and document the code better. Let us think about the opposite:
>>     removing every assertion like ‘assert()’ and ‘LLVM_DEBUG()’[2]
>>     could show the weakness of the Static Analyzer. We cannot force
>>     our users to double or triple the number of assertions (even if
>>     it would be very useful). With that, and the new debug-facilities
>>     the door will be open to mitigate the false positives.
>
>     That depends. When suppressing Static Analyzer false positives
>     with assertions, some assertions are great to add anyway as a
>     means of documentation and verification, while others do indeed
>     look like ridiculous false positive suppressions that clearly
>     don't belong here.
>
>     Regardless of having to add an assertion or not, we should anyway
>     in parallel think whether we could have prevented the false
>     positive from happening in the first place.
>
>>     It is impossible to measure how long it takes to eliminate a
>>     false positive.
>
>     *wholeheartedly agrees*
>
>>     If we think of false positives as sets, where the two most
>>     common ones are already known, we could define more sets. We have
>>     to start the work from the largest set. The workflow is the
>>     following: pick the most common false positive, if it is
>>     necessary improve the debugging facilities, mitigate the error,
>>     document that in LLVM Bugzilla, inject assertions into problematic
>>     code, repeat.
>
>     Yeah, something like that :)
>
>>
>>     -------------
>>     [1] ExprInspection checks:
>>     https://clang.llvm.org/docs/analyzer/developer-docs/DebugChecks.html#exprinspection-checks
>>     [2] LLVM_DEBUG():
>>     http://llvm.org/docs/ProgrammersManual.html#the-llvm-debug-macro-and-debug-option
>>
>>     -------------
>>
>>     Any feedback would be really appreciated.
>>
>>     Thank you,
>>     Csaba.
>>
>
