[cfe-dev] [analyzer] [GSoC 2019] Apply the Clang Static Analyzer to LLVM-based projects drafts
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Fri Apr 5 12:37:17 PDT 2019
Mmm, interesting. I definitely wouldn't mind allocating a week for that :)
You might still want to do this in dot files rather than in svg files,
they are a bit more structured.
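
A minimal sketch of why dot is easier to work with, assuming the dump writes one `A -> B;` statement per edge (optionally with a `:port` suffix); the node names and labels in `sample` are made up for illustration and not taken from a real dump:

```python
import re

# Assumption: each edge is a single `A -> B;` statement on its own line,
# optionally with a `:port` suffix on either endpoint.
EDGE_RE = re.compile(r'^\s*(\w+)(?::\w+)?\s*->\s*(\w+)(?::\w+)?', re.MULTILINE)

def parent_map(dot_text):
    """Map each node name to the list of its predecessor node names."""
    parents = {}
    for src, dst in EDGE_RE.findall(dot_text):
        parents.setdefault(dst, []).append(src)
    return parents

# Hypothetical miniature dump, only for demonstration.
sample = """digraph "Exploded Graph" {
  Node1 [shape=record,label="{state A}"];
  Node2 [shape=record,label="{state B}"];
  Node1 -> Node2;
  Node2:s0 -> Node3;
}"""

print(parent_map(sample))
```

In the svg the same parent/child relation would have to be recovered from the edge groups, which is doable but more fragile.
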
On 4/5/19 6:29 AM, Csaba Dabis wrote:
> Hey!
>
> The ExplodedGraph modification is not that difficult. It would be
> really cool to get rid of the entire .SVG file and convert it to pure
> .HTML, and yes, that would take months. My solution is simpler and
> could be done within a week.
>
> Here is the task copied:
>
> It could modify the full graph to only show differences between
> states and it would recolour the current representation for better
> readability.
>
>
> Here is the skeleton of the internal representation of the .SVG:
> <svg>
> <g id="node1"></g>
> <g id="node2"></g>
> <g id="edge1"></g>
> <g id="node3"></g>
> <g id="edge2"></g>
> ...
> <g id="node667"></g>
> <g id="edge666"></g>
> </svg>
>
> If `g` is a node, it looks like this:
> <g>
> <text>Block Entrance: B3</text>
> <text>PreStmtPurgeDeadSymbols</text>
> ...
> <text>reg_$0<const class clang::ento::MemRegion * this> : {
> [1, 18446744073709551615] }</text>
> </g>
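
A rough sketch of reading that skeleton, assuming every group can be classified purely by its `id` prefix (`node…` or `edge…`) as shown above; `split_groups` and the sample document are hypothetical, and special characters in the text are assumed to be XML-escaped in the file:

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip an XML namespace such as '{http://www.w3.org/2000/svg}g'."""
    return tag.rsplit('}', 1)[-1]

def split_groups(svg_text):
    """Return ({node_id: [text lines]}, [edge_ids]) for an SVG document."""
    root = ET.fromstring(svg_text)
    nodes, edges = {}, []
    for g in root.iter():
        if local_name(g.tag) != 'g':
            continue
        gid = g.get('id', '')
        if gid.startswith('node'):
            nodes[gid] = [''.join(t.itertext())
                          for t in g.iter() if local_name(t.tag) == 'text']
        elif gid.startswith('edge'):
            edges.append(gid)
    return nodes, edges

# Hypothetical two-node document following the skeleton above.
sample = """<svg xmlns="http://www.w3.org/2000/svg">
<g id="node1"><text>Block Entrance: B3</text><text>PreStmtPurgeDeadSymbols</text></g>
<g id="node2"><text>Block Entrance: B3</text></g>
<g id="edge1"></g>
</svg>"""

nodes, edges = split_groups(sample)
print(nodes, edges)
```
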
>
> -------------
>
> First idea explained:
> The idea is to traverse this tree backwards and check whether a
> `text` is equal in the current node and in its parent node; if so, tag
> the `text` of the child node as a duplicate. Put a floating checkbox in
> the top-left corner: clicking it hides the tagged `texts`, and a
> second click shows every entry again.
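
The tagging step could look roughly like this, assuming each node has at most one parent (real exploded graph nodes may have several predecessors, so this is a simplification) and that the texts were already extracted per node; all names below are hypothetical:

```python
def tag_duplicates(texts_by_node, parent_of):
    """Return {node: [(text, is_duplicate), ...]}.

    A text counts as a duplicate when the same string also appears
    in the parent node. Nodes without a parent get no duplicates.
    """
    tagged = {}
    for node, texts in texts_by_node.items():
        parent = parent_of.get(node)
        inherited = set(texts_by_node.get(parent, ()))
        tagged[node] = [(t, t in inherited) for t in texts]
    return tagged

# Hypothetical states: only `y` changes between node1 and node2.
texts = {'node1': ['x: 1', 'y: 2'], 'node2': ['x: 1', 'y: 3']}
parent_of = {'node2': 'node1'}

print(tag_duplicates(texts, parent_of)['node2'])
```

The checkbox would then only need to toggle the visibility of the entries tagged as duplicates.
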
>
> Second idea explained:
> The most common interesting string is `reg_$`, and possibly `derived_$`,
> `extent_$`, `meta_$`, where the `_$` suffix is not eye-catching enough. I
> tried to indent them by computing the complexity of the symbolic value
> and applying that much space, but it did not work, so making them
> colourful could work better. Both would be the best. Other important
> strings would be determined on the fly.
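
A small sketch of the recolouring, assuming the symbol names always have the form prefix, `_$`, number, and that wrapping them in an SVG `<tspan>` inside the existing `<text>` is acceptable; the colour and the `highlight` helper are placeholders:

```python
import re

# Only the prefixes named above; more could be added on the fly.
SYMBOL_RE = re.compile(r'\b(?:reg|derived|extent|meta)_\$\d+')

def highlight(text, color='red'):
    """Wrap every symbol name in a coloured <tspan>.

    `text` is expected to be already XML-escaped text content.
    """
    return SYMBOL_RE.sub(
        lambda m: f'<tspan fill="{color}">{m.group(0)}</tspan>', text)

print(highlight('reg_$0 : { [1, 5] }'))
```
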
>
> Thank you for your comments,
> Csaba.
>
> On Fri, Apr 5, 2019 at 4:15 AM Artem Dergachev <noqnoqneo at gmail.com>
> wrote:
>
> +actual Ravi as he prefers this address. Ravi is this dude -
> https://github.com/ravikandhadai - he's from the Swift universe
> and he has a solid academic background in static analysis (unlike
> me ^_^") and every time i tell him we have a checker for this bug,
> he gets more and more excited :p
>
> On 4/4/19 2:26 PM, Csaba Dabis wrote:
>> Hey Clang developers!
>>
>> I would like to participate in Google Summer of Code this year. I
>> am a fourth-semester BSc student of Computer Science at
>> Eotvos Lorand University, Hungary. I started to learn C++
>> in parallel with Clang a year and a half ago. That was also the
>> first time I used Linux, Git, VIM…. I love automation, so I love this
>> engine and the tools based on Clang like scan-build, CodeChecker,
>> CodeCompass.
>>
>> I have picked the following project:
>> http://llvm.org/OpenProjects.html#analyze-llvm
>> Here is the copy of the problems and their solutions from my
>> near-finished proposal:
>>
>> Goals
>> Eliminate 90% of the false positive findings in LLVM by teaching
>> C++ to the Static Analyzer. Improve the existing debugging
>> facilities so it would be easier to investigate errors. Report
>> and fix the easy-to-fix true positives in LLVM. Report the
>> difficult-to-fix true positives in LLVM so other developers with
>> more experience in that particular area could solve them. Swift
>> is another large project that serves as an example of how the
>> reports of an LLVM-related project change. Measure the quality of
>> the changes in Swift, where no direct false positive elimination
>> happens. With these improvements let the contributors of LLVM and
>> related projects use the Static Analyzer sub-project without any
>> overhead in a continuous integration workflow.
>
> *approves the goals*
>
>> Overview of the debugging facilities
>> The Clang Static Analyzer builds the exploded graph which
>> consists of program states as nodes. During the symbolic
>> execution each node represents everything that we know about the
>> program at a certain location.
>>
>> ExplodedGraph: We could investigate the graph with graphviz as an
>> .SVG file using Google Chrome. The graph can be so enormous
>> that Chrome crashes or cannot even load it. If you are able to
>> load it, there is too much information and it is very difficult
>> to use.
>
> I mean, our exploded graph dumps are horrible, but i've been a
> happy user of them for like 5 years. Pretty much every single
> bugfix that i made so far involved an investigation via exploded
> graph dumps. And on top of that, i'm not seeing much information
> that can be removed from them. Yes, viewers are choking
> immediately, browsers through svg conversion are doing better,
> especially chrome that seems to have the most tolerant svg
> library. But whenever there isn't a way to view an exploded graph,
> it becomes 5x harder to debug anything.
>
> One of the more personal reasons why i've been rooting for this
> project is that i wanted to popularize this systematic debugging
> workflow of narrowing down the bug to the Static Analyzer function
> in which it's happening, which consists of binary-searching the
> exploded graph dump for invalid values and bindings. I guess i
> should actually document it some day, like, you know, *for once*,
> 'cause
> http://clang-analyzer.llvm.org/checker_dev_manual.html#visualizing
> is clearly insufficient.
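
Until that gets documented, here is a toy sketch of the binary-search step, assuming the path to the bug is given as a list of states and that a value, once invalid, stays invalid further down the path (otherwise bisection is not justified); the state dictionaries and the `first_bad` helper are made up:

```python
def first_bad(path, is_valid):
    """Index of the first node on `path` where `is_valid` fails.

    Assumes validity is monotonic along the path: valid nodes first,
    then only invalid ones. Returns len(path) if every node is valid.
    """
    lo, hi = 0, len(path)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_valid(path[mid]):
            lo = mid + 1   # the bug appears later
        else:
            hi = mid       # the bug appears here or earlier
    return lo

# Hypothetical path: the binding of `x` goes bad at index 2.
path = [{'x': '1'}, {'x': '2'}, {'x': 'Unknown'}, {'x': 'Unknown'}]
print(first_bad(path, lambda state: state['x'] != 'Unknown'))
```

The node at the returned index, together with its predecessor, tells you which program point (and hence which analyzer function) to look at.
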
>
>> Alternatively you could use the LLDB debugger, but because of such
>> a complex background it is more difficult to find out
>> which function causes the false positive.
>>
>> Debug checkers: the debug.DumpCalls checker truly writes out every
>> function call, which is too much and too difficult to use.
>> Expression inspection checks[1] are useful for getting a feeling for
>> what could go wrong by writing out a certain program state, but they
>> cannot be used to compare states due to the graph structure.
>>
>> Proposed solutions for the debugging facilities
>> ExplodedGraph: Create an .HTML frontend for the .SVG graph
>> representation. It could modify the full graph to only show
>> differences between states and it would recolour the current
>> representation for better readability.
>
> It'd be awesome to pull this off, but i suspect this undertaking
> alone might take a few months of your time. I think you should
> keep your eyes open for potential smaller improvements but
> generally try to familiarize yourself with existing tools before
> building up long-term plans in this direction.
>
>> Debug checkers: Create an option for debug.DumpCalls checker to
>> show only a certain variable and if its value is unknown at the
>> location of an error, point out when it became unknown.
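
The intended behaviour, sketched outside of the analyzer: this is not the checker's API, just the logic under the assumption that each step along the path exposes a plain variable-to-value mapping; the names and values are hypothetical:

```python
def value_changes(path, var):
    """List (step, value) for every step where `var` changes value."""
    changes = []
    previous = object()  # sentinel that never equals a real value
    for step, state in enumerate(path):
        value = state.get(var)
        if value != previous:
            changes.append((step, value))
            previous = value
    return changes

def became_unknown(path, var, unknown='Unknown'):
    """First step at which `var` turns into `unknown`, or None."""
    for step, value in value_changes(path, var):
        if value == unknown:
            return step
    return None

# Hypothetical path: `p` is known for two steps, then becomes unknown.
path = [{'p': '&x', 'n': '1'}, {'p': '&x', 'n': '2'}, {'p': 'Unknown', 'n': '2'}]
print(value_changes(path, 'p'), became_unknown(path, 'p'))
```
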
>>
>> Overview of the false positives
>> My playground was the LLVM 8.0.0 bug-free release (20 March
>> 2019). With the basic scan-build command, 828 bug reports were found.
>> Because of our precise review system they are most likely false
>> positive findings, about half of which are ‘Memory leak’ (229) and
>> ‘Called C++ object pointer is null’ (217) errors:
>> - ‘Memory leak’: Half of the reports (118/229) appear in Error.h
>> on the same function call in different variations.
>> - ‘Called C++ object pointer is null’: A third of the reports
>> happen on placement new operations.
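
For reference, the shares implied by those counts, computed only from the numbers quoted above (the placement-new count is not given, so it is left out):

```python
total_reports = 828
memory_leak = 229
null_object_pointer = 217
error_h_leaks = 118

# Share of all reports covered by the two most common categories.
top_two_share = (memory_leak + null_object_pointer) / total_reports
# Share of the memory leak reports that come from Error.h.
error_h_share = error_h_leaks / memory_leak

print(f"top two categories: {top_two_share:.1%}")   # 53.9%
print(f"Error.h among leaks: {error_h_share:.1%}")  # 51.5%
```
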
>
> Yes! That's what i wanted to hear. Put this on the top :) Great
> job identifying those top issues!
>
> I've been noticing placement-new bugs before, but i didn't ever
> notice leaks in Error.h being a popular FP, nice catch!
>
>> Proposed solutions for the false positives
>> One could say creating more assertions could remove the errors
>> and document the code better. Let us think about the opposite:
>> removing every assertion like ‘assert()’ and ‘LLVM_DEBUG()’[2]
>> could show the weakness of the Static Analyzer. We cannot force
>> our users to double or triple the number of assertions (even if it
>> would be very useful). With that, and the new debug facilities,
>> the door will be open to mitigate the false positives.
>
> That depends. When suppressing Static Analyzer false positives
> with assertions, some assertions are great to add anyway as a
> means of documentation and verification, while others do indeed
> look like ridiculous false positive suppressions that clearly
> don't belong here.
>
> Regardless of having to add an assertion or not, we should anyway
> in parallel think whether we could have prevented the false
> positive from happening in the first place.
>
>> It is impossible to measure how long it takes to eliminate a
>> false positive.
>
> *wholeheartedly agrees*
>
>> If we think of the false positives as sets, the two most common
>> of which are already known, we could define more sets. We have to
>> start the work from the largest set. The workflow is the
>> following: pick the most common false positive, improve the
>> debugging facilities if necessary, mitigate the error,
>> document it in LLVM Bugzilla, inject assertions into the problematic
>> code, repeat.
>
> Yeah, something like that :)
>
>>
>> -------------
>> [1] ExprInspection checks:
>> https://clang.llvm.org/docs/analyzer/developer-docs/DebugChecks.html#exprinspection-checks
>> [2] LLVM_DEBUG():
>> http://llvm.org/docs/ProgrammersManual.html#the-llvm-debug-macro-and-debug-option
>>
>> -------------
>>
>> Any feedback would be really appreciated.
>>
>> Thank you,
>> Csaba.
>>
>