[cfe-dev] [analyzer] [GSoC 2019] Apply the Clang Static Analyzer to LLVM-based projects drafts

Fri Apr 5 15:08:50 PDT 2019

Hmm, I see it the opposite way: SVG is a language that anyone could improve
and that is easy to read as a tree structure, whereas dot is (from my point
of view) an unfamiliar language describing a raw graph, written for the
computer.

On Fri, Apr 5, 2019 at 9:37 PM Artem Dergachev <noqnoqneo at gmail.com> wrote:

> Mmm, interesting. I definitely wouldn't mind allocating a week for that :)
>
> You might still want to do this in dot files rather than in svg files,
> they are a bit more structured.
>
> On 4/5/19 6:29 AM, Csaba Dabis wrote:
>
> Hey!
>
> The ExplodedGraph modification is not that difficult. It would be really
> cool to get rid of the .SVG file entirely and convert it to pure .HTML, and
> yes, that would take months. My solution is simpler and could be done
> within a week.
>
> Here is the task copied:
>
>> It could modify the full graph to only show differences between states
>> and it would recolour the current representation for better readability.
>
>
> Here is the skeleton of the internal representation of the .SVG:
> <svg>
>   <g id="node1"></g>
>   <g id="node2"></g>
>   <g id="edge1"></g>
>   <g id="node3"></g>
>   <g id="edge2"></g>
>   ...
>   <g id="node667"></g>
>   <g id="edge666"></g>
> </svg>
>
> If `g` is a node, it looks like this:
> <g>
>   <text>Block Entrance: B3</text>
>   <text>PreStmtPurgeDeadSymbols</text>
>   ...
>   <text>reg_$0<const class clang::ento::MemRegion * this> : { [1, 18446744073709551615] }</text>
> </g>
>
> -------------
>
> First idea explained:
> The idea is to traverse this tree backwards and check whether two
> `texts` are equal in the current node and its parent node; if so, tag the
> `text` of the child node as a duplicate. Put a floating checkbox in the
> top-left corner: clicking it hides the tagged `texts`, and a second click
> shows every entry again.
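
A minimal sketch of the first idea in Python, under several assumptions that
are not stated above: every node and edge `<g>` carries a `<title>` child,
edge titles have the form `parent->child`, and each node has a single
predecessor. The names `tag_duplicates` and the `duplicate` class are
hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace: '{http://www.w3.org/2000/svg}g' -> 'g'."""
    return tag.rsplit("}", 1)[-1]


def tag_duplicates(svg_text):
    """Mark each <text> of a node that also appears in its parent node."""
    root = ET.fromstring(svg_text)
    nodes = {}   # node title -> <g> element
    parent = {}  # child title -> parent title (assumes one predecessor)
    for g in root.iter():
        if local(g.tag) != "g":
            continue
        title = next((c.text for c in g if local(c.tag) == "title"), None)
        if title is None:
            continue
        if "->" in title:
            # Assumed edge title format: "parent->child".
            src, dst = title.split("->", 1)
            parent[dst] = src
        else:
            nodes[title] = g
    for name, g in nodes.items():
        p = nodes.get(parent.get(name))
        if p is None:
            continue  # root node, or parent not present in the dump
        seen = {t.text for t in p if local(t.tag) == "text"}
        for t in g:
            if local(t.tag) == "text" and t.text in seen:
                t.set("class", "duplicate")  # hypothetical class name
    return root
```

The checkbox would then only have to toggle a style rule that hides every
element carrying the `duplicate` class.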
>
> Second idea explained:
> The most common interesting string is `reg_$`, and possibly `derived_$`,
> `extent_$`, and `meta_$`, where the `_$` suffix is not eye-catching
> enough. I tried to indent them by computing the complexity of the
> symbolic value and applying that much space, but it did not work, so
> colouring them could work better. Doing both would be best. Other
> important strings would be determined on the fly.
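
A rough sketch of the colouring part of the second idea, again in Python.
The colour choices and the names `SYMBOL_COLOURS`, `colour_for` and
`recolour` are assumptions made for the example; it colours the whole
`<text>` line by the first symbol kind found in it rather than only the
symbol itself.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical colour scheme, one colour per symbol kind.
SYMBOL_COLOURS = {
    "reg": "red",
    "derived": "blue",
    "extent": "green",
    "meta": "purple",
}
SYMBOL_RE = re.compile(r"\b(reg|derived|extent|meta)_\$\d+")


def colour_for(line):
    """Return the colour for the first symbol kind in the line, or None."""
    match = SYMBOL_RE.search(line or "")
    return SYMBOL_COLOURS[match.group(1)] if match else None


def recolour(root):
    """Set the SVG `fill` attribute on every <text> mentioning a symbol."""
    count = 0
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] != "text":
            continue
        colour = colour_for(element.text)
        if colour:
            element.set("fill", colour)
            count += 1
    return count
```

Colouring only the symbol would need the line to be split into `<tspan>`
elements, which this sketch leaves out.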
>
> Thank you for your comments,
> Csaba.
>
> On Fri, Apr 5, 2019 at 4:15 AM Artem Dergachev <noqnoqneo at gmail.com>
> wrote:
>
>> +actual Ravi as he prefers this address. Ravi is this dude -
>> https://github.com/ravikandhadai - he's from the Swift universe and he
>> has a solid academic background in static analysis (unlike me ^_^") and
>> every time i tell him we have a checker for this bug, he gets more and more
>> excited :p
>>
>> On 4/4/19 2:26 PM, Csaba Dabis wrote:
>>
>> Hey Clang developers!
>>
>> I would like to participate in Google Summer of Code this year. I am a
>> fourth-semester BSc student of Computer Science at Eotvos Lorand
>> University, Hungary. I started to learn C++ in parallel with Clang a year
>> and a half ago. That was also my first time using Linux, Git, VIM…. I love
>> automation, and therefore this engine and the tools based on Clang, such
>> as scan-build, CodeChecker, and CodeCompass.
>>
>> I have picked the following project:
>> http://llvm.org/OpenProjects.html#analyze-llvm
>> Here is the copy of the problems and their solutions from my
>> near-finished proposal:
>>
>> Goals
>> Eliminate 90% of the false positive findings in LLVM by teaching C++ to
>> the Static Analyzer. Improve the existing debugging facilities so that it
>> is easier to investigate errors. Report and fix the easy-to-fix true
>> positives in LLVM. Report the difficult-to-fix true positives in LLVM so
>> that other developers with more experience in the relevant area can
>> solve them. Use Swift, another large LLVM-related project, as an example
>> to see how the reports of such a project change. Measure the quality of
>> the changes in Swift, where no direct false positive elimination
>> happens. With these improvements, let the contributors of LLVM and
>> related projects use the Static Analyzer sub-project without any
>> overhead in a continuous integration workflow.
>>
>>
>> *approves the goals*
>>
>> Overview of the debugging facilities
>> The Clang Static Analyzer builds the exploded graph, which consists of
>> program states as nodes. During the symbolic execution each node represents
>> everything that we know about the program at a certain location.
>>
>> ExplodedGraph: We could investigate the graph with graphviz as an .SVG
>> file using Google Chrome. The graph can be so enormous that Chrome
>> crashes or cannot even load it. If you are able to load it, there is too
>> much information and it is very difficult to use.
>>
>>
>> I mean, our exploded graph dumps are horrible, but i've been a happy user
>> of them for like 5 years. Pretty much every single bugfix that i made so
>> far involved an investigation via exploded graph dumps. And on top of that,
>> i'm not seeing much information that can be removed from them. Yes, viewers
>> are choking immediately, browsers through svg conversion are doing better,
>> especially chrome that seems to have the most tolerant svg library. But
>> whenever there isn't a way to view an exploded graph, it becomes 5 times
>> harder to debug anything.
>>
>> One of the more personal reasons why i've been rooting for this project
>> is that i wanted to popularize this systematic debugging workflow of
>> narrowing down the bug to the Static Analyzer function in which it's
>> happening that consists of binary-searching the exploded graph dump for
>> invalid values and bindings. I guess i should actually document it some
>> day, like, you know, *for once*, 'cause
>> http://clang-analyzer.llvm.org/checker_dev_manual.html#visualizing is
>> clearly insufficient.
>>
>> Alternatively you could use the LLDB debugger, but because of the
>> complex background it is more difficult to find out which function
>> causes the false positive.
>>
>> Debug checkers: The debug.DumpCalls checker really does write out every
>> function call, which is too much and too difficult to use. Expression
>> inspection checks[1] are useful for getting a feeling for what could go
>> wrong by writing out a certain program state, but they cannot be used to
>> compare states due to the graph structure.
>>
>> Proposed solutions for the debugging facilities
>> ExplodedGraph: Create an .HTML frontend for the .SVG graph
>> representation. It could modify the full graph to only show differences
>> between states and it would recolour the current representation for better
>> readability.
>>
>>
>> It'd be awesome to pull this off, but i suspect this undertaking alone
>> might take a few months of your time. I think you should keep your eyes
>> open for potential smaller improvements but generally try to familiarize
>> yourself with existing tools before building up long-term plans in this
>> direction.
>>
>> Debug checkers: Create an option for debug.DumpCalls checker to show only
>> a certain variable and if its value is unknown at the location of an error,
>> point out when it became unknown.
>>
>> Overview of the false positives
>> My playground was the LLVM 8.0.0 bug-free release (20 March 2019). With
>> the basic scan-build command, 828 bug reports were found. Because of our
>> precise review system they are most likely false positive findings, and
>> about half of them are ‘Memory leak’ (229) and ‘Called C++ object pointer
>> is null’ (217) errors:
>> - ‘Memory leak’: Half of the reports (118/229) appear in Error.h on the
>> same function call in different variations.
>> - ‘Called C++ object pointer is null’: A third of the reports happen on
>> placement new operations.
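
For reference, the shares behind "half" can be recomputed from the counts
quoted above. This is only a small arithmetic check in Python; the function
name `share` is made up for the example.

```python
TOTAL_REPORTS = 828          # all scan-build reports on LLVM 8.0.0
MEMORY_LEAK = 229            # 'Memory leak'
NULL_OBJECT_POINTER = 217    # 'Called C++ object pointer is null'
LEAKS_IN_ERROR_H = 118       # 'Memory leak' reports located in Error.h


def share(part, whole):
    """Format part/whole as a percentage with one decimal place."""
    return f"{part / whole:.1%}"


# Share of the two most common report kinds among all reports.
top_two = share(MEMORY_LEAK + NULL_OBJECT_POINTER, TOTAL_REPORTS)
# Share of the 'Memory leak' reports that point into Error.h.
error_h = share(LEAKS_IN_ERROR_H, MEMORY_LEAK)
print(top_two, error_h)  # prints "53.9% 51.5%"
```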
>>
>>
>> Yes! That's what i wanted to hear. Put this on the top :) Great job
>> identifying those top issues!
>>
>> I've been noticing placement-new bugs before, but i didn't ever notice
>> leaks in Error.h being a popular FP, nice catch!
>>
>> Proposed solutions for the false positives
>> One could say that adding more assertions would remove the errors and
>> document the code better. Let us think about the opposite: removing every
>> assertion such as ‘assert()’ and ‘LLVM_DEBUG()’[2] could reveal the
>> weaknesses of the Static Analyzer. We cannot force our users to double or
>> triple the number of assertions (even though it would be very useful).
>> With that, and the new debugging facilities, the door will be open to
>> mitigating the false positives.
>>
>>
>> That depends. When suppressing Static Analyzer false positives with
>> assertions, some assertions are great to add anyway as a means of
>> documentation and verification, while others do indeed look like ridiculous
>> false positive suppressions that clearly don't belong here.
>>
>> Regardless of having to add an assertion or not, we should anyway in
>> parallel think whether we could have prevented the false positive from
>> happening in the first place.
>>
>> It is impossible to measure how long it takes to eliminate a false
>> positive.
>>
>>
>> *wholeheartedly agrees*
>>
>> If we think of the false positives as sets, the two most common of which
>> are already known, we could define more sets. We have to start the work
>> from the largest set. The workflow is the following: pick the most common
>> false positive, improve the debugging facilities if necessary, mitigate
>> the error, document it in LLVM Bugzilla, inject assertions into the
>> problematic code, repeat.
>>
>>
>> Yeah, something like that :)
>>
>>
>> -------------
>> [1] ExprInspection checks:
>> https://clang.llvm.org/docs/analyzer/developer-docs/DebugChecks.html#exprinspection-checks
>> [2] LLVM_DEBUG():
>> http://llvm.org/docs/ProgrammersManual.html#the-llvm-debug-macro-and-debug-option
>> -------------
>>
>> Any feedback would be really appreciated.
>>
>> Thank you,
>> Csaba.
>>
>>
>>
>