[cfe-dev] [analyzer] [GSoC 2019] Apply the Clang Static Analyzer to LLVM-based projects drafts

Fri Apr 5 06:29:16 PDT 2019

Hey!

The ExplodedGraph modification is not that difficult. It would be really
cool to get rid of the entire .SVG file and convert it to pure .HTML, but
yes, it would take months. My solution is simpler and could be done
within a week.

Here is the task, copied from my proposal:

> It could modify the full graph to only show differences between states and
> it would recolour the current representation for better readability.

Here is the skeleton of the internal structure of the .SVG file:
<svg>
  <g id="node1"></g>
  <g id="node2"></g>
  <g id="edge1"></g>
  <g id="node3"></g>
  <g id="edge2"></g>
  ...
  <g id="node667"></g>
  <g id="edge666"></g>
</svg>

If a `g` element is a node, it looks like this:
<g>
  <text>Block Entrance: B3</text>
  <text>PreStmtPurgeDeadSymbols</text>
  ...
  <text>reg_$0<const class clang::ento::MemRegion * this> : { [1, 18446744073709551615] }</text>
</g>
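To make the skeleton concrete, here is a minimal Python sketch that collects the `text` entries of every node `g` element, using only the standard library's xml.etree.ElementTree. It assumes the simplified layout above (ids `nodeN` for nodes, `edgeN` for edges); real graphviz output also carries the SVG namespace, which the sketch tolerates by stripping namespaces from tag names. The helper names `local_name` and `node_texts` are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://www.w3.org/2000/svg}' from a tag."""
    return tag.rsplit("}", 1)[-1]


def node_texts(svg_source: str) -> dict[str, list[str]]:
    """Map each node id (e.g. 'node1') to the list of its <text> contents."""
    root = ET.fromstring(svg_source)
    result: dict[str, list[str]] = {}
    for g in root.iter():  # document order
        if local_name(g.tag) != "g":
            continue
        gid = g.get("id", "")
        if not gid.startswith("node"):
            continue  # skip edges and any other groups
        result[gid] = [
            "".join(t.itertext())
            for t in g.iter()
            if local_name(t.tag) == "text"
        ]
    return result


# Assumption: '<' and '>' inside a real SVG are escaped as &lt; and &gt;.
sample = """<svg>
  <g id="node1"><text>Block Entrance: B3</text></g>
  <g id="node2"><text>Block Entrance: B3</text><text>reg_$0&lt;int x&gt;</text></g>
  <g id="edge1"></g>
</svg>"""
print(node_texts(sample))
# {'node1': ['Block Entrance: B3'], 'node2': ['Block Entrance: B3', 'reg_$0<int x>']}
```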

-------------

First idea explained:
Traverse the graph backwards and check whether a `text` in the current node
is equal to one in its parent node; if so, tag that `text` of the child node
as a duplicate. A floating checkbox in the top-left corner hides the tagged
`texts` when clicked, and a second click shows every entry again.
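The tagging step can be sketched in Python on an already-parsed graph: a dict from node id to its list of texts, plus a list of (parent, child) edges. Because each node is compared only with its direct predecessors, the result does not depend on traversal order, so the sketch simply visits every node instead of walking backwards. One assumption goes beyond the idea as stated: the exploded graph can merge paths, so an entry is tagged only if it appears in every predecessor, which keeps the checkbox from hiding anything that differs from some parent. The functions `tag_duplicates` and `visible` are hypothetical.

```python
from collections import defaultdict


def tag_duplicates(
    texts: dict[str, list[str]],
    edges: list[tuple[str, str]],
) -> dict[str, set[int]]:
    """For each node, return the indices of its texts that also occur in
    every predecessor node, i.e. entries that did not change along the edge.
    Nodes without predecessors (the root) get nothing tagged."""
    preds: dict[str, list[str]] = defaultdict(list)
    for parent, child in edges:
        preds[child].append(parent)

    tagged: dict[str, set[int]] = {}
    for node, entries in texts.items():
        parents = preds.get(node, [])
        if not parents:
            tagged[node] = set()
            continue
        parent_sets = [set(texts.get(p, [])) for p in parents]
        tagged[node] = {
            i for i, entry in enumerate(entries)
            if all(entry in s for s in parent_sets)
        }
    return tagged


def visible(
    texts: dict[str, list[str]],
    tagged: dict[str, set[int]],
    hide_duplicates: bool,
) -> dict[str, list[str]]:
    """What the checkbox toggles: drop tagged entries while hiding is on."""
    return {
        node: [e for i, e in enumerate(entries)
               if not (hide_duplicates and i in tagged[node])]
        for node, entries in texts.items()
    }
```

In the HTML frontend, the tagged indices would become a CSS class on the corresponding `text` elements, so the checkbox only has to toggle that class's visibility.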

Second idea explained:
The most common interesting string is `reg_$`, and possibly `derived_$`,
`extent_$` and `meta_$`, where the `_$` suffix is not eye-catching enough. I
tried to indent them by computing the complexity of the symbolic value and
applying that much whitespace, but it did not work, so colouring them could
make it better. Doing both would be best. Other important strings could be
determined on the fly.

Thank you for your comments,
Csaba.

On Fri, Apr 5, 2019 at 4:15 AM Artem Dergachev <noqnoqneo at gmail.com> wrote:

> +actual Ravi as he prefers this address. Ravi is this dude -
> https://github.com/ravikandhadai - he's from the Swift universe and he
> has a solid academic background in static analysis (unlike me ^_^") and
> every time i tell him we have a checker for this bug, he gets more and more
> excited :p
>
> On 4/4/19 2:26 PM, Csaba Dabis wrote:
>
> Hey Clang developers!
>
> I would like to participate in Google Summer of Code this year. I am a
> fourth-semester BSc student of Computer Science at Eotvos Lorand
> University, Hungary. I started learning C++ in parallel with Clang a year
> and a half ago. That was also my first time using Linux, Git and VIM. I
> love automation, so I love this engine and the tools based on Clang like
> scan-build, CodeChecker and CodeCompass.
>
> I have picked the following project:
> http://llvm.org/OpenProjects.html#analyze-llvm
> Here is a copy of the problems and their solutions from my nearly
> finished proposal:
>
> Goals
> Eliminate 90% of the false positive findings in LLVM by teaching C++ to
> the Static Analyzer. Improve the existing debugging facilities so that it
> is easier to investigate errors. Report and fix the easy-to-fix true
> positives in LLVM. Report the difficult-to-fix true positives in LLVM so
> that developers with more experience in that particular area can solve
> them. Swift is another large project, used as an example to see how the
> reports on an LLVM-related project change. Measure the quality of the
> changes in Swift, where no direct false positive elimination happens. With
> these improvements, let contributors to LLVM and related projects use the
> Static Analyzer sub-project without any overhead in a continuous
> integration workflow.
>
>
> *approves the goals*
>
> Overview of the debugging facilities
> The Clang Static Analyzer builds the exploded graph, which consists of
> program states as nodes. During symbolic execution, each node represents
> everything we know about the program at a certain location.
>
> ExplodedGraph: We could investigate the graph with graphviz as an .SVG
> file using Google Chrome. The graph can be so enormous that Chrome
> crashes or cannot load it at all. Even if you are able to load it, there
> is too much information and it is very difficult to use.
>
>
> I mean, our exploded graph dumps are horrible, but i've been a happy user
> of them for like 5 years. Pretty much every single bugfix that i made so
> far involved an investigation via exploded graph dumps. And on top of that,
> i'm not seeing much information that can be removed from them. Yes, viewers
> are choking immediately, browsers through svg conversion are doing better,
> especially chrome that seems to have the most tolerant svg library. But
> whenever there isn't a way to view an exploded graph, it becomes 5x
> harder to debug anything.
>
> One of the more personal reasons why i've been rooting for this project is
> that i wanted to popularize this systematic debugging workflow of narrowing
> down the bug to the Static Analyzer function in which it's happening that
> consists of binary-searching the exploded graph dump for invalid values and
> bindings. I guess i should actually document it some day, like, you know,
> *for once*, 'cause
> http://clang-analyzer.llvm.org/checker_dev_manual.html#visualizing is
> clearly insufficient.
>
> Alternatively, you could use the LLDB debugger, but because of such a
> complex background it is more difficult to find out which function
> causes the false positive.
>
> Debug checkers: the debug.DumpCalls checker writes out every single
> function call, which is too much and too difficult to use. Expression
> inspection checks[1] are useful for getting a feeling for what could go
> wrong by printing a certain program state, but they cannot be used to
> compare states due to the graph structure.
>
> Proposed solutions for the debugging facilities
> ExplodedGraph: Create an .HTML frontend for the .SVG graph representation.
> It could modify the full graph to only show differences between states and
> it would recolour the current representation for better readability.
>
>
> It'd be awesome to pull this off, but i suspect this undertaking alone
> might take a few months of your time. I think you should keep your eyes
> open for potential smaller improvements but generally try to familiarize
> yourself with existing tools before building up long-term plans in this
> direction.
>
> Debug checkers: Create an option for the debug.DumpCalls checker to show
> only a certain variable and, if its value is unknown at the location of an
> error, point out where it became unknown.
>
> Overview of the false positives
> My playground was the LLVM 8.0.0 bug-free release (20 March 2019). With
> the basic scan-build command, 828 bug reports were found. Because of our
> thorough review system, they are most likely false positives; about half
> of them (446/828) are ‘Memory leak’ (229) and ‘Called C++ object pointer
> is null’ (217) errors:
> - ‘Memory leak’: half of the reports (118/229) appear in Error.h on the
> same function call in different variations.
> - ‘Called C++ object pointer is null’: a third of the reports happen on
> placement new operations.
>
>
> Yes! That's what i wanted to hear. Put this on the top :) Great job
> identifying those top issues!
>
> I've been noticing placement-new bugs before, but i didn't ever notice
> leaks in Error.h being a popular FP, nice catch!
>
> Proposed solutions for the false positives
> One could say that adding more assertions could remove the errors and
> document the code better. Let us think about the opposite: removing every
> assertion, such as ‘assert()’ and ‘LLVM_DEBUG()’[2], could show the
> weaknesses of the Static Analyzer. We cannot force our users to double or
> triple the number of assertions (even if it would be very useful). With
> that, and the new debugging facilities, the door will be open to
> mitigating the false positives.
>
>
> That depends. When suppressing Static Analyzer false positives with
> assertions, some assertions are great to add anyway as a means of
> documentation and verification, while others do indeed look like ridiculous
> false positive suppressions that clearly don't belong here.
>
> Regardless of having to add an assertion or not, we should anyway in
> parallel think whether we could have prevented the false positive from
> happening in the first place.
>
> It is impossible to measure how long it takes to eliminate a false
> positive.
>
>
> *wholeheartedly agrees*
>
> If we think of the false positives as sets, where the two most common
> ones are already known, we could define more sets. We have to start the
> work with the largest set. The workflow is the following: pick the most
> common false positive, improve the debugging facilities if necessary,
> mitigate the error, document it in LLVM Bugzilla, inject assertions into
> the problematic code, and repeat.
>
>
> Yeah, something like that :)
>
>
> -------------
> [1] ExprInspection checks:
> https://clang.llvm.org/docs/analyzer/developer-docs/DebugChecks.html#exprinspection-checks
> [2] LLVM_DEBUG():
> http://llvm.org/docs/ProgrammersManual.html#the-llvm-debug-macro-and-debug-option
> -------------
>
> Any feedback would be really appreciated.
>
> Thank you,
> Csaba.
>
>
>