[cfe-dev] Proposal: Integrate CodeChecker analyzer infrastructure

Tue Feb 23 02:10:59 PST 2016

Hi,

We would like to propose integrating the CodeChecker 
(https://github.com/Ericsson/codechecker) analyzer infrastructure.

This is an alternative to scan-build with extended functionality.
Its main features are: tracking issues over time, suppressing false 
positives, detecting new issues by comparing the results of multiple 
analyzer runs, and viewing and comparing results in a web browser or on 
the command line. A more detailed feature list can be found below (*).
The analyzer infrastructure is built so that a new analyzer can be 
integrated easily.
We are developing a tool that developers and automated continuous 
integration tools can use easily, and that presents the results from 
multiple analyzers in a common way.
We think it would serve as a good base for displaying and tracking bugs 
detected by other Clang tools such as clang-tidy, which is already 
supported.

For example, you can find the analysis results for the LLVM 3.6.2 and 
3.7.1 code here: http://modelserver.inf.elte.hu:5000

Main questions to the community:
0. Does the Clang community like the idea?
1. CodeChecker has some 3rd party dependencies (see (**) below). Are 
they acceptable?
2. Is the community satisfied with the CodeChecker name?

Integration plan:
  0. CodeChecker should use scan-build.py (OSX support) to generate the 
compilation database instead of the current LD_PRELOAD technique
  1. Migrate CodeChecker testing infrastructure to the current LLVM 
testing infrastructure
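
To illustrate what the generated compilation database gives us, here is 
a minimal sketch of reading one. It assumes the standard Clang JSON 
compilation database layout ("directory", "command", "file" per entry); 
SAMPLE_DB and load_compile_commands are hypothetical names used only for 
this example and are not part of CodeChecker.

```python
import json
import shlex

# Hypothetical sample in the JSON compilation database layout.
SAMPLE_DB = """
[
  {"directory": "/home/user/project/build",
   "command": "clang++ -std=c++11 -Iinclude -c ../src/main.cpp -o main.o",
   "file": "../src/main.cpp"}
]
"""

def load_compile_commands(text):
    """Parse a compilation database and split each command into arguments."""
    entries = []
    for entry in json.loads(text):
        entries.append({
            "directory": entry["directory"],
            "file": entry["file"],
            # shlex.split handles shell-style quoting in the command string.
            "arguments": shlex.split(entry["command"]),
        })
    return entries

commands = load_compile_commands(SAMPLE_DB)
```

Each entry carries the compiler arguments for one source file, so an 
analyzer can be invoked with them without touching the build system.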

(*) Most notably it extends the current tool set with the following 
features:
  - stores the results of multiple large analysis runs efficiently 
(as opposed to the static HTML files of scan-build/scan-view)
  - runs multiple analyzers; currently Clang Static Analyzer and 
Clang-Tidy are supported
  - dynamic web based defect viewer (instead of static html)
  - a SQLite/PostgreSQL based defect storage & management (both are 
optional, results can be shown on standard output in quickcheck mode)
  - update analyzer results only for modified files (depends on the 
build system)
  - compare analysis results (new/resolved/unresolved bugs compared to a 
baseline)
  - filter analysis results (checker name, severity, source file name ...)
  - skip analysis in specific source directories if required
  - suppression of false positives (in config file or in the source)
  - Thrift API based server-client model for storing bugs and viewing 
results.
  - It is possible to connect multiple bug viewers. Currently a 
web-based viewer and a command line viewer are provided.
    (the command line client is the recommended way to integrate with 
Continuous Integration loops)
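
The comparison against a baseline mentioned in the list above can be 
sketched as a set difference over report identities. This is only an 
illustration and not CodeChecker's actual implementation: Report and 
compare_runs are hypothetical names, and identifying a report by 
(file, checker, message) is an assumption made for this example.

```python
from collections import namedtuple

# Hypothetical report identity; the real data model may differ.
Report = namedtuple("Report", ["file", "checker", "message"])

def compare_runs(baseline, new_run):
    """Classify reports as new, resolved, or unresolved against a baseline."""
    base, new = set(baseline), set(new_run)
    return {
        "new": new - base,          # only in the new run
        "resolved": base - new,     # only in the baseline
        "unresolved": base & new,   # present in both runs
    }

baseline = [
    Report("a.cpp", "core.NullDereference", "Dereference of null pointer"),
    Report("b.cpp", "deadcode.DeadStores", "Value stored to 'x' is never read"),
]
new_run = [
    Report("b.cpp", "deadcode.DeadStores", "Value stored to 'x' is never read"),
    Report("c.cpp", "core.DivideZero", "Division by zero"),
]
diff = compare_runs(baseline, new_run)
```

A CI job could, for instance, fail only when diff["new"] is not empty.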

Command line examples of usage can be found here: 
https://github.com/Ericsson/codechecker/blob/master/docs/usage.md

CodeChecker supports multiple use cases:
  - Small projects/several source files (quick feedback)
      No database is used, analysis results are shown on the command 
line only
  - Medium size projects (~500 files)
      Results are stored in SQLite/PostgreSQL database and can be viewed 
from command line or web viewer clients
  - Large size projects (>500 files)
      Results are stored in PostgreSQL database and can be viewed from 
command line or web viewer clients
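
As a rough rule of thumb, the use cases above could be expressed like 
this. The function name suggest_storage and the small_limit cutoff are 
assumptions for illustration only (the list above only says "several 
source files"); the 500 file boundary is taken from the list.

```python
def suggest_storage(num_files, small_limit=20):
    """Suggest a result storage backend based on the number of source files."""
    # small_limit is an assumed cutoff for "several source files".
    if num_files <= small_limit:
        return "none"  # quickcheck mode, results on the command line only
    if num_files <= 500:
        return "sqlite or postgresql"
    return "postgresql"
```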

There are currently discussions about analyzer tool support in multiple 
email threads:

http://clang-developers.42468.n3.nabble.com/Idea-for-better-invoking-static-analysis-via-command-line-td4049670.html
http://clang-developers.42468.n3.nabble.com/Proposal-Integrate-static-analysis-test-suites-td4048967.html

CodeChecker provides solutions for many problems discussed there:

  - Problem: Different analyzers provide different output formats (Clang 
Static Analyzer provides plist/html/command line, Clang-tidy provides 
command line output only)
    Solution: With CodeChecker, results from multiple analyzers can be 
viewed in a common way by developers, or passed to other tools for 
further processing.

  - Problem: Overwriting the CC environment variable, as the previous 
scan-build version (written in Perl) does, is not always a good solution.
    Solution: Compilation database is generated by CodeChecker 
(currently using the LD_PRELOAD technique, later with scan-build.py for 
OSX support).

  - Problem: The analyzer has many command line arguments which may 
change over time; end users should not be affected by this.
    Solution: CodeChecker hides the Clang analyzer specific options from 
the user. Many options are preconfigured, but forwarding options to the 
analyzers without modification is also supported.

  - Problem: Analyzer results can be hard to understand if only command 
line output is available (the currently generated static HTML pages do 
not scale and are hard to manage).
    Solution: The analysis steps can be viewed on the command line with 
quickcheck or in the web viewer (generated dynamically from the 
database), which helps in understanding the analysis results.
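
To make the "common way" of viewing results more concrete, here is a 
minimal sketch that turns one command line diagnostic of the usual 
"file:line:col: warning: message [check-name]" form into a plain record. 
It is not how CodeChecker does it internally: parse_diagnostic and the 
record field names are hypothetical and chosen only for this example.

```python
import re

# Matches "file:line:col: severity: message [check-name]"; the trailing
# check name in brackets is optional.
DIAG_RE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<severity>warning|error): (?P<message>.*?)"
    r"(?: \[(?P<checker>[^\]]+)\])?$"
)

def parse_diagnostic(text, analyzer):
    """Convert one diagnostic line into an analyzer independent record."""
    match = DIAG_RE.match(text)
    if match is None:
        return None  # e.g. notes, code snippets, or caret lines
    return {
        "analyzer": analyzer,
        "file": match.group("file"),
        "line": int(match.group("line")),
        "column": int(match.group("col")),
        "severity": match.group("severity"),
        "message": match.group("message"),
        "checker": match.group("checker"),
    }

record = parse_diagnostic(
    "src/main.cpp:12:5: warning: use nullptr [modernize-use-nullptr]",
    "clang-tidy",
)
```

Records of this shape from different analyzers could then be stored and 
displayed by the same viewer.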

(**) 3rd party dependencies for various features:
  - Python 2.7.5 (Python Software Foundation) - required to run CodeChecker
  - SQLAlchemy (MIT) - Python SQL toolkit and Object Relational Mapper, 
for supporting multiple database backends
  - Alembic (MIT) - required for database migration support which is 
only available for PostgreSQL database
  - pg8000 (BSD) or psycopg2 (LGPL) - at least one database connector is 
required for PostgreSQL database support (both are supported)
  - Thrift (Apache v2.0) - cross-language service building framework to 
handle data transfer for report storage and result viewer clients
  - Codemirror (MIT) - view source code in the browser
  - Jsplumb (community edition, MIT) - draw bug paths
  - Marked (BSD) - view documentation for checkers written in markdown 
(generated dynamically)
  - Dojotoolkit (BSD) - main framework for the web UI
  - Highlightjs (BSD) - required for highlighting the source code

For further information check out our GitHub 
(https://github.com/Ericsson/codechecker) page.

Best Regards,
Gyorgy Orban