[cfe-dev] [analyzer] Regression testing for the static analyzer

Thu Jun 11 07:23:02 PDT 2020

Hi everyone,

this thread is mainly for the static analyzer developers, but, of course,
everyone else is most welcome to join and share their view on this topic.

I have a few thoughts, ideas, and a bit of prototyping.

INTRO & MOTIVATION

Quite a big portion of patches need to be checked on real projects as opposed to
the synthetic tests we have in the repo.  Right now it all comes down to manual
testing.  A person has to find at least a couple of projects, build them
natively, and check them with the analyzer.  So the first problem that I really
want to solve is to eliminate all this hassle.  It should be dead simple, maybe
as simple as running `lit` tests.

Another point of interest is reproducibility.  We, at Apple, regularly
check for differences in results on a set of projects.  I believe that other
parts of the community have similar CI setups.  So, when we need to come back
to the community with undesired changes, we have to provide a reproducible
example.  Even if it is a well-known open-source project, it is not
guaranteed that another developer will be able to get somewhat similar results.
The analyzer is extremely susceptible to differences in the environment.  The
OS, its version, and the versions of the installed libraries can change the
warnings that the analyzer produces.  That being said, the second problem that
has to be solved is the stability of results: every developer should get exactly
the same results.

MAIN IDEA

One way to solve both of the aforementioned problems is to use `docker`.  It is
available on Linux, Windows, and macOS.  It is pretty widespread, so it is quite
probable that a developer already has some experience with docker.  It is used for
other parts of the LLVM project.  It is fairly easy to run scripts in docker
and make it seem like they are executed outside of it.
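
To illustrate the last point, here is a minimal Python 3 sketch of how a wrapper
could assemble a `docker run` command that mounts host directories into the
container, so that results end up on the host as if the script ran there.  This
is not the code from the revisions: the image name `satest-image`, the mount
points `/llvm-project` and `/build`, and the function name are all assumptions
made for the example.  The function only builds the command and does not run it.

```python
import os
import shlex


def docker_run_command(llvm_src, build_dir, image="satest-image",
                       script_args=()):
    """Build (but do not run) a `docker run` argument list.

    The image name and the container-side mount points are hypothetical.
    """
    cmd = [
        "docker", "run", "--rm",
        # Sources are mounted read-only; the container must not modify them.
        "-v", "{}:/llvm-project:ro".format(os.path.abspath(llvm_src)),
        # Build products and results are written back to the host.
        "-v", "{}:/build".format(os.path.abspath(build_dir)),
        image,
    ]
    # Everything after the image name is passed to the container entrypoint.
    cmd.extend(script_args)
    return cmd


if __name__ == "__main__":
    example = docker_run_command(".", "./build",
                                 script_args=["--build-llvm-only"])
    print(shlex.join(example))
```

Because the host paths are bind-mounted, the user never has to copy files in or
out of the container by hand.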

WHAT IS DONE

There is a series of revisions starting from https://reviews.llvm.org/D81571
that form a first working version of this approach.

Short summary of what is there:
  * Info on 15 open-source projects to analyze, most of which are pretty small
  * Dockerfile with fixed versions of dependencies for these projects
  * Python interface that abstracts away user interaction with docker

WHAT DOES IT TAKE TO RUN IT RIGHT NOW

The system has two dependencies: Python (2 or 3) and docker.

Right now the prototype of the system is not feature-complete, but it supports
the following workflow for testing a new patch for crashes and changes against
master (some options are left off for clarity):

1. Build docker image
./SATest.py docker --build-image

2. Build LLVM in docker
./SATest.py docker -- --build-llvm-only

3. Collect reference results for master
./SATest.py docker -- build -r

4. Make changes to the analyzer

5. Incrementally re-build LLVM in docker
./SATest.py docker -- --build-llvm-only

6. Collect results and compare them with reference
./SATest.py docker -- build --strictness 2
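
The steps above could also be scripted.  Below is a minimal Python 3 sketch that
lists the same commands (copied verbatim from the workflow) and runs them in
order.  The helper `run_steps` and its `dry_run` parameter are assumptions made
for this example and are not part of SATest.py.  By default it only prints the
commands, so it is safe to try without docker installed.

```python
import subprocess

SATEST = "./SATest.py"

# Steps 1-3: done once, before changing the analyzer.
BEFORE_PATCH = [
    [SATEST, "docker", "--build-image"],
    [SATEST, "docker", "--", "--build-llvm-only"],
    [SATEST, "docker", "--", "build", "-r"],
]

# Steps 5-6: repeated after every change to the analyzer (step 4).
AFTER_PATCH = [
    [SATEST, "docker", "--", "--build-llvm-only"],
    [SATEST, "docker", "--", "build", "--strictness", "2"],
]


def run_steps(steps, dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    Execution stops at the first failing command because check_call raises
    CalledProcessError on a non-zero exit status.
    """
    executed = []
    for cmd in steps:
        print("+ " + " ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)
        executed.append(cmd)
    return executed


if __name__ == "__main__":
    run_steps(BEFORE_PATCH)
    run_steps(AFTER_PATCH)
```

Call `run_steps(..., dry_run=False)` from clang/utils/analyzer to actually
execute the commands.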

HOW IS IT DIFFERENT FROM OTHER SOLUTIONS

There are two main alternatives here: SATestBuild and csa-testbench:

SATestBuild is a set of scripts that already exists in the repo
(clang/utils/analyzer) and is essentially a foundation for the new system.
  + already exists and works
  + lives in the tree
  + doesn't have external dependencies
  - doesn't have a pre-defined set of projects and their dependencies
  - doesn't provide a fast setup suitable for newcomers
  - doesn't guarantee stable results on different machines
  - doesn't have benchmarking tools

csa-testbench (https://github.com/Xazax-hun/csa-testbench) is a set of scripts
that is much richer in functionality.
  + already exists and works
  + has an existing pre-defined set of projects
  + has support for coverage
  + compares various statistics
  + has a nice visualization
  - depends on `CodeChecker`, which is not used by all of the analyzer's
    developers and has to be installed separately
  - doesn't live in the repo, so it's harder to find
  - the user still has to deal with project dependencies, which makes initial
    setup longer and harder for newcomers
  - doesn't guarantee stable results on different machines

(I am not a `csa-testbench` user, so please correct me if I'm wrong here)

DIRECTIONS

In this section, I want to cover all the things I want to see in this testing
system.

  * I want it to cover all basic needs of the developer:
      - analyze a bunch of projects and show results
      - compare two given revisions
      - benchmark and compare performance

  * I want all commands to be as simple as possible, e.g.:
      - ./SATest.py docker analyze
      - ./SATest.py docker compare HEAD^1 HEAD
      - ./SATest.py docker benchmark --project ABC
    Try to minimize the number of options and actions required.

  * I want to have a community-supported CI bot that will test it.
    We can keep current reference results in master and the bot can check
    those.  This can help reduce the amount of time spent on testing, as the
    reference results are already there.

  * I want to have a separate Phabricator-friendly output to post results
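
As an illustration of the simplified commands listed above (analyze, compare,
benchmark), here is a minimal Python 3.7+ sketch of how such subcommands could
be parsed with `argparse`.  None of this exists in the prototype: the
subcommand names are taken from the examples above, while the argument names
`old` and `new` and the function `make_parser` are hypothetical.  The sketch
covers only the part of the command line that follows `docker`.

```python
import argparse


def make_parser():
    """Parser for the hypothetical `analyze`, `compare`, `benchmark` commands."""
    parser = argparse.ArgumentParser(prog="SATest.py docker")
    sub = parser.add_subparsers(dest="command", required=True)

    # analyze: no mandatory options, analyzes all known projects.
    sub.add_parser("analyze", help="analyze the projects and show results")

    # compare: two revisions, e.g. `compare HEAD^1 HEAD`.
    compare = sub.add_parser("compare", help="compare two revisions")
    compare.add_argument("old", help="baseline revision")
    compare.add_argument("new", help="revision under test")

    # benchmark: optionally restricted to a single project.
    benchmark = sub.add_parser("benchmark", help="measure performance")
    benchmark.add_argument("--project", default=None,
                           help="benchmark only this project")
    return parser


if __name__ == "__main__":
    args = make_parser().parse_args(["compare", "HEAD^1", "HEAD"])
    print(args.command, args.old, args.new)
```

Keeping every subcommand usable without options matches the goal of minimizing
the number of options and actions required.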

DISCUSSION

Please tell me what you think about this topic and this particular solution and
help me to answer these questions:

  * Would you use a system like this?

  * Does the proposed solution seem reasonable in this situation?

  * What do you think about the directions?

  * What other features do you want to see in the system?

  * What are the priorities for the project and what is the minimal feature
    scope to start using it?

Thank you for taking the time to read through this!