[cfe-dev] RFC: Clang Automatic Bug Reporting

Mon Jul 19 09:35:07 PDT 2010

On Jul 19, 2010, at 8:44 AM, Daniel Dunbar wrote:

> Hi all,
> 
> It would be great if Clang provided better support for users who want to file
> bugs. We frequently have to go through multiple iterations to get all the data
> we need, and we also sometimes run into cases where it is very hard / impossible
> to reproduce a problem locally. The latter problems tend to show up frequently
> with precompiled headers or supplemental files like header maps, where they
> depend very much on the build system and the layout of source on the user's
> system -- such tests can be a real pain to analyze right now.
> 
> The following proposal is for new feature work to let Clang automatically
> generate bug reports.
> 
> 
> Goals
> =====
> 
> Frontend / Single-File Focused:
> 
>  The goal of this work is to support generating bug reports for parse failures,
>  crashes, and trivial miscompiles. It is not designed to support generating
>  test cases where a large application is miscompiled, for example. Generally,
>  it is designed to support the case where the user runs a single Clang command
>  on their system, it doesn't work (crashes, produces obviously invalid output,
>  etc.), and they want a Clang developer to be able to reproduce the problem.
> 
> Easy-to-use:
> 
>  We want people to use it, so it has to be simple and it has to work almost all
>  the time.
> 
> Near-perfect Bug Reproduction:
> 
>  We want it to be almost guaranteed that the generated bug report reproduces
>  the problem. It isn't possible to be perfect, but we would like to get very
>  close.
> 
> Report Non-Compiler API Bugs:
> 
>  Currently, bugs in the compiler are usually easy to reproduce for users who
>  know how to generate preprocessed or LLVM IR files. However, bugs in
>  other areas of Clang like the libclang interfaces are much harder to
>  reproduce. Any solution should address (or help address) this problem.
> 
> Support auto-minimizing / anonymizing test cases:
> 
>  This won't happen soon, but I would like any solution to support this in some
>  reasonable fashion. This is primarily a nice-to-have, but it is also important
>  because it makes it more likely users will actually bother to submit a test
>  case in situations where they are worried about disclosing their source code.
> 
> 
> User Interface
> ==============
> 
> The Clang driver will get two new options:
> 
> '--create-test-case PATH'
> 
>   This will cause the driver to create a self-contained test case at PATH,
>   which contains enough information to reproduce, as far as possible, all the
>   actions the compiler takes.
> 
> '--replay-test-case PATH'
> 
>   This will cause the driver to replay the test case as faithfully as it can. The
>   driver will still support additional command line options, so the usual use
>   model would be to run '--replay-test-case' to verify the problem reproduces,
>   then either fix the problem directly or use additional command line options
>   (-E, -###, -emit-llvm, etc.) to isolate / minimize the problem.

At some point, we'll want to support automatic creation of test cases when Clang crashes (e.g., by installing the appropriate signal handler).  However, the --create-test-case command-line option will be a huge improvement even before then.

> Implementation
> ==============
> 
> Conceptually, what we want to capture in the test case is as much of the user's
> environment as is required to reproduce the problem. The environment consists of
> a lot of things which might change the compiler's behavior: the OS, the hardware,
> the file system, the environment variables, the command line options, the
> behavior of external programs, etc. We obviously cannot package up all of these
> things, but Clang is portable and always a cross compiler, and most bugs can be
> reproduced on different hardware or a different OS (with the right options).
> 
> The implementation is to try and capture each piece of the environment as best
> we can:
> 
> - For the OS and hardware, we will just record the OS and CPU information, and
>   when replaying the test case we will use that information instead of the host
>   information. This will require a few additional hooks, but should be
>   straightforward.

This is mainly encoded in the -cc1 command-line options, no?

> - Command line arguments and environment variables can just be saved to the
>   test case and restored on replay.
> 
> - For external programs the driver calls like 'as' and 'ld', all we can expect
>   to do in general is store the version information for the program, so that
>   developers can at least try to replicate the host environment if necessary
>   (and if the failure actually depends on the particular version of one of
>   those tools, which it usually doesn't).
> 
> - The file system is the main piece we cannot currently deal with. Usually we
>   have users give us preprocessed files to avoid depending on the user's file
>   system, but this does not always suffice to reproduce problems.
> 
>   My plan here is to rework parts of Clang to add support for a "virtual file
>   system" which would live under the FileManager API layer.  When the driver is
>   generating test cases, it would use this interface to keep track of all the
>   directories and files that are accessed as part of the compilation, and it
>   would serialize all this information (the file metadata and contents) into
>   the bug report. When the driver is replaying a test case, it would construct
>   a new virtual file system (like a private chroot, essentially) from the bug
>   report. This is the main implementation work, described below.

I love the idea of handling this through the virtual file system.
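
To make the record/replay split concrete, here is a rough sketch of how such a layer could be shaped. Everything in it is hypothetical: FileSystem, RecordingFileSystem, ReplayFileSystem, and Snapshot are names made up for illustration and do not exist in Clang, and the real thing would sit under FileManager rather than stand alone. The one point it tries to capture is that failed lookups need to be recorded along with successful ones, since header search behavior depends on which files are absent.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical interface; nothing here is an existing Clang API.
class FileSystem {
public:
  virtual ~FileSystem() {}
  // Returns true and fills Contents if Path exists, false otherwise.
  virtual bool read(const std::string &Path, std::string &Contents) = 0;
};

// Pass-through to the host file system.
class RealFileSystem : public FileSystem {
public:
  bool read(const std::string &Path, std::string &Contents) override {
    std::ifstream In(Path.c_str(), std::ios::binary);
    if (!In)
      return false;
    std::ostringstream Buf;
    Buf << In.rdbuf();
    Contents = Buf.str();
    return true;
  }
};

// One observed lookup: whether the file existed and, if so, what it held.
struct RecordedFile {
  bool Exists;
  std::string Contents;
};
typedef std::map<std::string, RecordedFile> Snapshot;

// Used when creating a test case: forwards each lookup and remembers it.
class RecordingFileSystem : public FileSystem {
  FileSystem &Underlying;
  Snapshot Seen;

public:
  explicit RecordingFileSystem(FileSystem &FS) : Underlying(FS) {}

  bool read(const std::string &Path, std::string &Contents) override {
    assert(!Path.empty() && "lookups must name a path");
    bool Exists = Underlying.read(Path, Contents);
    RecordedFile Entry = {Exists, Exists ? Contents : std::string()};
    Seen.insert(std::make_pair(Path, Entry)); // keeps the first observation
    return Exists;
  }

  const Snapshot &snapshot() const { return Seen; }
};

// Used when replaying: answers only from the snapshot, never from the host.
class ReplayFileSystem : public FileSystem {
  Snapshot Files;

public:
  explicit ReplayFileSystem(const Snapshot &S) : Files(S) {}

  bool read(const std::string &Path, std::string &Contents) override {
    Snapshot::const_iterator It = Files.find(Path);
    if (It == Files.end() || !It->second.Exists)
      return false;
    Contents = It->second.Contents;
    return true;
  }
};
```

With something like this, --create-test-case would wrap a RealFileSystem in a RecordingFileSystem and serialize snapshot() into the report, and --replay-test-case would build a ReplayFileSystem from the deserialized data. The sketch only covers file contents; the proposal also calls for directories and file metadata, which would need the same treatment.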

> There will be lots more details to be sorted out, but I wanted to give a heads
> up on the basic approach I am planning on taking, assuming I can find the time
> to work on this. Comments appreciated!

Looks great to me. I'll likely have more comments as the implementation details start getting ironed out.

	- Doug