[cfe-dev] RFC: Clang Automatic Bug Reporting
dgregor at apple.com
Mon Jul 19 09:35:07 PDT 2010
On Jul 19, 2010, at 8:44 AM, Daniel Dunbar wrote:
> Hi all,
> It would be great if Clang provided better support for users who want to file
> bugs. We frequently have to go through multiple iterations to get all the data
> we need, and we also sometimes run into cases where it is very hard / impossible
> to reproduce a problem locally. The latter problems tend to show up frequently
> with precompiled headers or supplemental files like header maps, where they
> depend very much on the build system and the layout of source on the user's
> system -- such tests can be a real pain to analyze right now.
> The following proposal is for new feature work to let Clang automatically
> generate bug reports.
> Frontend / Single-File Focused:
> The goal of this work is to support generating bug reports for parse failures,
> crashes, and trivial miscompiles. It is not designed to support generating
> test cases where a large application is miscompiled, for example. Generally,
> it is designed to support the case where the user runs a single Clang command
> on their system, it doesn't work (crashes, produces obviously invalid output,
> etc.), and they want a Clang developer to be able to reproduce the problem.
> We want people to use it, so it has to be simple and it has to work almost all
> the time.
> Near-perfect Bug Reproduction:
> We want it to be almost guaranteed that the generated bug report reproduces
> the problem. It isn't possible to be perfect, but we would like to get very
> Report Non-Compiler API Bugs:
> Currently, bugs in the compiler are usually easy to reproduce for users who
> know how to generate preprocessed or LLVM IR files. However, bugs in
> other areas of Clang like the libclang interfaces are much harder to
> reproduce. Any solution should address (or help address) this problem.
> Support auto-minimizing / anonymizing test cases:
> This won't happen soon, but I would like any solution to support this in some
> reasonable fashion. This is primarily a nice to have, but it is also important
> because it makes it more likely users will actually bother to submit a test
> case in situations where they are worried about disclosing their source code.
> User Interface
> The Clang driver will get two new options:
> '--create-test-case PATH'
> This will cause the driver to create a self-contained test case at PATH,
> which contains enough information to reproduce, as far as possible, all the
> actions the compiler is taking.
> '--replay-test-case PATH'
> This will cause the driver to replay the test case as best as possible. The
> driver will still support additional command line options, so the usual use
> model would be to run '--replay-test-case' to verify the problem reproduces,
> then either fix the problem directly or use additional command line options
> (-E, -###, -emit-llvm, etc.) to isolate / minimize the problem.
At some point, we'll want to support automatic creation of test cases when Clang crashes (e.g., by installing the appropriate signal handler). However, the --create-test-case command-line option will be a huge improvement even before then.
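To make the create/replay workflow a bit more concrete, here is a minimal sketch of what a test case might record for the invocation itself (command line, environment variables, and host OS/CPU, all of which the proposal mentions). It is only an illustration in Python, not Clang code: the function names, the JSON layout, and the field names are all assumptions made up for this example.

```python
import json


def create_test_case(argv, environ, os_name, cpu):
    """Serialize one compiler invocation.

    Hypothetical helper: the field names ("argv", "env", "host") are
    illustrative only and do not come from the proposal.
    """
    return json.dumps(
        {
            "argv": list(argv),
            "env": dict(environ),
            "host": {"os": os_name, "cpu": cpu},
        },
        sort_keys=True,
    )


def replay_test_case(serialized, extra_args=()):
    """Rebuild the recorded invocation.

    Extra options given at replay time (for example -E) are appended
    after the recorded ones, so the recorded command stays intact.
    """
    case = json.loads(serialized)
    argv = case["argv"] + list(extra_args)
    return argv, case["env"], case["host"]


if __name__ == "__main__":
    blob = create_test_case(
        ["clang", "-c", "t.c"], {"CPATH": "/opt/include"}, "darwin", "x86_64"
    )
    print(replay_test_case(blob, ["-E"]))
```

Appending the replay-time options rather than rewriting the recorded command line is one way to keep the original invocation reproducible while still allowing options like -E or -emit-llvm to be layered on top.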
> Conceptually, what we want to capture in the test case is as much of the user's
> environment as is required to reproduce the problem. The environment consists of
> a lot of things which might change the compiler's behavior: the OS, the hardware,
> the file system, the environment variables, the command line options, the
> behavior of external programs, etc. We obviously cannot package up all of these
> things, but Clang is portable and always a cross compiler, and most bugs can be
> reproduced on different hardware or a different OS (with the right options).
> The implementation is to try and capture each piece of the environment as best
> we can:
> - For the OS and hardware, we will just record the OS and CPU information, and
> when replaying the test case we will use that information instead of the host
> information. This will require a few additional hooks, but should be
This is mainly encoded in the -cc1 command-line options, no?
> - Command line arguments and environment variables can just be saved to the
> test case and restored on replay.
> - For external programs the driver calls like 'as' and 'ld', all we can expect
> to do in general is store the version information for the program, so that
> developers can at least try to replicate the host environment if necessary
> (and if the failure actually depends on the particular version of one of
> those tools, which it usually doesn't).
> - The file system is the main piece we cannot currently deal with. Usually we
> have users give us preprocessed files to avoid depending on the user's file
> system, but this does not always suffice to reproduce problems.
> My plan here is to rework parts of Clang to add support for a "virtual file
> system" which would live under the FileManager API layer. When the driver is
> generating test cases, it would use this interface to keep track of all the
> directories and files that are accessed as part of the compilation, and it
> would serialize all this information (the file metadata and contents) into
> the bug report. When the driver is replaying a test case, it would construct
> a new virtual file system (like a private chroot, essentially) from the bug
> report. This is the main implementation work, described below.
I love the idea of handling this through the virtual file system.
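As a rough illustration of the record/replay idea (not of Clang's FileManager or of any existing API), the sketch below records every file read during a "compilation", serializes the contents, and later serves reads purely from that snapshot. The class names and the JSON format are hypothetical, and it only tracks file contents; the proposal also calls for directory and file metadata, which this omits.

```python
import json
import os
import tempfile


class RecordingFileSystem:
    """Hypothetical recorder: all reads go through here and are remembered."""

    def __init__(self):
        self.files = {}  # absolute path -> contents seen during the run

    def read(self, path):
        with open(path, "r", encoding="utf-8") as f:
            data = f.read()
        self.files[os.path.abspath(path)] = data
        return data

    def serialize(self):
        # Assumption: contents only; a real report would also need metadata.
        return json.dumps({"files": self.files}, sort_keys=True)


class ReplayFileSystem:
    """Hypothetical replayer: serves reads only from the recorded snapshot."""

    def __init__(self, serialized):
        self.files = json.loads(serialized)["files"]

    def read(self, path):
        key = os.path.abspath(path)
        if key not in self.files:
            # Anything not captured in the report simply does not exist.
            raise FileNotFoundError(key)
        return self.files[key]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    header = os.path.join(workdir, "config.h")
    with open(header, "w", encoding="utf-8") as f:
        f.write("#define X 1\n")

    recorder = RecordingFileSystem()
    recorder.read(header)
    report = recorder.serialize()

    os.remove(header)  # the original file is gone, as on a developer's machine
    print(ReplayFileSystem(report).read(header))
```

Failing on any path that was not recorded is what gives the "private chroot" behaviour: the replay cannot accidentally pick up files from the developer's own machine.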
> There will be lots more details to be sorted out, but I wanted to give a heads
> up on the basic approach I am planning on taking, assuming I can find the time
> to work on this. Comments appreciated!
Looks great to me. I'll likely have more comments as the implementation details start getting ironed out.