[cfe-dev] RFC: Clang Automatic Bug Reporting
clattner at apple.com
Mon Jul 19 10:51:30 PDT 2010
Sounds great to me!
On Jul 19, 2010, at 8:44 AM, Daniel Dunbar wrote:
> Hi all,
> It would be great if Clang provided better support for users who want to file
> bugs. We frequently have to go through multiple iterations to get all the data
> we need, and we also sometimes run into cases where it is very hard / impossible
> to reproduce a problem locally. The latter problems tend to show up frequently
> with precompiled headers or supplemental files like header maps, where they
> depend very much on the build system and the layout of source on the user's
> system -- such tests can be a real pain to analyze right now.
> The following proposal is for new feature work to let Clang automatically
> generate bug reports.
> Frontend / Single-File Focused:
> The goal of this work is to support generating bug reports for parse failures,
> crashes, and trivial miscompiles. It is not designed to support generating
> test cases where a large application is miscompiled, for example. Generally,
> it is designed to support the case where the user runs a single Clang command
> on their system, it doesn't work (crashes, produces obviously invalid output,
> etc.), and they want a Clang developer to be able to reproduce the problem.
> We want people to use it, so it has to be simple and it has to work almost all
> the time.
> Near-perfect Bug Reproduction:
> We want it to be almost guaranteed that the generated bug report reproduces
> the problem. It isn't possible to be perfect, but we would like to get very
> Report Non-Compiler API Bugs:
> Currently, bugs in the compiler are usually easy to reproduce for users who
> know how to generate preprocessed or LLVM IR files. However, bugs in
> other areas of Clang like the libclang interfaces are much harder to
> reproduce. Any solution should address (or help address) this problem.
> Support auto-minimizing / anonymizing test cases:
> This won't happen soon, but I would like any solution to support this in some
> reasonable fashion. This is primarily a nice-to-have, but it is also important
> because it makes it more likely users will actually bother to submit a test
> case in situations where they are worried about disclosing their source code.
> User Interface
> The Clang driver will get two new options:
> '--create-test-case PATH'
> This will cause the driver to create a self-contained test case at PATH,
> which contains enough information to reproduce, as far as possible, all the
> actions the compiler is taking.
> '--replay-test-case PATH'
> This will cause the driver to replay the test case as best as possible. The
> driver will still support additional command line options, so the usual use
> model would be to run '--replay-test-case' to verify the problem reproduces,
> then either fix the problem directly or use additional command line options
> (-E, -###, -emit-llvm, etc.) to isolate / minimize the problem.
> Conceptually, what we want to capture in the test case is as much of the user's
> environment as is required to reproduce the problem. The environment consists of
> a lot of things which might change the compiler's behavior: the OS, the hardware,
> the file system, the environment variables, the command line options, the
> behavior of external programs, etc. We obviously cannot package up all of these
> things, but Clang is portable and always a cross compiler, and most bugs can be
> reproduced on different hardware or a different OS (with the right options).
> The implementation is to try and capture each piece of the environment as best
> we can:
> - For the OS and hardware, we will just record the OS and CPU information, and
> when replaying the test case we will use that information instead of the host
> information. This will require a few additional hooks, but should be
> - Command line arguments and environment variables can just be saved to the
> test case and restored on replay.
> - For external programs the driver calls like 'as' and 'ld', all we can expect
> to do in general is store the version information for the program, so that
> developers can at least try to replicate the host environment if necessary
> (and if the failure actually depends on the particular version of one of
> those tools, which it usually doesn't).
> - The file system is the main piece we cannot currently deal with. Usually we
> have users give us preprocessed files to avoid depending on the user's file
> system, but this does not always suffice to reproduce problems.
> My plan here is to rework parts of Clang to add support for a "virtual file
> system" which would live under the FileManager API layer. When the driver is
> generating test cases, it would use this interface to keep track of all the
> directories and files that are accessed as part of the compilation, and it
> would serialize all this information (the file metadata and contents) into
> the bug report. When the driver is replaying a test case, it would construct
> a new virtual file system (like a private chroot, essentially) from the bug
> report. This is the main implementation work, described below.
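
The capture-and-restore idea for the simpler pieces of the environment (recorded OS/CPU, command line arguments, environment variables) could look roughly like the sketch below. This is only an illustration, not code from the proposal: the InvocationRecord type, the serialize/deserialize functions, and the line-oriented text format are all hypothetical names and choices made for this example.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of the invocation environment. None of these names
// come from the proposal or from Clang; they are assumptions for the sketch.
struct InvocationRecord {
  std::string triple;                      // recorded OS/CPU description
  std::vector<std::string> args;           // command line arguments, in order
  std::map<std::string, std::string> env;  // environment variables
};

// Write the record as tagged lines ("triple ...", "arg ...", "env K=V").
// Assumes no value contains a newline; a real format would need escaping.
std::string serialize(const InvocationRecord &r) {
  std::ostringstream os;
  os << "triple " << r.triple << '\n';
  for (const auto &a : r.args) {
    assert(a.find('\n') == std::string::npos && "newlines are not supported");
    os << "arg " << a << '\n';
  }
  for (const auto &kv : r.env)
    os << "env " << kv.first << '=' << kv.second << '\n';
  return os.str();
}

// Rebuild the record on replay. Unknown or malformed lines are skipped.
InvocationRecord deserialize(const std::string &text) {
  InvocationRecord r;
  std::istringstream is(text);
  std::string line;
  while (std::getline(is, line)) {
    const auto sp = line.find(' ');
    if (sp == std::string::npos)
      continue;
    const std::string tag = line.substr(0, sp);
    const std::string payload = line.substr(sp + 1);
    if (tag == "triple") {
      r.triple = payload;
    } else if (tag == "arg") {
      r.args.push_back(payload);
    } else if (tag == "env") {
      const auto eq = payload.find('=');
      if (eq != std::string::npos)
        r.env[payload.substr(0, eq)] = payload.substr(eq + 1);
    }
  }
  return r;
}
```

On replay, the driver would read the record back and use the stored triple, arguments, and environment in place of the host's values. The file system contents are deliberately left out here, since the proposal handles them through the virtual file system.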
> Virtual File System
> My plan is to add a new LLVM interface which abstracts high-level access to the
> file system. This interface will live at the llvm/Support level, and is designed
> to have a thin API -- it won't be a full VFS layer, but rather it will support
> the higher level LLVM file operations, i.e. getting a MemoryBuffer or
> raw_ostream. We will also need additional interfaces to support things like
> stat() or quickly testing file existence. The Support library will provide a
> default implementation of the VFS interface which uses the normal file system. I
> don't have a sketch of the API yet, but I'm confident we can achieve something
> Once the llvm/Support level VFS interface is in place, the Clang
> CompilerInstance object will get a VFS object. I will then refactor all the
> Clang IO access to go through this object. The main piece here is changing the
> FileManager to live on top of the VFS object, but all the other places the
> driver & frontend access files will need to move as well.
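
To make the layering concrete, here is a minimal sketch of what a thin file system interface with a recording wrapper and an in-memory replay implementation might look like. It is not the API the proposal will end up with (no API had been sketched at this point): FileSystem, RealFileSystem, InMemoryFileSystem, RecordingFileSystem, and their methods are hypothetical names, and plain std::string stands in for MemoryBuffer.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical thin interface: just "read a whole file" and "does it exist".
class FileSystem {
public:
  virtual ~FileSystem() = default;
  // Returns true and fills `out` if the file could be read.
  virtual bool getBuffer(const std::string &path, std::string &out) = 0;
  // Default existence test in terms of getBuffer; a real layer would use stat().
  virtual bool exists(const std::string &path) {
    std::string ignored;
    return getBuffer(path, ignored);
  }
};

// Default implementation backed by the normal file system.
class RealFileSystem : public FileSystem {
public:
  bool getBuffer(const std::string &path, std::string &out) override {
    std::ifstream in(path, std::ios::binary);
    if (!in)
      return false;
    std::ostringstream ss;
    ss << in.rdbuf();
    out = ss.str();
    return true;
  }
};

// In-memory implementation, as would be built from a bug report on replay.
class InMemoryFileSystem : public FileSystem {
  std::map<std::string, std::string> files_;

public:
  void addFile(const std::string &path, std::string contents) {
    assert(!path.empty() && "files need a path");
    files_[path] = std::move(contents);
  }
  bool getBuffer(const std::string &path, std::string &out) override {
    const auto it = files_.find(path);
    if (it == files_.end())
      return false;
    out = it->second;
    return true;
  }
};

// Wrapper used while creating a test case: forwards every request to the
// underlying file system and remembers the contents of each file read.
class RecordingFileSystem : public FileSystem {
  FileSystem &base_;
  std::map<std::string, std::string> seen_;

public:
  explicit RecordingFileSystem(FileSystem &base) : base_(base) {}
  bool getBuffer(const std::string &path, std::string &out) override {
    if (!base_.getBuffer(path, out))
      return false;
    seen_[path] = out;
    return true;
  }
  // Build the private file system that a replay would run against.
  InMemoryFileSystem snapshot() const {
    InMemoryFileSystem fs;
    for (const auto &kv : seen_)
      fs.addFile(kv.first, kv.second);
    return fs;
  }
};
```

Because the replay file system contains only what was recorded, a lookup that failed on the user's machine also fails on replay, while files the compilation never touched are simply absent. Directory listings and file metadata, which the proposal also wants to capture, are omitted from this sketch.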
> There is also some possibility that once this work is done we can simplify some
> existing interfaces, for example the current file remapping APIs or PTH's stat
> The VFS based approach may seem over-the-top, but there are a couple reasons I
> like this approach as opposed to others:
> - The only real other alternative is to try to make the driver smart enough to
> rewrite and repackage up various local paths when making the test cases
> (preprocessed inputs are not a viable alternative), then use the remapping
> APIs to mimic the user's environment. This would be very hard to implement
> correctly, and would be brittle in the face of changes to the frontend.
> - I think this approach is fairly simple to implement. We will need to spend a
> fair amount of time getting the VFS interface right, so that it is performant
> and clean, but otherwise each implementation step should be straightforward.
> - This approach should be very robust in so far as reproducing bugs. Although
> memory layout and other very low-level details will change, the intent is
> that everything in the compiler above the VFS layer should behave exactly as
> it would on the user's system, assuming an accurate bug report.
> - The downside of this approach is that bug reports will by default include a
> substantial amount of information. I am ok with this tradeoff, because my
> number one priority is to be able to reproduce the bug. I eventually hope to
> solve this problem by having a tool which, once it has a reproducible bug
> report, will try to weed out the non-essential parts (for example, trying to
> switch to a preprocessed input).
> Future Work
> Once the basic Clang driver features are in place, we should be able to use the
> same infrastructure to generate bug reports from other API entry points (like
> libclang). Most of these will involve just packaging up the basic bug report as
> the driver would, and adding whatever additional metadata is needed to identify
> the API call and any extra metadata (file remapping information, for example).
> At some point, we can also think about writing an independent tool which would
> take a Clang generated bug report and attempt to minimize it. For example, it
> would try to simplify the test steps (i.e., see if it can reproduce with a
> preprocessed input), and eventually could use a Clang based delta tool to
> minimize the input source.
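
As a rough illustration of the minimization step, the sketch below repeatedly drops one line at a time and keeps the smaller input whenever the problem still reproduces. This is a deliberately simple greedy reduction, not full delta debugging and not a Clang-based tool; the minimize function and the stillFails callback are hypothetical names, and in practice the callback would replay the test case and check for the failure.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Lines = std::vector<std::string>;

// Greedy line-by-line reduction. `stillFails` is an assumed callback that
// returns true when the given input still reproduces the problem.
Lines minimize(Lines input,
               const std::function<bool(const Lines &)> &stillFails) {
  assert(stillFails(input) && "the starting input must reproduce the problem");
  bool changed = true;
  while (changed) {  // repeat until a full pass removes nothing
    changed = false;
    for (std::size_t i = 0; i < input.size();) {
      Lines candidate = input;
      candidate.erase(candidate.begin() + static_cast<std::ptrdiff_t>(i));
      if (stillFails(candidate)) {
        input = std::move(candidate);  // line i was not needed; retry index i
        changed = true;
      } else {
        ++i;  // line i is required; keep it and move on
      }
    }
  }
  return input;
}
```

Each candidate costs one replay, so this takes a number of replays quadratic in the input size in the worst case; a practical tool would remove larger chunks first and work on tokens or declarations rather than raw lines.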
> There will be lots more details to be sorted out, but I wanted to give a heads
> up on the basic approach I am planning on taking, assuming I can find the time
> to work on this. Comments appreciated!
> - Daniel