[lldb-dev] [RFC] LLDB Reproducers

Wed Sep 19 06:50:19 PDT 2018

Hi everyone,

We all know how hard it can be to reproduce an issue or crash in LLDB. There
are a lot of moving parts and subtle differences can easily add up. We want to
make this easier by generating reproducers in LLDB, similar to what clang does
today.

The core idea is as follows: during normal operation we capture whatever
information is needed to recreate the current state of the debugger. When
something goes wrong, this becomes available to the user. Someone else should
then be able to reproduce the same issue with only this data, for example on a
different machine.

It's important to note that we want to replay the debug session from the
reproducer, rather than just recreating the current state. This ensures that we
have access to all the events leading up to the problem, which are usually far
more important than the error state itself.

# High Level Design

Concretely we want to extend LLDB in two ways:

1.  We need to add infrastructure to _generate_ the data necessary for
    reproducing.
2.  We need to add infrastructure to _use_ the data in the reproducer to replay
    the debugging session.

Different parts of LLDB will have different definitions of what data they need
to reproduce their path to the issue. For example, capturing the commands
executed by the user is very different from tracking the dSYM bundles on disk.
Therefore, we propose to have each component deal with its needs in a localized
way. This has the advantage that the functionality can be developed and tested
independently.

## Providers

We'll call a combination of (1) and (2) for a given component a `Provider`. For
example, we'd have a provider for user commands and a provider for dSYM files.
A provider will know how to keep track of its information, how to serialize it
as part of the reproducer as well as how to deserialize it again and use it to
recreate the state of the debugger.
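A minimal C++ sketch of what such a provider could look like follows. All names here (`Provider`, `CommandProvider`, `Record`, `Serialize`, `Deserialize`, `Commands`) are hypothetical and not part of LLDB. The point is only that capturing, serializing and deserializing live together in one per-component class.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical interface: each component implements its own provider.
class Provider {
public:
  virtual ~Provider() = default;
  // Name identifying this provider's data inside the reproducer.
  virtual std::string Name() const = 0;
  // Write the captured information as part of the reproducer.
  virtual void Serialize(std::ostream &os) const = 0;
  // Read previously captured information back for replay.
  virtual void Deserialize(std::istream &is) = 0;
};

// Hypothetical provider capturing user commands, one per line.
// Assumes a single command never contains a newline.
class CommandProvider : public Provider {
public:
  std::string Name() const override { return "commands"; }

  // Called during normal operation for every command the user enters.
  void Record(const std::string &command) { commands_.push_back(command); }

  void Serialize(std::ostream &os) const override {
    for (const std::string &command : commands_)
      os << command << '\n';
  }

  void Deserialize(std::istream &is) override {
    commands_.clear();
    std::string line;
    while (std::getline(is, line))
      commands_.push_back(line);
  }

  // During replay, the commands are fed back in their original order.
  const std::vector<std::string> &Commands() const { return commands_; }

private:
  std::vector<std::string> commands_;
};
```

A round trip is simply `Record` during the session, `Serialize` when a reproducer is generated, and `Deserialize` on the replaying side.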

With one exception, the lifetime of a provider coincides with that of the
`SBDebugger`, because that is what we consider to be the scope of a single
debug session. The exception is the provider for the global module cache,
which is shared between multiple debuggers. Although it would be conceptually
straightforward to add a provider for the shared module cache, doing so would
significantly increase the complexity of the reproducer framework because of
its implications for provider lifetimes and everything that depends on them.

For now we will ignore this problem, which means we will not replay the
construction of the shared module cache but will instead build it up during
replay, as if the current debug session were the first and only one using it.
The impact of doing so is significant: no issue caused by the shared module
cache will be reproducible. It does not, however, limit our ability to
reproduce issues unrelated to it.

## Reproducer Framework

To coordinate between the data from different components, we'll need to
introduce a global reproducer infrastructure. It consists of a component
responsible for generating the reproducer (the `Generator`) and one responsible
for using it (the `Loader`). They are essentially two ways of looking at the
same unit of replayable work.

The Generator keeps track of its providers and of whether or not we need to
generate a reproducer. When a problem occurs, LLDB will ask the Generator to
generate a reproducer. When LLDB finishes successfully, the Generator cleans
up anything it might have created during the session. Additionally, the
Generator populates an index, which is part of the reproducer and is used by
the Loader to discover what information is available.

When a reproducer is passed to LLDB, we want to use its data to replay the
debug session. This is coordinated by the Loader. Through the index created by
the Generator, different components know what data (Providers) are available,
and how to use them.
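The interplay between the Generator, its providers and the Loader could look roughly like the following C++ sketch. It assumes hypothetical classes (`Provider`, `SettingsProvider`, `Generator`, `Loader`) and, to stay self-contained, models the reproducer directory as an in-memory map from file name to contents; a real implementation would write files to disk.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal provider interface for this sketch.
class Provider {
public:
  virtual ~Provider() = default;
  virtual std::string Name() const = 0;
  virtual std::string Serialize() const = 0;
};

// Hypothetical provider that keeps settings in memory.
struct SettingsProvider : Provider {
  std::string settings;
  std::string Name() const override { return "settings"; }
  std::string Serialize() const override { return settings; }
};

// Stand-in for the reproducer directory: file name -> file contents.
using Reproducer = std::map<std::string, std::string>;

class Generator {
public:
  // Creates a provider owned by the Generator and returns a reference to it.
  template <typename T> T &Create() {
    auto provider = std::make_unique<T>();
    T &ref = *provider;
    providers_.push_back(std::move(provider));
    return ref;
  }

  // Called when a problem occurs: serialize every provider and write the
  // index the Loader uses to discover which data is available.
  Reproducer Generate() const {
    Reproducer reproducer;
    std::ostringstream index;
    for (const auto &provider : providers_) {
      reproducer[provider->Name()] = provider->Serialize();
      index << provider->Name() << '\n';
    }
    reproducer["index"] = index.str();
    return reproducer;
  }

  // Called when LLDB finishes successfully: drop everything captured.
  void Discard() { providers_.clear(); }

private:
  std::vector<std::unique_ptr<Provider>> providers_;
};

class Loader {
public:
  explicit Loader(Reproducer reproducer) : reproducer_(std::move(reproducer)) {
    std::istringstream index(reproducer_.at("index"));
    std::string name;
    while (std::getline(index, name))
      available_.push_back(name);
  }

  const std::vector<std::string> &Available() const { return available_; }

  // A component asks for its provider's data by name.
  std::optional<std::string> GetData(const std::string &name) const {
    auto it = reproducer_.find(name);
    if (it == reproducer_.end())
      return std::nullopt;
    return it->second;
  }

private:
  Reproducer reproducer_;
  std::vector<std::string> available_;
};
```

Keeping the index separate from the provider data means the Loader never has to know about concrete provider types; each component looks up its own entry by name.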

It's important to note that in order to create a complete reproducer, we will
also require data from our dependencies (LLVM, Clang, Swift). This means that
either (a) the infrastructure needs to be accessible from our dependencies, or
(b) an API needs to be provided that allows us to query this data. We plan to
address this issue as it arises for each respective provider.

# Components

We have identified a minimal set of components needed to make reproducing
possible. We've divided them into two groups: explicit and implicit inputs.

Explicit inputs are inputs from the user to the debugger.

-   Command line arguments
-   Settings
-   User commands
-   Scripting Bridge API

In addition to the components listed above, LLDB has a bunch of inputs that are
not passed explicitly. It's often these that make reproducing an issue complex.

-   GDB Remote Packets
-   Files containing debug information (object files, dSYM bundles)
-   Clang headers
-   Swift modules

Every component would have its own provider and is free to implement it as it
sees fit. For example, as we expect to have a large number of GDB remote
packets, the provider might choose to write these to disk as they come in,
while the settings can easily be kept in memory until it is decided that we
need to generate a reproducer.
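The two strategies described above could be sketched as follows. Both classes are hypothetical: `PacketProvider` writes each packet to its output stream as soon as it arrives, while `SettingsProvider` keeps everything in memory and only serializes when asked.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical provider for GDB remote packets. Because many packets are
// expected, each one is written out immediately instead of being buffered.
class PacketProvider {
public:
  explicit PacketProvider(std::ostream &sink) : sink_(sink) {}

  void Record(const std::string &packet) {
    sink_ << packet << '\n';
    // Flush so the packet reaches the OS even if LLDB later crashes.
    sink_.flush();
  }

private:
  std::ostream &sink_;
};

// Hypothetical provider for settings. The data is small, so it stays in
// memory until a reproducer is actually requested.
class SettingsProvider {
public:
  void Record(const std::string &name, const std::string &value) {
    settings_[name] = value; // A later write to the same setting wins.
  }

  void Serialize(std::ostream &os) const {
    for (const auto &[name, value] : settings_)
      os << name << ' ' << value << '\n';
  }

private:
  std::map<std::string, std::string> settings_;
};
```

In a real provider the `std::ostream` would be a file inside the reproducer directory; a string stream is used here only so the sketch runs without touching the disk.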

# Concerns, Implications & Risks

## Performance Impact

Because the reproducer functionality has to be always on, we have to consider
its performance implications. As mentioned earlier, each provider is free to be
implemented in whatever way works best for its respective component. We'll
have to measure to know how big the impact is.

## Privacy

The reproducer might contain sensitive user information. We should make it
clear to the user what kind of data is contained in the reproducer. Initially
we will focus on the LLDB developer community and the people already filing
bugs.

## Versions

Because the reproducer works by replaying a debug session, the versions of the
debugger generating and replaying the session will have to match. This matters
not only for the serialization format; more importantly, a different version
of LLDB might ask different questions in a different order.
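One way to enforce this is to record the debugger version in the reproducer index and refuse to replay on a mismatch. The sketch below is hypothetical: the `version: ` header line and the function names `WriteVersionHeader` and `CheckVersion` are assumptions, not part of the proposal.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical header written as the first line of the reproducer index.
std::string WriteVersionHeader(const std::string &version) {
  return "version: " + version + "\n";
}

// Returns an error message if this debugger cannot replay the reproducer,
// or std::nullopt if the recorded version matches the current one.
std::optional<std::string> CheckVersion(const std::string &index,
                                        const std::string &current_version) {
  const std::string prefix = "version: ";
  std::istringstream is(index);
  std::string line;
  // The first line must start with the version prefix.
  if (!std::getline(is, line) || line.rfind(prefix, 0) != 0)
    return "reproducer index has no version header";
  std::string recorded = line.substr(prefix.size());
  if (recorded != current_version)
    return "reproducer was generated by LLDB " + recorded +
           " but this is LLDB " + current_version;
  return std::nullopt;
}
```

Failing early with a clear message is preferable to replaying with a different debugger and getting confusing divergences halfway through the session.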

# Implementation

I've put up a patch (<https://reviews.llvm.org/D50254>) which contains a minimal
implementation of the reproducer framework as well as the GDB remote provider.

It records the GDB packets and writes them to a YAML file (we can switch to a
more performant encoding down the road). When the LLDB driver is invoked with
the reproducer directory passed to `--reproducer`, this file is read and a
dummy server replies with the next packet from it, without talking to the
executable.

It's still pretty rudimentary and only works if you enter the exact same
commands (so the server receives the exact same requests from the client).
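The replay side of the GDB remote provider can be pictured as a server that hands back recorded replies in order. This C++ sketch is hypothetical (`Exchange`, `ReplayServer`, `Handle`). Unlike the dummy server described above, it also compares each incoming request against the recorded one, which makes the "exact same requests" limitation explicit instead of silently returning a mismatched reply.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One recorded exchange: the packet LLDB sent and the reply it received.
struct Exchange {
  std::string request;
  std::string response;
};

// Hypothetical stand-in for the dummy server: instead of talking to the
// debugged process, it answers with the next recorded reply.
class ReplayServer {
public:
  explicit ReplayServer(std::vector<Exchange> recorded)
      : recorded_(std::move(recorded)) {}

  // Returns the recorded reply, or std::nullopt when the client deviates
  // from the recorded session (a different request, or one too many).
  std::optional<std::string> Handle(const std::string &request) {
    if (next_ >= recorded_.size() || recorded_[next_].request != request)
      return std::nullopt;
    return recorded_[next_++].response;
  }

private:
  std::vector<Exchange> recorded_;
  std::size_t next_ = 0;
};
```

Because replies are consumed strictly in order, entering different commands during replay makes the client send different requests, and the sketch reports that instead of answering with unrelated data.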

The next steps are (in broad strokes):

1.  Capturing the debugged binary.
2.  Recording and replaying user commands and SB API calls.
3.  Recording the configuration of the debugger.
4.  Capturing other files used by LLDB.

Please let me know what you think!

Thanks,
Jonas