<div dir="ltr">Sounds like a fantastic idea. <div><br></div><div>How would this work when the behavior of the debuggee process is non-deterministic?</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Sep 19, 2018 at 6:50 AM, Jonas Devlieghere via lldb-dev <span dir="ltr"><<a href="mailto:lldb-dev@lists.llvm.org" target="_blank">lldb-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi everyone,<br>

<br>

We all know how hard it can be to reproduce an issue or crash in LLDB. There<br>

are a lot of moving parts and subtle differences can easily add up. We want to<br>

make this easier by generating reproducers in LLDB, similar to what clang does<br>

today.<br>

<br>

The core idea is as follows: during normal operation we capture whatever<br>

information is needed to recreate the current state of the debugger. When<br>

something goes wrong, this becomes available to the user. Someone else should<br>

then be able to reproduce the same issue with only this data, for example on a<br>

different machine.<br>

<br>

It's important to note that we want to replay the debug session from the<br>

reproducer, rather than just recreating the current state. This ensures that we<br>

have access to all the events leading up to the problem, which are usually far<br>

more important than the error state itself.<br>

<br>

# High Level Design<br>

<br>

Concretely we want to extend LLDB in two ways:<br>

<br>

1.  We need to add infrastructure to _generate_ the data necessary for<br>

    reproducing.<br>

2.  We need to add infrastructure to _use_ the data in the reproducer to replay<br>

    the debugging session.<br>

<br>

Different parts of LLDB will have different definitions of what data they need<br>

to reproduce their path to the issue. For example, capturing the commands<br>

executed by the user is very different from tracking the dSYM bundles on disk.<br>

Therefore, we propose to have each component deal with its needs in a localized<br>

way. This has the advantage that the functionality can be developed and tested<br>

independently.<br>

<br>

## Providers<br>

<br>

We'll call a combination of (1) and (2) for a given component a `Provider`. For<br>

example, we'd have a provider for user commands and a provider for dSYM files.<br>

A provider will know how to keep track of its information, how to serialize it<br>

as part of the reproducer as well as how to deserialize it again and use it to<br>

recreate the state of the debugger.<br>
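
<br>
As a rough illustration, a provider could be sketched as follows. This is Python for brevity, not LLDB code.<br>
The names `Provider`, `CommandProvider`, `record`, `serialize` and `deserialize` are hypothetical and only mirror the responsibilities described above;<br>
JSON is an arbitrary choice of serialization format for the example.<br>
<br>

```python
import json
from abc import ABC, abstractmethod


class Provider(ABC):
    """Hypothetical interface: one provider per component."""

    name = "provider"

    @abstractmethod
    def serialize(self):
        """Return the captured data as a string for the reproducer."""

    @abstractmethod
    def deserialize(self, data):
        """Restore the captured data from its serialized form."""


class CommandProvider(Provider):
    """Tracks the commands the user typed, in order."""

    name = "commands"

    def __init__(self):
        self.commands = []

    def record(self, command):
        self.commands.append(command)

    def serialize(self):
        return json.dumps(self.commands)

    def deserialize(self, data):
        self.commands = json.loads(data)


# Capture side: record commands as they are executed.
capture = CommandProvider()
capture.record("target create a.out")
capture.record("breakpoint set --name main")
blob = capture.serialize()

# Replay side: a fresh provider recovers the same command sequence.
replay = CommandProvider()
replay.deserialize(blob)
```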

<br>

With one exception, the lifetime of the provider coincides with that of the<br>

`SBDebugger`, because that is the scope of what we consider here to be a single<br>

debug session. The exception would be the provider for the global module cache,<br>

because it is shared between multiple debuggers. Although it would be<br>

conceptually straightforward to add a provider for the shared module cache,<br>

this significantly increases the complexity of the reproducer framework because<br>

of its implication on the lifetime and everything related to that.<br>

<br>

For now we will ignore this problem which means we will not replay the<br>

construction of the shared module cache but rather build it up during<br>

replaying, as if the current debug session was the first and only one using it.<br>

The impact of doing so is significant, as no issue caused by the shared module<br>

cache will be reproducible, but it does not limit reproducing any issue unrelated<br>

to it.<br>

<br>

## Reproducer Framework<br>

<br>

To coordinate between the data from different components, we'll need to<br>

introduce a global reproducer infrastructure. We have a component responsible<br>

for reproducer generation (the `Generator`) and for using the reproducer (the<br>

`Loader`). They are essentially two ways of looking at the same unit of<br>

replayable work.<br>

<br>

The Generator keeps track of its providers and whether or not we need to<br>

generate a reproducer. When a problem occurs, LLDB will request the Generator<br>

to generate a reproducer. When LLDB finishes successfully, the Generator cleans<br>

up anything it might have created during the session. Additionally, the<br>

Generator populates an index, which is part of the reproducer, and used by the<br>

Loader to discover what information is available.<br>

<br>

When a reproducer is passed to LLDB, we want to use its data to replay the<br>

debug session. This is coordinated by the Loader. Through the index created by<br>

the Generator, different components know what data (Providers) are available,<br>

and how to use them.<br>
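
<br>
The interplay between the two could look roughly like the sketch below (Python, for illustration only).<br>
The on-disk layout, the `index.json` file name and the method names `register`, `generate`, `discard`, `has` and `load` are assumptions made for the example, not part of the proposal.<br>
<br>

```python
import json
import shutil
import tempfile
from pathlib import Path


class Generator:
    """Collects provider data and writes it out only when asked to."""

    def __init__(self, root):
        self.root = Path(root)
        self.providers = {}  # provider name mapped to its captured data

    def register(self, name, data):
        self.providers[name] = data

    def generate(self):
        """Called when a problem occurs: write the data plus an index."""
        self.root.mkdir(parents=True, exist_ok=True)
        index = {}
        for name, data in self.providers.items():
            filename = name + ".json"
            (self.root / filename).write_text(json.dumps(data))
            index[name] = filename
        (self.root / "index.json").write_text(json.dumps(index))

    def discard(self):
        """Called on a successful exit: remove anything created so far."""
        shutil.rmtree(self.root, ignore_errors=True)


class Loader:
    """Reads the index to discover which provider data is available."""

    def __init__(self, root):
        self.root = Path(root)
        self.index = json.loads((self.root / "index.json").read_text())

    def has(self, name):
        return name in self.index

    def load(self, name):
        return json.loads((self.root / self.index[name]).read_text())


root = Path(tempfile.mkdtemp()) / "reproducer"
generator = Generator(root)
generator.register("commands", ["target create a.out", "run"])
generator.register("settings", {"target.run-args": ["--verbose"]})
generator.generate()

loader = Loader(root)
available = sorted(loader.index)
commands = loader.load("commands")
```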

<br>

It's important to note that in order to create a complete reproducer, we will<br>

require data from our dependencies (llvm, clang, swift) as well. This means<br>

that either (a) the infrastructure needs to be accessible from our dependencies<br>

or (b) that an API is provided that allows us to query this. We plan to address<br>

this issue when it arises for the respective Provider.<br>

<br>

# Components<br>

<br>

We have identified a list of minimal components needed to make reproducing<br>

possible. We've divided those into two groups: explicit and implicit inputs.<br>

<br>

Explicit inputs are inputs from the user to the debugger.<br>

<br>

-   Command line arguments<br>

-   Settings<br>

-   User commands<br>

-   Scripting Bridge API<br>

<br>

In addition to the components listed above, LLDB has a bunch of inputs that are<br>

not passed explicitly. It's often these that make reproducing an issue complex.<br>

<br>

-   GDB Remote Packets<br>

-   Files containing debug information (object files, dSYM bundles)<br>

-   Clang headers<br>

-   Swift modules<br>

<br>

Every component would have its own provider and is free to implement it as it<br>

sees fit. For example, as we expect to have a large number of GDB remote<br>

packets, the provider might choose to write these to disk as they come in,<br>

while the settings can easily be kept in memory until it is decided that we<br>

need to generate a reproducer.<br>
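
<br>
To make the contrast concrete, here is a small sketch of the two strategies (Python, illustrative only).<br>
`PacketProvider` and `SettingsProvider` are hypothetical names, and the JSON-lines packet file is an assumption chosen to keep the example within the standard library.<br>
<br>

```python
import json
import tempfile
from pathlib import Path


class PacketProvider:
    """High-volume data: append each packet to disk as it arrives."""

    def __init__(self, path):
        self.path = Path(path)
        self.stream = self.path.open("w")

    def record(self, packet):
        self.stream.write(json.dumps(packet) + "\n")
        self.stream.flush()

    def finish(self):
        self.stream.close()


class SettingsProvider:
    """Low-volume data: keep it in memory until a reproducer is needed."""

    def __init__(self):
        self.settings = {}

    def record(self, key, value):
        self.settings[key] = value

    def write(self, path):
        Path(path).write_text(json.dumps(self.settings))


workdir = Path(tempfile.mkdtemp())

packets = PacketProvider(workdir / "packets.jsonl")
packets.record({"direction": "send", "data": "qSupported"})
packets.record({"direction": "recv", "data": "PacketSize=1000"})
packets.finish()

settings = SettingsProvider()
settings.record("target.x86-disassembly-flavor", "intel")
# Nothing has been written for settings yet; that only happens on demand.
settings_on_disk_before = (workdir / "settings.json").exists()
settings.write(workdir / "settings.json")

packet_lines = (workdir / "packets.jsonl").read_text().splitlines()
```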

<br>

# Concerns, Implications & Risks<br>

<br>

## Performance Impact<br>

<br>

As the reproducer functionality will have to be always-on, we have to consider<br>

performance implications. As mentioned earlier, each provider is free to be<br>

implemented in whatever way works best for its respective component.<br>

We'll have to measure to know how big the impact is.<br>

<br>

## Privacy<br>

<br>

The reproducer might contain sensitive user information. We should make it<br>

clear to the user what kind of data is contained in the reproducer. Initially<br>

we will focus on the LLDB developer community and the people already filing<br>

bugs.<br>

<br>

## Versions<br>

<br>

Because the reproducer works by replaying a debug session, the versions of the<br>

debugger generating and replaying the session will have to match. Not only is<br>

this important for the serialization format, but more importantly a different<br>

LLDB might ask different questions in a different order.<br>
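
<br>
One way to enforce this would be to store the version in the reproducer and refuse to replay on a mismatch.<br>
The sketch below (Python, illustrative only) assumes the index carries a `version` field; that field, `check_version` and the version strings are hypothetical.<br>
<br>

```python
class VersionMismatch(Exception):
    """Raised when a reproducer was captured by a different debugger build."""


def check_version(index, current_version):
    # Hypothetical check: the index is assumed to carry the capturing version.
    recorded = index.get("version")
    if recorded != current_version:
        raise VersionMismatch(
            "reproducer captured with " + repr(recorded)
            + ", replaying with " + repr(current_version)
        )


index = {"version": "lldb-build-A", "providers": ["gdb-remote"]}

check_version(index, "lldb-build-A")  # same build: replay may proceed

try:
    check_version(index, "lldb-build-B")
    rejected = False
except VersionMismatch:
    rejected = True
```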

<br>

# Implementation<br>

<br>

I've put up a patch (<<a href="https://reviews.llvm.org/D50254" rel="noreferrer" target="_blank">https://reviews.llvm.org/<wbr>D50254</a>>) which contains a minimal<br>

implementation of the reproducer framework as well as the GDB remote provider.<br>

<br>

It records the GDB packets and writes them to a YAML file (we can switch to a<br>

more performant encoding down the road). When invoking the LLDB driver and<br>

passing the reproducer directory to `--reproducer`, this file is read and a<br>

dummy server replies with the next packet from this file, without talking to<br>

the executable.<br>

<br>

It's still pretty rudimentary and only works if you enter the exact same<br>

commands (so the server receives the exact same requests from the client).<br>
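
<br>
The replay side can be pictured with the sketch below. It is Python for illustration and is not the code in the patch.<br>
`ReplayServer` and `ReplayDiverged` are hypothetical names, and the recording is an in-memory list rather than the YAML file, to keep the example dependency-free.<br>
It also shows why the requests have to match exactly: the server only knows how to answer the next recorded request.<br>
<br>

```python
class ReplayDiverged(Exception):
    """Raised when the client asks something the recording does not expect."""


class ReplayServer:
    """Answers requests from a recorded log instead of a live process."""

    def __init__(self, exchanges):
        self.exchanges = list(exchanges)  # (request, reply) pairs in order
        self.position = 0

    def handle(self, request):
        if self.position == len(self.exchanges):
            raise ReplayDiverged("no recorded packets left for " + repr(request))
        expected, reply = self.exchanges[self.position]
        if request != expected:
            raise ReplayDiverged(
                "expected " + repr(expected) + " but got " + repr(request)
            )
        self.position += 1
        return reply


recorded = [
    ("qSupported", "PacketSize=1000"),
    ("?", "S05"),
]

server = ReplayServer(recorded)
first = server.handle("qSupported")
second = server.handle("?")

# A request beyond the end of the recording makes the replay fail loudly.
try:
    server.handle("g")
    diverged = False
except ReplayDiverged:
    diverged = True
```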

<br>

The next steps are (in broad strokes):<br>

<br>

1.  Capturing the debugged binary.<br>

2.  Record and replay user commands and SB-API calls.<br>

3.  Recording the configuration of the debugger.<br>

4.  Capturing other files used by LLDB.<br>

<br>

Please let me know what you think!<br>

<br>

Thanks,<br>

Jonas <br>


</blockquote></div><br></div>