[llvm-dev] RFC: Adding "minidump" support to obj2yaml

Wed Mar 6 06:00:08 PST 2019

Hello all,

Yesterday I sent an email
<http://lists.llvm.org/pipermail/lldb-dev/2019-March/014811.html> to
lldb-dev proposing a new tool in lldb for yamlization of minidump files.
It's been suggested to me that, rather than creating a new tool, it may
be better to add support for that format to obj2yaml. Hence, this email. :)

As I expect most people are unfamiliar with this format, I'm going to
start off with a brief introduction.

Minidump is the native "core file" format for Windows systems. However,
it is widely used on other systems too. Probably the most popular tools
producing this format are the Google "breakpad" and "crashpad" crash
reporting systems. LLDB has supported this format since 2016, when
support was added as a GSoC project by Dimitar Vlahovski. It is currently
in active use and development by several lldb contributors.

The format itself is fairly simple and extensible. The file starts off
with a header containing some basic info and a collection of "streams".
Each stream contains various types of information about the state of the
process at the time when the snapshot (minidump) was taken. This
includes information such as:
- list of loaded modules
- list of threads
- chunks of process memory
- etc.
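
To make the layout a bit more concrete, here is a rough sketch (in Python,
purely for illustration) of walking the header and the stream directory.
The field layout follows my reading of the public MINIDUMP_HEADER and
MINIDUMP_DIRECTORY definitions; the names list_streams and make_sample
are made up for this example and do not correspond to any existing llvm
or lldb API.

```python
import struct

# Assumed layout, taken from the public MINIDUMP_HEADER and
# MINIDUMP_DIRECTORY definitions (all fields little-endian):
#   header:    Signature, Version, NumberOfStreams, StreamDirectoryRva,
#              CheckSum, TimeDateStamp (6 x u32), Flags (u64)
#   directory: StreamType, DataSize, Rva (3 x u32)
HEADER = struct.Struct("<IIIIIIQ")
DIRECTORY = struct.Struct("<III")
SIGNATURE = 0x504D444D  # the bytes "MDMP" read as a little-endian u32

# A few well-known stream types; anything else is reported numerically.
STREAM_NAMES = {3: "ThreadList", 4: "ModuleList", 5: "MemoryList"}


def list_streams(data):
    """Return (name, rva, size) for every entry in the stream directory."""
    sig, _version, count, dir_rva, _sum, _time, _flags = HEADER.unpack_from(data, 0)
    if sig != SIGNATURE:
        raise ValueError("not a minidump file")
    streams = []
    for i in range(count):
        stype, size, rva = DIRECTORY.unpack_from(data, dir_rva + i * DIRECTORY.size)
        streams.append((STREAM_NAMES.get(stype, hex(stype)), rva, size))
    return streams


def make_sample():
    """Hypothetical helper: a tiny minidump with two empty list streams."""
    dir_rva = HEADER.size
    payload_rva = dir_rva + 2 * DIRECTORY.size
    header = HEADER.pack(SIGNATURE, 0xA793, 2, dir_rva, 0, 0, 0)
    directory = (DIRECTORY.pack(4, 4, payload_rva) +      # module list
                 DIRECTORY.pack(3, 4, payload_rva + 4))   # thread list
    payload = struct.pack("<II", 0, 0)  # zero modules, zero threads
    return header + directory + payload


if __name__ == "__main__":
    for name, rva, size in list_streams(make_sample()):
        print(f"{name}: {size} bytes at offset {rva}")
```

The contents of each stream (module records, thread contexts, memory
descriptors) are deliberately left unparsed here; only the container
structure is shown.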

The problem I'm trying to solve right now is how to write tests for this
functionality. We currently don't have any tool which could create
minidump files from human-readable descriptions of them, so our tests
are relying on checking in opaque binary blobs. This makes reviewing the
changes hard and also complicates creating test cases (real-world
minidumps tend to be large). In other words, we are missing a tool like
yaml2minidump.
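
As a very rough illustration of what such a tool would need to do, the
sketch below (Python, for brevity) turns a small dictionary into a
minidump byte string. The dictionary stands in for whatever a parsed
YAML document would look like; its keys ("Streams", "Type", "Content")
and the function name build_minidump are placeholders I made up for this
example, not a proposed schema. The binary layout is assumed to follow
the public MINIDUMP_HEADER and MINIDUMP_DIRECTORY definitions.

```python
import struct

HEADER = struct.Struct("<IIIIIIQ")  # MINIDUMP_HEADER, little-endian
DIRECTORY = struct.Struct("<III")   # StreamType, DataSize, Rva
SIGNATURE = 0x504D444D              # "MDMP"
VERSION = 0xA793                    # assumed MINIDUMP_VERSION value


def build_minidump(description):
    """Serialize a description dict (hypothetical schema) into bytes."""
    streams = description["Streams"]
    dir_rva = HEADER.size
    # Stream contents are laid out back to back after the directory.
    data_rva = dir_rva + len(streams) * DIRECTORY.size
    directory = b""
    payload = b""
    for stream in streams:
        content = bytes.fromhex(stream["Content"])
        directory += DIRECTORY.pack(stream["Type"], len(content),
                                    data_rva + len(payload))
        payload += content
    header = HEADER.pack(SIGNATURE, VERSION, len(streams), dir_rva, 0, 0, 0)
    return header + directory + payload


# Stand-in for a parsed YAML test input: two streams given as raw hex.
description = {
    "Streams": [
        {"Type": 4, "Content": "00000000"},  # module list, zero modules
        {"Type": 3, "Content": "00000000"},  # thread list, zero threads
    ]
}

blob = build_minidump(description)
```

A real tool would of course describe well-known streams structurally
(module names, thread ids, and so on) rather than as hex, falling back
to raw bytes only for streams it does not understand.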

=== end of introduction ===

While we could create an lldb tool for converting between minidump and
yaml files, there is some appeal in making everything available from a
single tool (i.e., yaml2obj). The main obstacle to that is that there is
currently no support for parsing these files in llvm, and apart from
yaml2obj, it's not clear to me whether any other llvm tool/project would
benefit from this functionality being available in the main llvm
project. For example, tools like llvm-readelf have support for ELF core
files, but this is mostly a byproduct of the fact that ELF core files
are similar to ELF executables. However, there is no "executable" form
of minidumps.

So I am asking this question: Do you think having minidump parsing code
in llvm is a good idea?

To give you an idea of what this involves, the current minidump parser
in lldb is about 2000 LOC. It's already fairly independent of the rest
of lldb, though it would need to be cleaned up a bit to be up to llvm
standards. My expectation is that the yaml conversion code would add
another 1-2 kLOC.

The natural place for this in llvm would seem to be the Object library,
so I'd propose for this code to be placed there. The thing I'm not sure
about is whether it makes sense to integrate this into the existing
ObjectFile hierarchy. While the minidump "streams" could be represented
as sections, I'm not sure we'd be doing anyone a favour by doing that.
The ObjectFile sections assume they are referring to sections in regular
object files, which have things like relocations, symbol lists, etc., and
minidump streams have none of those. Therefore I'm leaning towards the
option of just implementing this as a standalone MinidumpFile class.
This would be kind of similar to the existing ELFFile class, only there
wouldn't be an ELFObjectFile sitting on top of that.

Please let me know what you think,
pavel