[LLVMdev] RFC: Improving our DWARF (and ELF) emission testing capabilities

Fri Jan 18 13:00:30 PST 2013

Hi All,

While working on some recent patches for x32 support, I ran into an
unpleasant limitation the LLVM ecosystem has with testing DWARF
emission. We currently have several approaches, none of which is
great:

1. llvm-dwarfdump: the best approach when it works. But unfortunately
lib/DebugInfo supports only a (small) subset of DWARF. Tricky sections
like debug_frame aren't supported.
2. Relying on emitted assembly directives (i.e. .cfi_*), which is
cumbersome and misses a lot of things, like the actual DWARF encoding.
3. Using elf-dump and examining the raw binary dumps. This makes tests
nearly unmaintainable.

The last of these is also why IMHO our ELF emission in general isn't
well tested. elf-dump is just too rudimentary and relies on simple
(=dumb) dumps of binary contents.
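
To make the contrast concrete, here is a rough sketch (standard library
only) of a raw dump next to a decoded one. The function name, the lookup
tables and the output format are made up for illustration; this is not
what elf-dump or pyelftools actually print. The sample bytes are a
hand-built prefix of an x32-style header (ELFCLASS32 with EM_X86_64).

```python
import binascii
import struct

# Small, deliberately incomplete lookup tables (illustrative only).
ELF_CLASS = {1: "ELFCLASS32", 2: "ELFCLASS64"}
ELF_TYPE = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN"}
ELF_MACHINE = {3: "EM_386", 62: "EM_X86_64"}


def describe_elf_header(data):
    """Decode a few ELF header fields into one readable line."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    # e_ident[EI_CLASS] and e_ident[EI_DATA] are the bytes at offsets 4, 5.
    ei_class, ei_data = struct.unpack_from("BB", data, 4)
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine follow the 16-byte e_ident in both ELF classes.
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return "Class: %s, Type: %s, Machine: %s" % (
        ELF_CLASS.get(ei_class, hex(ei_class)),
        ELF_TYPE.get(e_type, hex(e_type)),
        ELF_MACHINE.get(e_machine, hex(e_machine)),
    )


# Hand-built sample: magic, 32-bit class, little-endian, version 1,
# padding, then e_type = ET_REL and e_machine = EM_X86_64.
SAMPLE = (b"\x7fELF" + b"\x01\x01\x01\x00" + b"\x00" * 8
          + struct.pack("<HH", 1, 62))

if __name__ == "__main__":
    print(binascii.hexlify(SAMPLE).decode("ascii"))  # what a raw dump gives
    print(describe_elf_header(SAMPLE))               # what a test wants
```

A test that matches on the second line survives layout changes and says
what it checks; a test that matches on the hex string does neither.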

The long-term solution for DWARF would be to enhance lib/DebugInfo to
the point where it can handle all interesting DWARF sections. But this
is a lofty goal, since DWARF parsing is notoriously hard and this
would require a large investment of time and effort. And in the
meantime, we just don't write good enough tests (and enough of them)
for this very important feature.

Therefore, as an interim stage, I propose to adopt some external tool
that parses DWARF and emits decoded textual dumps, which would make
tests easy to write.

Concretely, I have a pure Python library named pyelftools
(https://bitbucket.org/eliben/pyelftools) which provides comprehensive
ELF and DWARF parsing capabilities and has a dumper that's fully
compatible with the readelf command. Using pyelftools would allow us
to immediately improve the quality of our tests, and as lib/DebugInfo
matures llvm-dwarfdump can gradually replace the dumper without
changing the actual tests.
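
For a feel of the library side, a minimal sketch of walking the compile
units of an object file follows. The helper names (format_cu,
summarize_dwarf) and the line format are hypothetical, not part of
pyelftools or its readelf-compatible dumper; the elftools calls are the
ones I'd expect to use, but treat the whole thing as a sketch rather
than a proposed test driver.

```python
def format_cu(offset, version, top_tag):
    """Render one compile unit as a single line of text (made-up format)."""
    return "CU @0x%x: DWARF v%d, top DIE %s" % (offset, version, top_tag)


def summarize_dwarf(path):
    """Return one text line per compile unit found in the ELF file at path."""
    # Imported here so that this file still loads where pyelftools
    # is not installed; only calling the function requires it.
    from elftools.elf.elffile import ELFFile

    with open(path, "rb") as stream:
        elf = ELFFile(stream)
        if not elf.has_dwarf_info():
            return []
        dwarf = elf.get_dwarf_info()
        # Build the list while the file is still open: pyelftools
        # reads from the stream lazily.
        return [
            format_cu(cu.cu_offset, cu["version"], cu.get_top_DIE().tag)
            for cu in dwarf.iter_CUs()
        ]
```

The textual lines are what a test would match against, so swapping the
producer of those lines later should not require touching the tests.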

pyelftools is relatively widely used, so it's well tested. All it
requires is Python 2.6 or higher, and its code is in the public
domain, so it can live in tools/ or test/Scripts or wherever and be
distributed with LLVM. I actively maintain it, and adapting it to
LLVM's purposes should be relatively easy. As a bonus, it has a much
smarter ELF parser & dumper that can replace the ad-hoc elf-dump. It
has also been successfully adapted in the past to read DWARF from
Mach-O files, if that's required.

Eli