[llvm-dev] DWARF v5 - compiler work update

via llvm-dev llvm-dev at lists.llvm.org
Wed Jun 13 14:01:36 PDT 2018


At EuroLLVM's Debug Info BoF, I made a rash statement about being done
with DWARF v5 "next week." It has been an impressively long week...
and it's not over yet.  But given Pavel Labath's awesome news about
accelerator tables, it seemed like a good moment to review the rest of
the DWARF v5 work.

This really falls into four broad categories:
- the bare minimum needed to be conformant with v5;
- making some optional features conformant with v5;
- other things that are pretty useful or beneficial;
- everything else.

Because of unexpected complexities and other factors, the DWARF v5
work has generally been taking longer than expected. I am still
hopeful to get to the "minimally conformant" stage before we branch
LLVM 7.0 on the first of August.  Some other useful/beneficial things
have already gone in, and that's also really positive.  But there is
quite a bit left to go.

I apologize in advance for not crediting people who did the work,
or (just as important) all the reviewing!  All efforts are greatly 
appreciated!


Minimally Conformant Features
=============================
By this I mean: It is possible to say `-gdwarf-5` and what comes out
will conform to the DWARF v5 specification.  It does not mean that
all optional debug-info features will be conforming, but you can get
a set of .debug_* sections that all claim to be (and actually are)
compliant with DWARF v5.  This work is primarily about data layout
and in some cases, section names; if you ignore the versioning, 
it's all pretty much NFC.

1. Compile Unit headers.  DONE.
   Compile units have v5 header formats.  Nearly all the content
   of the debugging information entries (DIEs) within a compile
   unit remains unchanged; it's just the header that got tweaked.

2. Line Table header.  DONE.
   As with compile units, the header layout follows v5.

3. Range lists.  IN PROGRESS.
   DWARF v5 changes the format and section name (.debug_rnglists).

4. Location lists.  NOT STARTED.
   DWARF v5 changes the format and section name (.debug_loclists).
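
As an illustration of the header tweak in item 1, here is a rough
Python sketch (not LLVM code) that reads a 32-bit-DWARF unit header in
either layout.  It assumes the v4 field order is unit_length, version,
debug_abbrev_offset, address_size, and that v5 adds a unit_type byte
and moves address_size ahead of debug_abbrev_offset.  The function
name and the dictionary keys are made up for this example.

```python
import struct

def parse_unit_header(data, little_endian=True):
    """Parse a 32-bit-DWARF unit header (v2-v5) at the start of `data`.

    Hypothetical helper for illustration; 64-bit DWARF is not handled.
    """
    e = "<" if little_endian else ">"
    # Both layouts start with a 4-byte length and a 2-byte version.
    unit_length, version = struct.unpack_from(e + "IH", data, 0)
    if unit_length == 0xFFFFFFFF:
        raise NotImplementedError("64-bit DWARF is not handled in this sketch")
    if version >= 5:
        # v5: unit_type (1), address_size (1), debug_abbrev_offset (4)
        unit_type, address_size, abbrev_offset = struct.unpack_from(e + "BBI", data, 6)
        header_size = 12
    else:
        # v4 and earlier: debug_abbrev_offset (4), address_size (1)
        abbrev_offset, address_size = struct.unpack_from(e + "IB", data, 6)
        unit_type = None
        header_size = 11
    return {
        "unit_length": unit_length,
        "version": version,
        "unit_type": unit_type,
        "address_size": address_size,
        "abbrev_offset": abbrev_offset,
        "header_size": header_size,
    }
```

The DIEs that follow the header are read the same way in both cases;
only the starting offset (header_size) differs.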


Optional Features Conformant
============================
This collects a few things that Clang/LLVM know how to do now,
but which need some fiddling to make them work as DWARF v5 wants.

1. Type units in .debug_info section. NOT STARTED.
   DWARF v4 defined type units, and gave them their own section
   (.debug_types).  In DWARF v5, type units go in .debug_info
   alongside the regular compile units.
   ** This is "not started" although I have made a run at it.
   Emitting type units in .debug_info is a snap, but getting the
   DWARF parser to understand what's going on requires a moderate
   amount of refactoring.  Shelved until the minimally conformant
   stuff is done.

2. Split DWARF.  NOT STARTED.
   Puts most debug info into a separate "DWARF object" file, to
   reduce the size of the raw data that the linker has to copy around.
   This was not actually part of DWARF v4, although both gcc and
   Clang support it.  Needs some fiddling for DWARF v5, although
   possibly no more than putting type units into .debug_info.

3. String Offsets table.  DONE.
   GNU had a prototype version of this, which Clang supported.
   The new .debug_str_offsets section is an array of offsets into
   the .debug_str section; this reduces the number of relocations
   for strings down from one per reference to one per unique string.
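
To make the relocation arithmetic in item 3 concrete, here is a small
Python model (a sketch only, not how the compiler is implemented).
It assumes each DIE reference stores an index into the offsets table
rather than a relocatable offset; build_string_tables and its return
values are hypothetical names chosen for this example.

```python
def build_string_tables(references):
    """Model .debug_str and .debug_str_offsets for a list of string uses.

    `references` lists the strings in the order the DIEs refer to them,
    repeats included.  Returns (debug_str, str_offsets, indices).
    """
    debug_str = bytearray()   # models .debug_str: NUL-terminated strings
    str_offsets = []          # models .debug_str_offsets: one entry per unique string
    index_of = {}             # string -> index into str_offsets
    indices = []              # what each reference stores: an index, no relocation
    for s in references:
        if s not in index_of:
            index_of[s] = len(str_offsets)
            str_offsets.append(len(debug_str))
            debug_str += s.encode("utf-8") + b"\0"
        indices.append(index_of[s])
    return bytes(debug_str), str_offsets, indices


refs = ["int", "main", "int", "argc", "int"]
debug_str, str_offsets, indices = build_string_tables(refs)
relocs_direct = len(refs)          # one relocation per reference
relocs_indexed = len(str_offsets)  # one relocation per unique string
```

In this toy input, five references collapse to three table entries,
so the relocation count drops from five to three.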


Other Useful or Beneficial Features
===================================
These help the linker (fewer relocations) or debugger (new info)
or have a goal of reducing debug-info size.

1. New "accelerator" tables. DONE.
   This is an index to speed up how the debugger finds info.

2. Line-table strings.  DONE.
   This moves the pathnames for source files out of the line-table
   header and into a separate string section, which allows the
   linker to deduplicate strings (which it already knows how to do)
   across compilation units, and so reduce the size required for line 
   tables.

3. MD5 Checksum for source files. DONE.
   Use MD5 instead of file size/modtime to allow the debugger to
   detect whether a file has changed since compilation.  Clang
   never provided the size/modtime so this is a real step forward.

4. Address Table.  NOT STARTED.
   There's a new .debug_addr section with an array for relocatable
   addresses, and a variety of ways DWARF can use elements of this
   array to avoid redundant relocations.
   ** This feature can be leveraged by the main debug info section,
   location lists, and range lists.
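
The check that item 3 enables on the debugger side can be sketched in
a few lines of Python.  This is only an illustration under the
assumption that the consumer has already extracted the 16-byte MD5
digest recorded for a source file; md5_of_file and source_matches are
hypothetical names, not part of any debugger's API.

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the 16-byte MD5 digest of the file at `path`."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large sources need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()

def source_matches(path, recorded_digest):
    """True if the file on disk still matches the digest from the line table."""
    return md5_of_file(path) == recorded_digest
```

A debugger could warn when source_matches returns False, since the
line numbers in the debug info may no longer line up with the file.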


Everything Else
===============
AFAIK only a few small bits of other DWARF v5 features have been
implemented, and as people have interest, there's quite a lot of
"filling in the corners" that could be done.  Some of these are
intended to be helpful for debugging optimized code.

1. Default location entry.
   Instead of sequentially listing all the places a variable might
   reside over its lifetime, specify a default location (e.g., its
   stack slot) and then fill in other places.  The intent is to
   reduce the size of location lists.

2. Call-site info.
   For optimized debugging, describes a function call site to help
   the debugger reconstruct the environment of a caller.

3. Entry value.
   For optimized debugging, provides an expression to reconstruct
   the value of a parameter at the time the function was called.

4. Implicit pointer.
   Even if a pointer has been optimized away, if the object still
   exists, an expression can describe it.  Especially useful when a
   small object with reference type has been moved into a register.

... and much much more, generally very little things that would be
pretty easy.
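
As an example of one of these, the default location entry (item 1)
amounts to a fallback in the location lookup.  The Python sketch below
models that idea only; it does not reflect the on-disk encoding, and
find_location, the tuple layout, and the location strings are all
invented for illustration.

```python
def find_location(pc, bounded_entries, default=None):
    """Return the location that applies at address `pc`.

    `bounded_entries` is a list of (low_pc, high_pc, location) tuples
    covering half-open ranges [low_pc, high_pc).  `default` models the
    default location entry: it applies wherever no bounded entry does.
    """
    for low, high, location in bounded_entries:
        if low <= pc < high:
            return location
    return default


# The variable lives in its stack slot except for two short stretches
# where it is held in a register.
entries = [
    (0x1010, 0x1020, "reg rax"),
    (0x1040, 0x1048, "reg rbx"),
]
stack_slot = "fbreg -16"
```

Without the default, every gap between the register ranges would need
its own entry pointing at the stack slot, which is where the size
saving comes from.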

Thanks,
--paulr


