[llvm-dev] DWARF .debug_aranges data objects and address spaces

Luke Drummond via llvm-dev llvm-dev at lists.llvm.org
Tue Mar 10 08:17:57 PDT 2020


Hello

I've been looking at a debuginfo issue on an out-of-tree target which uses
DWARF aranges.

The problem is that aranges are generated for both data and code objects, and
the debugger gets confused when program addresses overlap data addresses. The
target is a Harvard Architecture CPU, so the appearance of overlapping address
ranges is not in itself a bug as they reside in different address spaces.

During my investigations, I found that:

    - gcc appears to never generate an entry in the `.debug_aranges` table for
      data objects. I did a cursory read over gcc's source and history and it is
      my understanding that aranges are deliberately only emitted for text and
      cold text sections[1].
    - However, the DWARF v5 specification[2] for `.debug_aranges` does not suggest
      that aranges should cover only text addresses, and its wording strongly
      suggests that their use is general:

          6.1.2:
          > This header is followed by a variable number of address range descriptors.
          > Each descriptor is a triple consisting of a segment selector, the
          > beginning address within that segment of a range of text or data covered
          > by some entry owned by the corresponding compilation unit, followed by the
          > non-zero length of that range

      As such llvm is doing nothing generally wrong by emitting aranges for data
      objects.

    - llvm unconditionally sets the `.debug_aranges.segment_selector_size` to
      zero[3]. GCC does this too[4]. I think this is a bug if the target can have
      overlapping ranges due to multiple code/data address spaces, as in my case
      of a Harvard machine.
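
To make the descriptor layout concrete, here is a minimal Python sketch of a
decoder for one 32-bit DWARF `.debug_aranges` set that honours the header's
segment selector size. The function name `parse_aranges_set`, the example
addresses, and the padding rule for a nonzero selector size (first descriptor
aligned to the size of one descriptor) are assumptions made for illustration;
this is not code from llvm, gcc, gdb, or lldb.

```python
import struct

def parse_aranges_set(data, byteorder="little"):
    """Decode one 32-bit DWARF .debug_aranges set into (segment, address, length) triples."""
    fmt = ("<" if byteorder == "little" else ">") + "IHIBB"
    unit_length, version, cu_offset, addr_size, seg_size = struct.unpack_from(fmt, data, 0)
    header_size = struct.calcsize(fmt)   # 12 bytes, including unit_length itself
    tuple_size = seg_size + 2 * addr_size
    # Assumption: the first descriptor starts at a multiple of the descriptor size.
    pos = -(-header_size // tuple_size) * tuple_size
    end = 4 + unit_length                # unit_length does not count its own 4 bytes
    triples = []
    while pos + tuple_size <= end:
        fields = []
        for size in (seg_size, addr_size, addr_size):
            # A zero-sized selector reads as 0, matching a flat address space.
            fields.append(int.from_bytes(data[pos:pos + size], byteorder))
            pos += size
        if fields == [0, 0, 0]:          # an all-zero descriptor terminates the set
            break
        triples.append(tuple(fields))
    return triples

# Hypothetical set: 2-byte addresses, no segment selector, one data range and
# one code range, followed by the terminator.
flat = (struct.pack("<IHIBB", 20, 2, 0, 2, 0)
        + struct.pack("<4H", 0x0060, 0x3FFF, 0x0000, 0x0080)
        + struct.pack("<2H", 0, 0))
print(parse_aranges_set(flat))
```

With a selector size of zero every triple reports segment 0, so a data range
and a code range that share numeric addresses cannot be told apart from the
table alone.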

As far as I can tell, the only upstream backend that is of a similar
configuration is AVR. I can reproduce the same `.debug_aranges` table as my
target with the following simple example:

    $ clang -target avr -mmcu=attiny104 -S -o - -g -gdwarf-aranges -xc - <<'EOF'
    char char_array[16383] = {0};
    int main() {
      return char_array[0];
    }
    EOF
    # ...
    .section        .debug_aranges,"",@progbits
    .long   20                      ; Length of ARange Set
    .short  2                       ; DWARF Arange version number
    .long   .Lcu_begin0             ; Offset Into Debug Info Section
    .byte   2                       ; Address Size (in bytes)
    .byte   0                       ; Segment Size (in bytes)
    .short  char_array
    .short  .Lsec_end0-char_array
    .short  .Lfunc_begin0
    .short  .Lsec_end1-.Lfunc_begin0
    .short  0                       ; ARange terminator

...but I cannot find documentation anywhere on what a consumer is expected to do
with such information, or on how *in general* multiple address spaces are
expected to work for llvm and gcc when they generate DWARF aranges with no
segment selector in the tuple.
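
The ambiguity can be sketched in a few lines of Python. The entry names, the
addresses, and the numbering of segments (0 for program memory, 1 for data
memory) are hypothetical and chosen only to show the effect; no real consumer
is being modelled.

```python
def covering(entries, address, segment=None):
    """Return the names of all entries whose range contains `address`.

    `entries` holds (name, segment, start, length). When `segment` is None the
    selector is ignored, which models a table with segment_selector_size == 0.
    """
    return [name for name, seg, start, length in entries
            if (segment is None or seg == segment)
            and start <= address < start + length]

# Hypothetical layout: segment 0 = program memory, segment 1 = data memory.
entries = [
    ("function", 0, 0x0000, 0x0080),
    ("data object", 1, 0x0060, 0x3FFF),
]
ambiguous = covering(entries, 0x0070)             # selector ignored
resolved = covering(entries, 0x0070, segment=0)   # selector taken into account
print(ambiguous, resolved)
```

Without a selector the lookup for 0x0070 returns both entries; with one it
returns only the entry in the requested address space.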

A cursory grep of lldb shows that the segment size is read from the
`.debug_aranges` header, but never checked. If it *is* nonzero, lldb will silently
read incorrect data and possibly crash. I have provided a patch on the lldb
mailing list[5]. My patch brings lldb in line with gdb, which throws an error if
the segment selector size is nonzero[6].
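
The kind of check described here can be sketched as follows. This is an
illustrative Python model, not the lldb patch or the gdb code; the names
`UnsupportedAranges` and `read_aranges_header` are hypothetical, and a
little-endian 32-bit DWARF header is assumed.

```python
import struct

class UnsupportedAranges(ValueError):
    """Raised for a .debug_aranges set that this sketch refuses to decode."""

def read_aranges_header(data):
    """Return the address size, rejecting sets that use segment selectors."""
    unit_length, version, cu_offset, addr_size, seg_size = struct.unpack_from(
        "<IHIBB", data, 0)
    if seg_size != 0:
        # Decoding the descriptors as (address, length) pairs would misread
        # every tuple, so fail loudly instead of returning garbage.
        raise UnsupportedAranges(
            f"segment selector size {seg_size} is not supported")
    return addr_size

# Hypothetical headers: one flat, one claiming a 1-byte segment selector.
flat_header = struct.pack("<IHIBB", 20, 2, 0, 2, 0)
segmented_header = struct.pack("<IHIBB", 21, 2, 0, 2, 1)
print(read_aranges_header(flat_header))
```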

My question is: should LLVM have some logic to emit `segment_selector_size != 0`
for targets without a flat address space? Alternative formulation: do we need to
limit the emission of arange info to code objects only, either 1) only in the
non-flat address-space case or 2) for all targets unconditionally?

My intuition is that we should limit emission of aranges to objects in the main
text section. Neither GDB nor LLDB handles aranges for targets without flat
address spaces, and significant work might be needed in downstream DWARF
consumers. The usefulness of address ranges for data objects is not
obvious to me, as the uses of this section in DWARF consumers seem to be
mostly PC lookup.
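
As a rough Python model of that proposal: drop every range that is not in a
text section, then use what remains for PC lookup. The entry format
(kind, start, length, CU offset), the function names, and all addresses are
hypothetical; this does not reflect how LLVM's AsmPrinter is structured.

```python
import bisect

def code_only(entries):
    """Keep only ranges from text sections, sorted by start address."""
    return sorted((start, length, cu)
                  for kind, start, length, cu in entries if kind == "text")

def cu_for_pc(code_ranges, pc):
    """Return the CU offset whose code range contains `pc`, or None."""
    starts = [start for start, _, _ in code_ranges]
    i = bisect.bisect_right(starts, pc) - 1   # last range starting at or before pc
    if i >= 0:
        start, length, cu = code_ranges[i]
        if pc < start + length:
            return cu
    return None

# Hypothetical entries; the data range overlaps both code ranges numerically.
entries = [
    ("text", 0x0000, 0x0080, 0x00),
    ("data", 0x0060, 0x3FFF, 0x2A),
    ("text", 0x0080, 0x0040, 0x2A),
]
code_ranges = code_only(entries)
print(cu_for_pc(code_ranges, 0x0070), cu_for_pc(code_ranges, 0x0090))
```

With the data range filtered out, a lookup of 0x0070 can only resolve to the
unit that owns the code at that address.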

Any insight would be appreciated. I can likely provide patches if we conclude
that changes are needed in LLVM.

All the Best

Luke

[1] GCC only emits aranges for text:
    https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=gcc/dwarf2out.c;h=bb45279ea56d36621f14b0a68f4f0f0be3bf4e97;hb=HEAD#l11637
[2] DWARF Debugging Information Format Version 5; 6.1. http://dwarfstd.org/Dwarf5Std.php
[3] LLVM segment selector size is always zero: https://github.com/llvm/llvm-project/blob/e71fb46a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L2749
[4] GCC segment selector size is always zero:
    https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=gcc/dwarf2out.c;h=bb45279ea56d36621f14b0a68f4f0f0be3bf4e97;hb=HEAD#l11624
[5] lldb patch to gracefully error on nonzero segment selector size: https://reviews.llvm.org/D75925
[6] GDB implementation of [5]:
    https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/dwarf2/read.c;h=1d4397dfabc72004eaa64013e47033e0ebdfe213;hb=HEAD#l2779

-- 
Codeplay Software Ltd.
Company registered in England and Wales, number: 04567874
Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF

