[llvm-dev] DWARF .debug_aranges data objects and address spaces

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Tue Mar 10 12:45:28 PDT 2020


If you only want code addresses, why not use the CU's low_pc/high_pc/ranges
- those are guaranteed to be only code addresses, I think?
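
As a rough sketch of that idea (Python, purely for illustration): a PC
lookup driven by per-CU code ranges rather than `.debug_aranges`. The range
table, the CU names and `find_cu_for_pc` are all made up for the example; a
real consumer would fill the table from each CU's
DW_AT_low_pc/DW_AT_high_pc or DW_AT_ranges. The sketch also assumes the
ranges do not overlap.

```python
from bisect import bisect_right

# Hypothetical (low_pc, high_pc, cu_name) code ranges; high_pc is exclusive.
# All values are invented for illustration.
CU_CODE_RANGES = sorted([
    (0x0000, 0x0040, "cu_a"),
    (0x0040, 0x0100, "cu_b"),
    (0x0200, 0x0280, "cu_c"),
])


def find_cu_for_pc(pc, ranges=CU_CODE_RANGES):
    """Return the name of the CU whose code range contains pc, or None."""
    starts = [low for low, _, _ in ranges]
    # Index of the last range whose start is <= pc.
    i = bisect_right(starts, pc) - 1
    if i < 0:
        return None
    low, high, name = ranges[i]
    return name if low <= pc < high else None
```

Since only code ranges ever enter the table, a data address that happens to
share a numeric value with a PC never shows up as a candidate.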

On Tue, Mar 10, 2020 at 8:18 AM Luke Drummond via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hello
>
> I've been looking at a debuginfo issue on an out-of-tree target which uses
> DWARF aranges.
>
> The problem is that aranges are generated for both data and code objects, and
> the debugger gets confused when program addresses overlap data addresses. The
> target is a Harvard Architecture CPU, so the appearance of overlapping address
> ranges is not in itself a bug as they reside in different address spaces.
>
> During my investigations, I found that:
>
>     - gcc appears to never generate an entry in the `.debug_aranges` table for
>       data objects. I did a cursory read over gcc's source and history and it is
>       my understanding that aranges are deliberately only emitted for text and
>       cold text sections[1].
>     - However, the DWARF v5 specification[2] for `.debug_aranges` does not suggest
>       that aranges should only be for text addresses and the wording
>       strongly suggests that their use is general:
>
>           6.1.2:
>           > This header is followed by a variable number of address range descriptors.
>           > Each descriptor is a triple consisting of a segment selector, the
>           > beginning address within that segment of a range of text or data covered
>           > by some entry owned by the corresponding compilation unit, followed by the
>           > non-zero length of that range
>
>       As such llvm is doing nothing generally wrong by emitting aranges for data
>       objects.
>
>     - llvm unconditionally sets the `.debug_aranges.segment_selector_size` to
>       zero[3]. GCC does this too[4]. I think this is a bug if the target can have
>       overlapping ranges due to multiple code/data address spaces as in my case
>       of a Harvard machine.
>
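
To illustrate the overlap problem described above, here is a small Python
sketch with made-up ranges and a hypothetical `lookup` helper. It shows how
an address-only lookup becomes ambiguous once code and data ranges from
different address spaces land in one table, and how a segment selector in
the key would disambiguate. The segment numbering (0 for program memory, 1
for data memory) is an assumption for the example, not something either
compiler emits today.

```python
# Made-up arange entries: (segment, start, length, owner).
# Segment 0 = program memory, 1 = data memory (assumed numbering).
ARANGES = [
    (0, 0x0000, 0x0060, "main (code)"),
    (1, 0x0040, 0x3FFF, "char_array (data)"),
]


def lookup(addr, segment=None, table=ARANGES):
    """Return owners whose range contains addr; filter by segment if given."""
    return [
        owner
        for seg, start, length, owner in table
        if start <= addr < start + length
        and (segment is None or seg == segment)
    ]


# Address-only lookup, which is all a consumer can do when
# segment_selector_size is 0: 0x0050 falls inside both entries.
ambiguous = lookup(0x0050)

# With a segment selector, the same address resolves to a single entry.
code_only = lookup(0x0050, segment=0)
```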
> As far as I can tell, the only upstream backend that is of a similar
> configuration is AVR. I can reproduce the same `.debug_aranges` table as my
> target with the following simple example:
>
>     $ clang -target avr -mmcu=attiny104 -S -o - -g -gdwarf-aranges -xc - <<'EOF'
>     char char_array[16383] = {0};
>     int main() {
>       return char_array[0];
>     }
>     EOF
>     # ...
>     .section        .debug_aranges,"",@progbits
>     .long   20                      ; Length of ARange Set
>     .short  2                       ; DWARF Arange version number
>     .long   .Lcu_begin0             ; Offset Into Debug Info Section
>     .byte   2                       ; Address Size (in bytes)
>     .byte   0                       ; Segment Size (in bytes)
>     .short  char_array
>     .short  .Lsec_end0-char_array
>     .short  .Lfunc_begin0
>     .short  .Lsec_end1-.Lfunc_begin0
>     .short  0                       ; ARange terminator
>
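
For reference, a minimal Python sketch of how a consumer might decode a set
like the one above, including the optional segment selector. It is not taken
from any debugger; `parse_aranges_set` is a made-up name, the byte blob uses
invented addresses, and it assumes a little-endian, 32-bit DWARF set whose
first tuple is aligned to a multiple of the tuple size.

```python
import struct


def parse_aranges_set(data):
    """Parse one little-endian, 32-bit DWARF .debug_aranges set."""
    # Header: unit_length(4) version(2) debug_info_offset(4)
    #         address_size(1) segment_selector_size(1)
    unit_length, version, cu_offset, addr_size, seg_size = struct.unpack_from(
        "<IHIBB", data, 0
    )
    end = 4 + unit_length  # unit_length excludes its own 4 bytes
    tuple_size = seg_size + 2 * addr_size
    # Assumption: the header is padded so the first tuple starts at a
    # multiple of the tuple size (round 12 up to that boundary).
    pos = -(-12 // tuple_size) * tuple_size

    def read(n):
        nonlocal pos
        value = int.from_bytes(data[pos:pos + n], "little")
        pos += n
        return value

    entries = []
    while pos + tuple_size <= end:
        seg, addr, length = read(seg_size), read(addr_size), read(addr_size)
        if (seg, addr, length) == (0, 0, 0):  # all-zero terminator
            break
        entries.append((seg, addr, length))
    return version, cu_offset, addr_size, seg_size, entries


# A blob shaped like the listing: 2-byte addresses, no segment selector.
blob = (
    struct.pack("<IHIBB", 20, 2, 0, 2, 0)
    + struct.pack("<HH", 0x0040, 0x3FFF)  # data object (made-up address)
    + struct.pack("<HH", 0x0000, 0x0060)  # function (made-up address)
    + struct.pack("<HH", 0, 0)            # terminator pair
)
version, cu_offset, addr_size, seg_size, entries = parse_aranges_set(blob)
```

With `seg_size == 0` every entry comes back with segment 0, so nothing in
the decoded tuples says which address space a range belongs to. The blob
writes the terminator as a full address/length pair of zeros, which is what
the 20-byte length accounts for.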
> ...but I cannot see documentation anywhere on what a consumer is expected to do
> with such information, and how *in general* multiple address spaces are expected
> to work for llvm and gcc when generating DWARF aranges when there is no segment
> selector in the tuple.
>
> A cursory grep of lldb shows that the segment size is set from the
> `.debug_aranges` header, but never checked. If it *is* nonzero, lldb will silently
> read incorrect data and possibly crash. I have provided a patch on the lldb
> mailing list[5]. My patch brings lldb in line with gdb, which throws an error in
> case of a nonzero segment selector size[6].
>
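
The kind of check described above could look roughly like this. This is a
Python sketch of the idea only, not the code from the lldb patch or from
gdb; `read_aranges_header` and `UnsupportedSegmentSelector` are hypothetical
names, and the header is assumed to be little-endian, 32-bit DWARF.

```python
import struct


class UnsupportedSegmentSelector(ValueError):
    """Raised when an aranges set uses a nonzero segment selector size."""


def read_aranges_header(data):
    """Decode the set header, refusing sets that carry segment selectors."""
    unit_length, version, cu_offset, addr_size, seg_size = struct.unpack_from(
        "<IHIBB", data, 0
    )
    if seg_size != 0:
        # Fail loudly instead of decoding the tuples with the wrong stride.
        raise UnsupportedSegmentSelector(
            f"segment selector size {seg_size} is not supported"
        )
    return unit_length, version, cu_offset, addr_size
```

Rejecting the set up front avoids reading each tuple as an address/length
pair when it actually starts with a selector.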
> My question is: should LLVM have some logic to emit `segment_selector_size != 0`
> for targets without a flat address space? Alternative formulation: do we need to
> limit the emission of arange info to code objects only, either 1) only in the
> non-flat address-space case or 2) for all targets unconditionally?
>
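
The two options in the question can be written down as a small filter. This
is a Python sketch of the decision only; `Symbol`, `symbols_for_aranges` and
the policy names are invented for the example and do not correspond to
anything in LLVM's AsmPrinter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    is_code: bool


def symbols_for_aranges(symbols, flat_address_space, policy):
    """Pick the symbols that get an arange entry.

    policy "non_flat_only": code-only just for non-flat targets (option 1).
    policy "always":        code-only for every target (option 2).
    """
    if policy not in ("non_flat_only", "always"):
        raise ValueError(f"unknown policy: {policy}")
    code_only = policy == "always" or not flat_address_space
    return [s for s in symbols if s.is_code or not code_only]


syms = [Symbol("main", True), Symbol("char_array", False)]
```

Under option 1 a flat target keeps its data entries and only a Harvard-style
target drops them; under option 2 data entries are dropped everywhere, which
matches what gcc is described as doing.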
> My intuition is that we should limit emission of aranges to objects in the main
> text section. Neither GDB nor LLDB handles aranges for targets without flat
> address spaces, and significant work might be needed in downstream DWARF
> consumers. The usefulness of address ranges for data objects is not
> something obvious to me as the uses of this section in DWARF consumers
> seem to mostly be PC-lookup.
>
> Any insight would be appreciated. I can likely provide patches if we conclude
> that changes are needed in LLVM.
>
> All the Best
>
> Luke
>
> [1] GCC only emits aranges for text:
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=gcc/dwarf2out.c;h=bb45279ea56d36621f14b0a68f4f0f0be3bf4e97;hb=HEAD#l11637
> [2] DWARF Debugging Information Format Version 5; 6.1.
> http://dwarfstd.org/Dwarf5Std.php
> [3] LLVM segment selector size is always zero:
> https://github.com/llvm/llvm-project/blob/e71fb46a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L2749
> [4] GCC segment selector size is always zero:
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=gcc/dwarf2out.c;h=bb45279ea56d36621f14b0a68f4f0f0be3bf4e97;hb=HEAD#l11624
> [5] lldb patch to gracefully error on nonzero segment selector size:
> https://reviews.llvm.org/D75925
> [6] GDB implementation of [5]:
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/dwarf2/read.c;h=1d4397dfabc72004eaa64013e47033e0ebdfe213;hb=HEAD#l2779
>
> --
> Codeplay Software Ltd.
> Company registered in England and Wales, number: 04567874
> Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF