[llvm-dev] DWARF .debug_aranges data objects and address spaces
Luke Drummond via llvm-dev
llvm-dev at lists.llvm.org
Tue Mar 10 08:17:57 PDT 2020
Hello
I've been looking at a debuginfo issue on an out-of-tree target which uses
DWARF aranges.
The problem is that aranges are generated for both data and code objects, and
the debugger gets confused when program addresses overlap data addresses. The
target is a Harvard-architecture CPU, so overlapping address ranges are not in
themselves a bug: code and data reside in different address spaces.
During my investigations, I found that:
- gcc appears to never generate an entry in the `.debug_aranges` table for
data objects. I did a cursory read over gcc's source and history and it is
my understanding that aranges are deliberately only emitted for text and
cold text sections[1].
- However, the DWARF v5 specification[2] for `.debug_aranges` does not suggest
that aranges should only cover text addresses, and the wording strongly
suggests that their use is general:
6.1.2:
> This header is followed by a variable number of address range descriptors.
> Each descriptor is a triple consisting of a segment selector, the
> beginning address within that segment of a range of text or data covered
> by some entry owned by the corresponding compilation unit, followed by the
> non-zero length of that range
As such, llvm is not doing anything wrong in general by emitting aranges for
data objects.
- llvm unconditionally sets the `.debug_aranges.segment_selector_size` to
zero[3]. GCC does this too[4]. I think this is a bug if the target can have
overlapping ranges due to multiple code/data address spaces, as in my case
of a Harvard machine.
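To see why the header field matters to consumers, the helper below computes the `unit_length` of one arange set from the address size, the segment selector size and the number of ranges, following the DWARF v5 rule that the first tuple begins at an offset that is a multiple of the tuple size. It is a minimal sketch assuming the 32-bit DWARF format; `arange_unit_length` is a hypothetical name, not an LLVM or GCC function.

```c
#include <stddef.h>

/* Value of the unit_length field of one .debug_aranges set (the field does
 * not count its own 4 bytes). Assumes 32-bit DWARF, whose header is:
 * unit_length (4), version (2), debug_info_offset (4),
 * address_size (1), segment_selector_size (1).
 * addr_size must be nonzero. */
static size_t arange_unit_length(size_t addr_size, size_t seg_size,
                                 size_t n_ranges) {
    const size_t header = 4 + 2 + 4 + 1 + 1;
    const size_t tuple = seg_size + 2 * addr_size;
    /* Pad so the first tuple starts at a multiple of the tuple size. */
    const size_t padding = (tuple - header % tuple) % tuple;
    /* n_ranges descriptors plus one all-zero terminating tuple. */
    return (header - 4) + padding + (n_ranges + 1) * tuple;
}
```

With `addr_size = 2`, `seg_size = 0` and two ranges this gives 20, the `Length of ARange Set` in the AVR listing below. A one-byte selector would make each tuple 5 bytes and require 3 bytes of padding after the 12-byte header, giving 26, so a consumer that ignores the field misreads every tuple.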
As far as I can tell, the only upstream backend that is of a similar
configuration is AVR. I can reproduce the same `.debug_aranges` table as my
target with the following simple example:
$ clang -target avr -mmcu=attiny104 -S -o - -g -gdwarf-aranges -xc - <<'EOF'
char my_array[16383] = {0};
int main() {
    return my_array[0];
}
EOF
# ...
.section .debug_aranges,"",@progbits
.long 20 ; Length of ARange Set
.short 2 ; DWARF Arange version number
.long .Lcu_begin0 ; Offset Into Debug Info Section
.byte 2 ; Address Size (in bytes)
.byte 0 ; Segment Size (in bytes)
.short my_array
.short .Lsec_end0-my_array
.short .Lfunc_begin0
.short .Lsec_end1-.Lfunc_begin0
.short 0 ; ARange terminator
.short 0
...but I cannot find documentation anywhere on what a consumer is expected to do
with such information, or on how multiple address spaces are *in general*
expected to work when llvm and gcc generate DWARF aranges with no segment
selector in the tuple.
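The consumer-side difficulty can be shown with a hypothetical table in which one compilation unit owns a data object and another owns a function, both starting at numeric address 0 in their own address spaces. Without a segment selector every descriptor implicitly has segment 0, so a PC lookup matches both. The types and addresses below are made up for illustration, and the segment numbering (0 for program memory, 1 for data memory) is an assumption, not something DWARF or AVR defines.

```c
#include <stddef.h>
#include <stdint.h>

struct arange_entry {
    uint64_t segment;   /* always 0 when segment_selector_size == 0 */
    uint64_t address;   /* start of the range within its segment */
    uint64_t length;    /* non-zero length of the range */
    uint64_t cu_offset; /* offset of the owning CU in .debug_info */
};

/* Count the descriptors whose range contains (segment, addr).
 * A PC lookup is only unambiguous when the result is at most 1. */
static size_t count_matches(const struct arange_entry *t, size_t n,
                            uint64_t segment, uint64_t addr) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        /* addr - start < length avoids overflow at the top of the space. */
        if (t[i].segment == segment && addr >= t[i].address &&
            addr - t[i].address < t[i].length)
            hits++;
    }
    return hits;
}
```

With both descriptors in segment 0, looking up PC 0x0004 returns 2; moving the data range to segment 1 brings the count back to 1.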
A cursory grep of lldb shows that the segment size is set from the
`.debug_aranges` header, but never checked. If it *is* nonzero, lldb will silently
read incorrect data and possibly crash. I have provided a patch on the lldb
mailing list[5]. My patch brings lldb in line with gdb, which throws an error in
the case of a nonzero segment selector size[6].
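The lldb patch and gdb both reject sets with a nonzero `segment_selector_size`. For comparison, honouring the field is mostly a matter of widening each tuple and aligning the first one. The sketch below assumes 32-bit DWARF, little-endian data, and alignment measured from the start of the set; `decode_arange_set` and `struct arange` are hypothetical and do not mirror either debugger's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Read an unsigned little-endian integer of `size` bytes (size <= 8). */
static uint64_t read_le(const uint8_t *p, unsigned size) {
    uint64_t v = 0;
    for (unsigned i = 0; i < size; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

struct arange {
    uint64_t segment; /* 0 when segment_selector_size == 0 */
    uint64_t address;
    uint64_t length;
};

/* Decode one 32-bit-DWARF arange set. Stores up to `max` descriptors in
 * `out` and returns how many were found (which may exceed `max`), or -1
 * if the set is malformed, truncated or has no terminator. */
static int decode_arange_set(const uint8_t *buf, size_t buf_len,
                             struct arange *out, int max) {
    if (buf_len < 12)
        return -1;
    uint64_t unit_length = read_le(buf, 4);
    /* 64-bit DWARF (0xffffffff) and reserved values are not handled. */
    if (unit_length >= 0xfffffff0u || unit_length + 4 > buf_len)
        return -1;
    unsigned addr_size = buf[10];
    unsigned seg_size = buf[11];
    if (addr_size == 0 || addr_size > 8 || seg_size > 8)
        return -1;
    size_t tuple = seg_size + 2u * addr_size;
    size_t end = 4 + (size_t)unit_length;
    /* First tuple starts at a multiple of the tuple size. */
    size_t off = (12 + tuple - 1) / tuple * tuple;
    int n = 0;
    while (off + tuple <= end) {
        struct arange a;
        a.segment = seg_size ? read_le(buf + off, seg_size) : 0;
        a.address = read_le(buf + off + seg_size, addr_size);
        a.length = read_le(buf + off + seg_size + addr_size, addr_size);
        off += tuple;
        if (a.segment == 0 && a.address == 0 && a.length == 0)
            return n; /* all-zero terminating tuple */
        if (n < max)
            out[n] = a;
        n++;
    }
    return -1;
}
```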
My question is: should LLVM have some logic to emit `segment_selector_size != 0`
for targets without a flat address space? Alternative formulation: do we need to
limit the emission of aranges to code objects only, either 1) for targets
without a flat address space, or 2) for all targets unconditionally?
My intuition is that we should limit emission of aranges to objects in the main
text section. Neither GDB nor LLDB handles aranges for targets without flat
address spaces, and significant work might be needed in downstream DWARF
consumers. The usefulness of address ranges for data objects is not obvious to
me, as DWARF consumers seem to use this section mostly for PC lookup.
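The proposal amounts to filtering candidate ranges by section kind before the table is written. This is a minimal sketch: `struct range_candidate` and its `is_code` flag are hypothetical stand-ins for whatever section information a backend would actually consult. Setting `code_only` for every target corresponds to option 2) above; setting it only for targets without a flat address space corresponds to option 1).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range_candidate {
    uint64_t address;
    uint64_t length;
    bool is_code; /* true for objects in executable (text) sections */
};

/* Compact `r` in place, dropping non-code ranges when `code_only` is set.
 * Returns the number of ranges left to emit; order is preserved. */
static size_t filter_aranges(struct range_candidate *r, size_t n,
                             bool code_only) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (!code_only || r[i].is_code)
            r[kept++] = r[i];
    return kept;
}
```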
Any insight would be appreciated. I can likely provide patches if we conclude
that changes are needed in LLVM.
All the Best
Luke
[1] GCC only emits aranges for text: https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=gcc/dwarf2out.c;h=bb45279ea56d36621f14b0a68f4f0f0be3bf4e97;hb=HEAD#l11637
[2] DWARF Debugging Information Format Version 5, section 6.1: http://dwarfstd.org/Dwarf5Std.php
[3] LLVM segment selector size is always zero: https://github.com/llvm/llvm-project/blob/e71fb46a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L2749
[4] GCC segment selector size is always zero: https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=gcc/dwarf2out.c;h=bb45279ea56d36621f14b0a68f4f0f0be3bf4e97;hb=HEAD#l11624
[5] lldb patch to gracefully error on nonzero segment selector size: https://reviews.llvm.org/D75925
[6] GDB implementation of [5]: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/dwarf2/read.c;h=1d4397dfabc72004eaa64013e47033e0ebdfe213;hb=HEAD#l2779
--
Codeplay Software Ltd.
Company registered in England and Wales, number: 04567874
Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF