<div dir="ltr">If you only want code addresses, why not use the CU's low_pc/high_pc/ranges - those are guaranteed to be only code addresses, I think?<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Mar 10, 2020 at 8:18 AM Luke Drummond via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello<br>
<br>
I've been looking at a debuginfo issue on an out-of-tree target which uses<br>
DWARF aranges.<br>
<br>
The problem is that aranges are generated for both data and code objects, and<br>
the debugger gets confused when program addresses overlap data addresses. The<br>
target is a Harvard Architecture CPU, so the appearance of overlapping address<br>
ranges is not in itself a bug as they reside in different address spaces.<br>
<br>
During my investigations, I found that:<br>
<br>
- gcc appears to never generate an entry in the `.debug_aranges` table for<br>
data objects. I did a cursory read over gcc's source and history and it is<br>
my understanding that aranges are deliberately only emitted for text and<br>
cold text sections[1].<br>
- However, the DWARF v5 specification[2] for `.debug_aranges` does not suggest<br>
that aranges should cover only text addresses, and the wording<br>
strongly suggests that their use is general:<br>
<br>
6.1.2:<br>
> This header is followed by a variable number of address range descriptors.<br>
> Each descriptor is a triple consisting of a segment selector, the<br>
> beginning address within that segment of a range of text or data covered<br>
> by some entry owned by the corresponding compilation unit, followed by the<br>
> non-zero length of that range<br>
<br>
As such llvm is doing nothing generally wrong by emitting aranges for data<br>
objects.<br>
<br>
- llvm unconditionally sets the `.debug_aranges.segment_selector_size` to<br>
zero[3]. GCC does this too[4]. I think this is a bug if the target can have<br>
overlapping ranges due to multiple code/data address spaces as in my case<br>
of a Harvard machine.<br>
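<br>
To make the layout discussed above concrete, here is a minimal sketch of a reader for one<br>
`.debug_aranges` set. It assumes 32-bit DWARF, little-endian data, and a set that starts<br>
at offset 0 of the section. The function name `parse_aranges_set` and the addresses in<br>
`EXAMPLE_SET` are made up for illustration and are not taken from any real tool or binary.<br>

```python
def parse_aranges_set(data):
    """Parse one 32-bit-DWARF, little-endian .debug_aranges set (sketch only)."""

    def u(off, size):
        # Read an unsigned little-endian integer; a size of 0 yields 0.
        return int.from_bytes(data[off:off + size], "little")

    unit_length = u(0, 4)
    if unit_length == 0xFFFFFFFF:
        raise ValueError("64-bit DWARF is outside the scope of this sketch")

    version = u(4, 2)
    debug_info_offset = u(6, 4)
    address_size = u(10, 1)
    segment_selector_size = u(11, 1)
    if address_size == 0:
        raise ValueError("address_size must be non-zero")

    # Assumption: tuples are aligned to a multiple of the tuple size,
    # counted from the start of the set (which we assume is offset 0).
    tuple_size = segment_selector_size + 2 * address_size
    end = 4 + unit_length
    off = 12
    off += -off % tuple_size

    ranges = []
    while off + tuple_size <= end:
        segment = u(off, segment_selector_size) if segment_selector_size else None
        address = u(off + segment_selector_size, address_size)
        length = u(off + segment_selector_size + address_size, address_size)
        off += tuple_size
        if not segment and address == 0 and length == 0:
            break  # all-zero tuple terminates the set
        ranges.append((segment, address, length))

    return {
        "version": version,
        "debug_info_offset": debug_info_offset,
        "address_size": address_size,
        "segment_selector_size": segment_selector_size,
        "ranges": ranges,
    }


# Hypothetical set: 2-byte addresses, no segment selector, one data range
# and one code range. The addresses are invented for this example.
EXAMPLE_SET = (
    (20).to_bytes(4, "little")            # unit_length
    + (2).to_bytes(2, "little")           # version
    + (0).to_bytes(4, "little")           # debug_info_offset
    + bytes([2, 0])                       # address_size, segment_selector_size
    + (0x40).to_bytes(2, "little") + (0x3FFF).to_bytes(2, "little")  # data
    + (0x00).to_bytes(2, "little") + (0x0A).to_bytes(2, "little")    # code
    + bytes(4)                            # terminator tuple
)

if __name__ == "__main__":
    print(parse_aranges_set(EXAMPLE_SET))
```

With a segment selector size of zero, nothing in a parsed tuple says whether it describes<br>
code or data, which is the crux of the problem on a Harvard machine.<br>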
<br>
As far as I can tell, the only upstream backend that is of a similar<br>
configuration is AVR. I can reproduce the same `.debug_aranges` table as my<br>
target with the following simple example:<br>
<br>
$ clang -target avr -mmcu=attiny104 -S -o - -g -gdwarf-aranges -xc - <<'EOF'<br>
char char_array[16383] = {0};<br>
int main() {<br>
return char_array[0];<br>
}<br>
EOF<br>
# ...<br>
.section .debug_aranges,"",@progbits<br>
.long 20 ; Length of ARange Set<br>
.short 2 ; DWARF Arange version number<br>
.long .Lcu_begin0 ; Offset Into Debug Info Section<br>
.byte 2 ; Address Size (in bytes)<br>
.byte 0 ; Segment Size (in bytes)<br>
.short char_array<br>
.short .Lsec_end0-char_array<br>
.short .Lfunc_begin0<br>
.short .Lsec_end1-.Lfunc_begin0<br>
.short 0 ; ARange terminator<br>
<br>
...but I cannot see documentation anywhere on what a consumer is expected to do<br>
with such information, and how *in general* multiple address spaces are expected<br>
to work for llvm and gcc when generating DWARF aranges when there is no segment<br>
selector in the tuple.<br>
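<br>
The ambiguity can be illustrated with a small sketch. It assumes ranges are held as<br>
(segment, start, length) tuples, with `None` standing for "no segment selector". The<br>
helper `find_overlaps` and the addresses used below are hypothetical.<br>

```python
def find_overlaps(ranges):
    """Return pairs of ranges that intersect within the same segment.

    Each range is a (segment, start, length) tuple; a segment of None is
    treated as segment 0, i.e. a single flat address space.
    """
    overlaps = []
    for i, (seg_a, start_a, len_a) in enumerate(ranges):
        for seg_b, start_b, len_b in ranges[i + 1:]:
            if (seg_a or 0) != (seg_b or 0):
                continue  # different address spaces cannot collide
            if start_a < start_b + len_b and start_b < start_a + len_a:
                overlaps.append(((seg_a, start_a, len_a), (seg_b, start_b, len_b)))
    return overlaps


# Hypothetical Harvard-style layout: code and data both start near address 0.
flat = [(None, 0x0000, 0x0100), (None, 0x0060, 0x3FFF)]
segmented = [(0, 0x0000, 0x0100), (1, 0x0060, 0x3FFF)]

if __name__ == "__main__":
    print(len(find_overlaps(flat)))       # code and data collide
    print(len(find_overlaps(segmented)))  # selector keeps them apart
```

Without a selector the two ranges collide, so a PC lookup that lands in the shared<br>
interval cannot tell which entry is the code one.<br>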
<br>
A cursory grep of lldb shows that the segment size is set from the<br>
`.debug_aranges` header, but never checked. If it *is* nonzero, lldb will silently<br>
read incorrect data and possibly crash. I have provided a patch on the lldb<br>
mailing list[5]. My patch brings lldb in line with gdb, which throws an error in<br>
case of a nonzero segment selector size[6].<br>
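<br>
The kind of check described above could look roughly like the following sketch. It is<br>
not the lldb or gdb code; the names `UnsupportedArangesError` and<br>
`check_segment_selector_size` are invented here, and the header offsets assume 32-bit DWARF.<br>

```python
class UnsupportedArangesError(ValueError):
    """Raised when a .debug_aranges set uses a feature the reader lacks."""


def check_segment_selector_size(set_bytes):
    """Reject a 32-bit-DWARF aranges set whose segment selector size is non-zero.

    Header layout assumed: unit_length (4), version (2),
    debug_info_offset (4), address_size (1), segment_selector_size (1).
    """
    if len(set_bytes) < 12:
        raise ValueError("truncated .debug_aranges header")
    segment_selector_size = set_bytes[11]
    if segment_selector_size != 0:
        # Failing loudly avoids misreading every tuple that follows.
        raise UnsupportedArangesError(
            "segment selector size %d is not supported" % segment_selector_size
        )
    return segment_selector_size
```

Raising an error up front is preferable to continuing, because every subsequent tuple<br>
would otherwise be decoded with the wrong stride.<br>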
<br>
My question is: should LLVM have some logic to emit `segment_selector_size != 0`<br>
for targets without a flat address space? Alternative formulation: do we need to<br>
limit the emission of arange info to code objects only, either 1) just for targets<br>
without a flat address space, or 2) for all targets unconditionally?<br>
<br>
My intuition is that we should limit emission of aranges to objects in the main<br>
text section. Neither GDB nor LLDB handles aranges for targets without flat<br>
address spaces, and significant work might be needed in downstream DWARF<br>
consumers. The usefulness of address ranges for data objects is not<br>
obvious to me, as the uses of this section in DWARF consumers<br>
seem to be mostly PC lookup.<br>
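<br>
As a rough illustration of the producer-side option, the sketch below keeps only ranges<br>
for symbols placed in text sections. The `Symbol` record, the function<br>
`code_only_ranges`, and the section names `.text` and `.text.unlikely` are assumptions<br>
made for this example and do not reflect how LLVM or GCC represent this internally.<br>

```python
from collections import namedtuple

# Hypothetical symbol record; real compilers track this very differently.
Symbol = namedtuple("Symbol", "name section address size")


def code_only_ranges(symbols, text_sections=(".text", ".text.unlikely")):
    """Return (address, length) pairs for symbols that live in text sections.

    Zero-sized symbols are skipped because an arange length must be non-zero.
    """
    return [
        (sym.address, sym.size)
        for sym in symbols
        if sym.section in text_sections and sym.size > 0
    ]


if __name__ == "__main__":
    symbols = [
        Symbol("main", ".text", 0x0000, 10),
        Symbol("char_array", ".data", 0x0040, 16383),
    ]
    print(code_only_ranges(symbols))
```

Filtering at emission time would leave consumers unchanged, at the cost of dropping<br>
data ranges that the DWARF wording permits.<br>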
<br>
Any insight would be appreciated. I can likely provide patches if we conclude<br>
that changes are needed in LLVM.<br>
<br>
All the Best<br>
<br>
Luke<br>
<br>
[1] GCC only emits aranges for text:<br>
<a href="https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=gcc/dwarf2out.c;h=bb45279ea56d36621f14b0a68f4f0f0be3bf4e97;hb=HEAD#l11637" rel="noreferrer" target="_blank">https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=gcc/dwarf2out.c;h=bb45279ea56d36621f14b0a68f4f0f0be3bf4e97;hb=HEAD#l11637</a><br>
[2] DWARF Debugging Information Format Version 5; 6.1. <a href="http://dwarfstd.org/Dwarf5Std.php" rel="noreferrer" target="_blank">http://dwarfstd.org/Dwarf5Std.php</a><br>
[3] LLVM segment selector size is always zero: <a href="https://github.com/llvm/llvm-project/blob/e71fb46a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L2749" rel="noreferrer" target="_blank">https://github.com/llvm/llvm-project/blob/e71fb46a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L2749</a><br>
[4] GCC segment selector size is always zero:<br>
<a href="https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=gcc/dwarf2out.c;h=bb45279ea56d36621f14b0a68f4f0f0be3bf4e97;hb=HEAD#l11624" rel="noreferrer" target="_blank">https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=gcc/dwarf2out.c;h=bb45279ea56d36621f14b0a68f4f0f0be3bf4e97;hb=HEAD#l11624</a><br>
[5] lldb patch to gracefully error on nonzero segment selector size: <a href="https://reviews.llvm.org/D75925" rel="noreferrer" target="_blank">https://reviews.llvm.org/D75925</a><br>
[6] GDB implementation of [5]:<br>
<a href="https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/dwarf2/read.c;h=1d4397dfabc72004eaa64013e47033e0ebdfe213;hb=HEAD#l2779" rel="noreferrer" target="_blank">https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/dwarf2/read.c;h=1d4397dfabc72004eaa64013e47033e0ebdfe213;hb=HEAD#l2779</a><br>
<br>
-- <br>
Codeplay Software Ltd.<br>
Company registered in England and Wales, number: 04567874<br>
Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF<br>
</blockquote></div>