[llvm-dev] DWARF .debug_aranges data objects and address spaces

Wed Mar 18 08:49:27 PDT 2020

Hi David, Paul

On Thu Mar 12, 2020 at 11:22 PM PST, David Blaikie wrote:
> On Thu, Mar 12, 2020 at 1:51 PM Robinson, Paul <paul.robinson at sony.com>
> wrote:
>
> > I’ve encountered this kind of architecture before, a long time ago
> > (academically).  In a flat-address-space machine such as X64, there is
> > still an instruction/data distinction, but usually only down at the level
> > of I-cache versus D-cache (instruction fetch versus data fetch).  A Harvard
> > architecture machine exposes that to the programmer, which effectively
> > doubles the available address space.  Code and data live in different
> > address spaces, although the address space identifier per se is not
> > explicit.  A Move instruction would implicitly use the data address space,
> > while an indirect Branch would implicitly target the code address space.
> > An OS running on a Harvard architecture would require the loader to be
> > privileged, so it can map data from an object file into the code address
> > space and implement any necessary fixups.  Self-modifying code is at least
> > wicked hard if not impossible to achieve.

Paul: On our target it's impossible to have self-modifying code. The
loader is the ROM flasher, and the target jumps to a fixed address on
core reset. There are no special addressing modes that would allow a
code section to be overwritten. Everything you say above is consistent
with my experience on this target.
> >
> >
> >
> > In DWARF this would indeed be described by a segment selector.  It’s up to
> > the target ABI to specify what the segment selector numbers actually are.
> > For a Harvard architecture machine this is pretty trivial, you say
> > something like 0 for code and 1 for data.  Boom done.
> >
Paul: I think I need to have a discussion with the people implementing
the debugger and work out the ABI a bit more formally.
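As a starting point for that discussion, here is a minimal sketch of the
kind of selector numbering described above. The names Segment, CODE, DATA
and qualify are hypothetical, and the 0/1 values are only the example
numbers from Paul's mail, not an agreed ABI.

```python
from enum import IntEnum


class Segment(IntEnum):
    """Hypothetical segment selector numbering for a Harvard target."""
    CODE = 0
    DATA = 1


def qualify(segment, address):
    """Pair an address with its selector so that code and data
    addresses with the same numeric value remain distinct."""
    return (Segment(segment), address)


# The same numeric address in two different address spaces.
code_addr = qualify(Segment.CODE, 0x100)
data_addr = qualify(Segment.DATA, 0x100)
```

With this, code_addr and data_addr compare unequal even though both
carry the address 0x100.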
> >
> >
> > LLVM basically doesn’t have targets like this, or at least it has never
> > come up before that I’m aware of.  So, when we emit DWARF, we assume a flat
> > address space (unconditionally setting the segment selector size to zero),
> > and llvm-dwarfdump will choke (hopefully cleanly, but still) on an object
> > file that uses DWARF segment selectors.
> >
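For reference, the segment selector size lives in the header of each
.debug_aranges set. A rough sketch of decoding that header follows,
assuming 32-bit DWARF and a caller-supplied byte order;
parse_aranges_header and is_flat are hypothetical helper names, not
llvm-dwarfdump or LLVM APIs.

```python
import struct


def parse_aranges_header(data, little_endian=True):
    """Decode the fixed 12-byte header of one .debug_aranges set.

    Assumes 32-bit DWARF; the 64-bit escape value is rejected.
    """
    fmt = ("<" if little_endian else ">") + "IHIBB"
    (unit_length, version, debug_info_offset,
     address_size, segment_selector_size) = struct.unpack_from(fmt, data, 0)
    if unit_length == 0xFFFFFFFF:
        raise ValueError("64-bit DWARF is not handled in this sketch")
    return {
        "unit_length": unit_length,
        "version": version,
        "debug_info_offset": debug_info_offset,
        "address_size": address_size,
        "segment_selector_size": segment_selector_size,
    }


def is_flat(header):
    """True when the producer declared no segment selectors."""
    return header["segment_selector_size"] == 0
```

A set emitted with the selector size unconditionally zero, as described
above, would make is_flat() return True.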
>
> FWIW Luke mentioned in the original email that the AVR in-tree backend
> seems to have this problem with ambiguous debug_aranges entries.

David: I'm not sure what's expected for AVR, but it's the only target I
can see that looks like a Harvard machine. The default seems to be to
not emit arange information so I think if this is a bug, it's only in
very specific circumstances.
>
>
> >  The point of .debug_aranges is to accelerate the search for the
> > appropriate CU.  Yes you can spend time trolling through .debug_info and
> > .debug_abbrev, decoding the CU DIEs looking for low_pc/high_pc pairs (or
> > perhaps pointers to .debug_ranges) and effectively rebuild a .debug_aranges
> > section yourself, if the compiler/linker isn’t kind enough to pre-build the
> > table for you.  I don’t understand why .debug_aranges should be
> > discouraged; I shouldn’t think they would be huge, and consumers can avoid
> > loading lots of data just to figure out what’s not worth looking at.
> > Forcing all consumers to do things the slow way seems unnecessarily
> > inefficient.
> >
> If the producer has put ranges on the CU it's not a lot of work - it's

In my mind, keeping debug_aranges is definitely a win for my use cases,
and as long as the linker is doing the right thing it is a useful
optimization.
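To illustrate why a pre-built table helps, here is a small sketch of the
lookup a consumer could do once the ranges are in hand. The table
contents, ARANGES, STARTS and find_cu are all made up for illustration,
and this assumes a flat address space with non-overlapping ranges.

```python
import bisect

# Hypothetical pre-built table, sorted by start address:
# (start_address, length, cu_offset)
ARANGES = [
    (0x1000, 0x200, 0x0000),
    (0x1200, 0x080, 0x004A),
    (0x2000, 0x400, 0x0113),
]
STARTS = [start for start, _, _ in ARANGES]


def find_cu(address):
    """Return the CU offset whose range covers address, or None."""
    # Find the last range that starts at or before the address.
    i = bisect.bisect_right(STARTS, address) - 1
    if i < 0:
        return None
    start, length, cu_offset = ARANGES[i]
    return cu_offset if address < start + length else None
```

The consumer only has to open the one CU that find_cu() names, instead of
decoding every CU DIE to rebuild the same table.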

> parsing one DIE & looking for a couple of attributes. With Split DWARF
> the cost becomes a bit more prominent - Sema.o from clang, with split
> dwarf (v4 or v5 about the same) is about 3.5% larger with debug aranges
> (not sure about the overall data). It's enough at least at Google for us
> to not use them & use CU ranges for the same purpose.

>
> I thought I might be able to find some email history about why we turned
> it off by default, but seems we never turned it /on/ by default to begin
> with & it wasn't implemented until relatively late in the game (well, what
> I think of as relatively late - after I started on the project at least).
>
I'm generally of the opinion that if you're hauling around *any*
debuginfo, you should include as much useful context as possible, and
because of that any extra info the debugger can use to shorten the parse
time is helpful. This of course assumes I never have to debug binaries
that are on production machines (a clear distinction between debug and
release builds); reducing size makes sense there.
>
> >  Thinking about Harvard architecture specifically, you **need** the
> > segment selector only when an address could be ambiguous about whether it’s
> > a code or data address.  This basically comes up **only** in
> > .debug_aranges, he said thinking about it for about 30 seconds.  Within
> > .debug_info you don’t need it because when you pick up the address of an
> > entity, you know whether it’s for a code or data entity.  Location lists
> > and range lists always point to code.  For .debug_aranges you would need
> > the segment selector, but I think that’s the only place.
> >
Agreed.
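A small sketch of the ambiguity in question, using an in-memory model
rather than the on-disk encoding. The entries, the CODE/DATA values and
the lookup helper are hypothetical and only meant to show the effect of
carrying a selector in each entry.

```python
# Hypothetical selector values and range entries:
# (segment, start_address, length, cu_offset)
CODE, DATA = 0, 1
ENTRIES = [
    (CODE, 0x0100, 0x40, 0x0000),
    (DATA, 0x0100, 0x10, 0x0051),
    (CODE, 0x0140, 0x20, 0x0051),
]


def lookup(address, segment=None):
    """Return every entry covering address.

    segment=None mimics a table without selectors, where code and
    data ranges cannot be told apart.
    """
    return [
        entry for entry in ENTRIES
        if (segment is None or entry[0] == segment)
        and entry[1] <= address < entry[1] + entry[2]
    ]
```

Here lookup(0x104) matches both a code range and a data range, while
lookup(0x104, DATA) matches only the data range.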
> >
> >
> > For an architecture with multiple code or data segments, then you’d need
> > the segment selector more widely, but I should think this case wouldn’t be
> > all that difficult to make work.  Even factoring in the llvm-dwarfdump
> > part, it has to understand the selector only for the .debug_aranges
> > section; everything else can remain as it is, pretending there’s a flat
> > address space.
> >
> >
> >
> > Now, if your target is downstream, that would make upstreaming the LLVM
> > support a bit dicier, because we’d not want to have that feature in the
> > upstream repo if there are no targets using it.  You’d be left maintaining
> > that patch on your own.  But as I described above, I don’t think it would
> > be a huge deal.
> >
I think I agree, and for the moment maintaining a patch downstream is
perfectly fine. If and when LLVM gets an upstream target that needs to
use multiple segments, we can probably contribute that support.
> >
> >
> > HTH,

This discussion has helped me greatly. Thanks all for your advice.
> >
> > --paulr
> >
> >
> > Luke
> >
> > --
> > Codeplay Software Ltd.
> > Company registered in England and Wales, number: 04567874
> > Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF
> >
> >