[llvm-dev] DWARF .debug_aranges data objects and address spaces

Wed Mar 18 11:53:19 PDT 2020

(hmm, unfortunately this broke gmail (& maybe other?) threading in some way
:/)

On Wed, Mar 18, 2020 at 8:49 AM Luke Drummond <luke.drummond at codeplay.com>
wrote:

> Hi David, Paul
>
> On Thu Mar 12, 2020 at 11:22 PM PST, David Blaikie wrote:
> > On Thu, Mar 12, 2020 at 1:51 PM Robinson, Paul <paul.robinson at sony.com>
> > wrote:
> >
> > > I’ve encountered this kind of architecture before, a long time ago
> > > (academically).    In a flat-address-space machine such as X64, there
> is
> > > still an instruction/data distinction, but usually only down at the
> level
> > > of I-cache versus D-cache (instruction fetch versus data fetch).  A
> Harvard
> > > architecture machine exposes that to the programmer, which effectively
> > > doubles the available address space.  Code and data live in different
> > > address spaces, although the address space identifier per se is not
> > > explicit.  A Move instruction would implicitly use the data address
> space,
> > > while an indirect Branch would implicitly target the code address
> space.
> > > An OS running on a Harvard architecture would require the loader to be
> > > privileged, so it can map data from an object file into the code
> address
> > > space and implement any necessary fixups.  Self-modifying code is at
> least
> > > wicked hard if not impossible to achieve.
>
> Paul: On our target it's impossible to have self-modifying code. The
> loader is
> the ROM flasher and the target jumps to a fixed address on core reset.
> There are no special addressing modes that would allow a code section to
> be overwritten.
> Everything you say above is consistent with my experience on this
> target.
> > >
> > >
> > >
> > > In DWARF this would indeed be described by a segment selector.  It’s
> up to
> > > the target ABI to specify what the segment selector numbers actually
> are.
> > > For a Harvard architecture machine this is pretty trivial, you say
> > > something like 0 for code and 1 for data.  Boom done.
> > >
> Paul: I think I need to have a discussion with the people implementing
> the debugger and work out the ABI a bit more formally.
> > >
> > >
> > > LLVM basically doesn’t have targets like this, or at least it has never
> > > come up before that I’m aware of.  So, when we emit DWARF, we assume a
> flat
> > > address space (unconditionally setting the segment selector size to
> zero),
> > > and llvm-dwarfdump will choke (hopefully cleanly, but still) on an
> object
> > > file that uses DWARF segment selectors.
> > >
> >
> > FWIW Luke mentioned in the original email the AVR in-tree backend seems
> > to
> > have this problem with ambiguous debug_aranges entries.
>
> David: I'm not sure what's expected for AVR, but it's the only target I
> can see that looks like a Harvard machine. The default seems to be to
> not emit arange information so I think if this is a bug, it's only in
> very specific circumstances.
> >
> >
> > >  The point of .debug_aranges is to accelerate the search for the
> > > appropriate CU.  Yes you can spend time trolling through .debug_info
> and
> > > .debug_abbrev, decoding the CU DIEs looking for low_pc/high_pc pairs
> (or
> > > perhaps pointers to .debug_ranges) and effectively rebuild a
> .debug_aranges
> > > section yourself, if the compiler/linker isn’t kind enough to
> pre-build the
> > > table for you.  I don’t understand why .debug_aranges should be
> > > discouraged; I shouldn’t think they would be huge, and consumers can
> avoid
> > > loading lots of data just to figure out what’s not worth looking at.
> > > Forcing all consumers to do things the slow way seems unnecessarily
> > > inefficient.
> > >
> > > If the producer has put ranges on the CU it's not a lot of work - it's
>
> In my mind keeping debug_aranges is definitely a win for my use cases,
> and as long as the linker is doing the right thing, it is a useful
> optimization.
>

I'd be curious to know how much of an advantage it is, compared to using CU
ranges.
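
For reference, the fallback Paul describes above (collecting low_pc/high_pc
pairs from the CU DIEs and rebuilding an aranges-like table) can be sketched
roughly as follows. This is only an illustration in Python: the CU names, the
addresses, and the helpers build_index and find_cu are made up, ranges are
assumed not to overlap, and no actual DWARF parsing is shown.

```python
import bisect
from typing import Dict, List, NamedTuple, Optional, Tuple


class Range(NamedTuple):
    low: int   # inclusive start address
    high: int  # exclusive end address
    cu: str    # hypothetical label for the owning compilation unit


def build_index(cu_ranges: Dict[str, List[Tuple[int, int]]]) -> List[Range]:
    """Flatten per-CU (low, high) pairs into one list sorted by start address.

    This plays the role of a pre-built .debug_aranges table; a real consumer
    would obtain the pairs by decoding each CU DIE.
    """
    index = [Range(low, high, cu)
             for cu, ranges in cu_ranges.items()
             for low, high in ranges]
    index.sort()
    return index


def find_cu(index: List[Range], addr: int) -> Optional[str]:
    """Return the CU whose range contains addr, or None if nothing covers it.

    Assumes the ranges in index do not overlap.
    """
    starts = [r.low for r in index]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and index[i].low <= addr < index[i].high:
        return index[i].cu
    return None


# Hypothetical CUs and addresses, for illustration only.
cu_ranges = {
    "a.c": [(0x1000, 0x1100)],
    "b.c": [(0x1100, 0x1200), (0x2000, 0x2010)],
}
index = build_index(cu_ranges)
print(find_cu(index, 0x1100))
```

Whether the table is built by the producer or by the consumer, the lookup
is the same; the open question is only who pays for building it and how
many bytes it costs.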

> > parsing one DIE & looking for a couple of attributes. With Split DWARF
> > the
> > cost becomes a bit more prominent - Sema.o from clang, with split
> > dwarf
> > (v4 or v5 about the same) is about 3.5% larger with debug aranges (not
> > sure
> > about the overall data). It's enough at least at Google for us to not
> > use
> > them & use CU ranges for the same purpose.
>
>
> >
> > I thought I might be able to find some email history about why we turned
> > it
> > off by default, but seems we never turned it /on/ by default to begin
> > with
> > & it wasn't implemented until relatively late in the game (well, what I
> > think as relatively late - after I started on the project at least).
> >
> I'm generally of the opinion that if you're hulking around *any*
> debuginfo, you should include as much useful context as possible - and
> because of that any extra info the debugger can use to shorten the parse
> time is helpful. This is of course assuming I never have to debug
> binaries that are on production machines (a clear distinction between
> debug and release builds); reducing size makes sense
> there.
>

Even just for link times - reading/writing more bytes does slow things down
- so some data about how much the extra data improves debugger load
times/costs link time/etc would be interesting to me at least.

> >
> > >  Thinking about Harvard architecture specifically, you **need** the
> > > segment selector only when an address could be ambiguous about whether
> it’s
> > > a code or data address.  This basically comes up **only** in
> > > .debug_aranges, he said thinking about it for about 30 seconds.  Within
> > > .debug_info you don’t need it because when you pick up the address of
> an
> > > entity, you know whether it’s for a code or data entity.  Location
> lists
> > > and range lists always point to code.  For .debug_aranges you would
> need
> > > the segment selector, but I think that’s the only place.
> > >
> Agreed.
> > >
> > >
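
A minimal sketch of the ambiguity discussed above, in Python. The selector
numbers (0 for code, 1 for data) follow Paul's example but are an
assumption, since the target ABI would define them; the table entries and
function names are hypothetical, and this does not reflect how LLVM or any
debugger actually stores the table.

```python
from typing import List, Tuple

# Assumed selector numbering; the real values are up to the target ABI.
CODE, DATA = 0, 1

# (segment, start address, length, owning CU): hypothetical entries in which
# a code range and a data range share the same numeric addresses.
Entry = Tuple[int, int, int, str]
entries: List[Entry] = [
    (CODE, 0x0000, 0x200, "main.c"),
    (DATA, 0x0100, 0x040, "globals.c"),
]


def lookup_flat(table: List[Entry], addr: int) -> List[str]:
    """Look up addr while ignoring the segment, as a flat-address consumer would."""
    return [cu for _seg, start, length, cu in table
            if start <= addr < start + length]


def lookup_segmented(table: List[Entry], segment: int, addr: int) -> List[str]:
    """Look up addr within one segment only."""
    return [cu for seg, start, length, cu in table
            if seg == segment and start <= addr < start + length]


print(lookup_flat(entries, 0x0110))             # both CUs match
print(lookup_segmented(entries, CODE, 0x0110))  # only the code CU
print(lookup_segmented(entries, DATA, 0x0110))  # only the data CU
```

With the selector dropped, address 0x0110 matches both entries; with it,
each lookup has a single answer.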
> > > For an architecture with multiple code or data segments, then you’d
> need
> > > the segment selector more widely, but I should think this case
> wouldn’t be
> > > all that difficult to make work.  Even factoring in the llvm-dwarfdump
> > > part, it has to understand the selector only for the .debug_aranges
> > > section; everything else can remain as it is, pretending there’s a flat
> > > address space.
> > >
> > >
> > >
> > > Now, if your target is downstream, that would make upstreaming the LLVM
> > > support a bit dicier, because we’d not want to have that feature in the
> > > upstream repo if there are no targets using it.  You’d be left
> maintaining
> > > that patch on your own.  But as I described above, I don’t think it
> would
> > > be a huge deal.
> > >
> I think I agree, and for the moment maintaining a patch downstream is
> perfectly fine. If and when LLVM gets an upstream target that needs to
> use multiple segments, we can probably contribute that support.
> > >
> > >
> > > HTH,
>
> This discussion has helped me greatly. Thanks all for your advice.
> > >
> > > --paulr
> > >
> > >
> > > Luke
> > >
> > > --
> > > Codeplay Software Ltd.
> > > Company registered in England and Wales, number: 04567874
> > > Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF
> > >
> > >
>
>