<div dir="ltr"><div>(hmm, unfortunately this broke gmail (& maybe other?) threading in some way :/)</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 18, 2020 at 8:49 AM Luke Drummond <<a href="mailto:luke.drummond@codeplay.com">luke.drummond@codeplay.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi David, Paul<br>

<br>

On Thu Mar 12, 2020 at 11:22 PM PST, David Blaikie wrote:<br>

> On Thu, Mar 12, 2020 at 1:51 PM Robinson, Paul <<a href="mailto:paul.robinson@sony.com" target="_blank">paul.robinson@sony.com</a>><br>

> wrote:<br>

><br>

> > I’ve encountered this kind of architecture before, a long time ago<br>

> > (academically). In a flat-address-space machine such as x64, there is<br>

> > still an instruction/data distinction, but usually only down at the level<br>

> > of I-cache versus D-cache (instruction fetch versus data fetch).  A Harvard<br>

> > architecture machine exposes that to the programmer, which effectively<br>

> > doubles the available address space.  Code and data live in different<br>

> > address spaces, although the address space identifier per se is not<br>

> > explicit.  A Move instruction would implicitly use the data address space,<br>

> > while an indirect Branch would implicitly target the code address space.<br>

> > An OS running on a Harvard architecture would require the loader to be<br>

> > privileged, so it can map data from an object file into the code address<br>

> > space and implement any necessary fixups.  Self-modifying code is at least<br>

> > wicked hard if not impossible to achieve.<br>

<br>

Paul: On our target it's impossible to have self-modifying code. The loader is<br>

the ROM flasher and the target jumps to a fixed address on core reset.<br>

There are no special addressing modes that would allow a code section to<br>

be overwritten.<br>

Everything you say above is consistent with my experience on this<br>

target.<br>
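To make that model concrete, here is a toy Python sketch. HarvardMachine is a hypothetical class, not a simulator of this or any real target. A Move reads and writes only data memory. An indirect Branch loads its target from data memory but interprets it as a code address. Nothing except the flasher, modelled as direct writes to code, can change instruction memory:<br>

```python
class HarvardMachine:
    """Toy model: code and data occupy separate address spaces of equal size."""

    def __init__(self, size: int):
        self.code = bytearray(size)  # instruction memory, filled by the flasher
        self.data = bytearray(size)  # data memory
        self.pc = 0

    def move(self, dst: int, src: int) -> None:
        # A Move implicitly uses the data address space for both operands.
        self.data[dst] = self.data[src]

    def branch_indirect(self, slot: int) -> None:
        # The target is loaded from data memory but interpreted as a code address.
        self.pc = self.data[slot]

    def fetch(self) -> int:
        # Instruction fetch always reads the code address space.
        return self.code[self.pc]


if __name__ == "__main__":
    m = HarvardMachine(256)
    m.code[0x10] = 0xAA   # "flashed" instruction byte at code address 0x10
    m.data[0x10] = 0x55   # unrelated data byte at data address 0x10
    m.data[0] = 0x10      # a code pointer stored in data memory
    m.branch_indirect(0)
    print(hex(m.fetch()), hex(m.data[0x10]))
```

The same numeric address, 0x10, names two different bytes. That is why a bare address in debug info can be ambiguous on such a target.<br>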

> ><br>


> > In DWARF this would indeed be described by a segment selector.  It’s up to<br>

> > the target ABI to specify what the segment selector numbers actually are.<br>

> > For a Harvard architecture machine this is pretty trivial, you say<br>

> > something like 0 for code and 1 for data. Boom, done.<br>

> ><br>

Paul: I think I need to have a discussion with the people implementing<br>

the debugger and work out the ABI a bit more formally.<br>

> ><br>


> > LLVM basically doesn’t have targets like this, or at least it has never<br>

> > come up before that I’m aware of.  So, when we emit DWARF, we assume a flat<br>

> > address space (unconditionally setting the segment selector size to zero),<br>

> > and llvm-dwarfdump will choke (hopefully cleanly, but still) on an object<br>

> > file that uses DWARF segment selectors.<br>

> ><br>

><br>

> FWIW Luke mentioned in the original email that the in-tree AVR backend<br>

> seems to have this problem with ambiguous .debug_aranges entries.<br>

<br>

David: I'm not sure what's expected for AVR, but it's the only target I<br>

can see that looks like a Harvard machine. The default seems to be to<br>

not emit .debug_aranges at all, so I think if this is a bug, it only shows up in<br>

very specific circumstances.<br>
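For reference, here is a minimal Python sketch of reading one .debug_aranges set while honoring a non-zero segment selector size. That is the header field LLVM currently always emits as zero. The sketch makes three assumptions: 32-bit DWARF, a little-endian target, and the padding rule that the first descriptor starts at a multiple of the tuple size, measured from the start of the set. parse_aranges_set and build_aranges_set are hypothetical helpers, not how llvm-dwarfdump is implemented:<br>

```python
def read_uint(data: bytes, offset: int, size: int) -> int:
    # Little-endian unsigned read; a size of 0 yields 0.
    return int.from_bytes(data[offset:offset + size], "little")


def parse_aranges_set(data: bytes, offset: int = 0):
    """Parse one 32-bit-DWARF .debug_aranges set starting at offset.

    Returns (debug_info_offset, [(segment, address, length), ...]).
    When segment_selector_size is 0 the segment is reported as 0.
    """
    start = offset
    unit_length = read_uint(data, offset, 4)
    offset += 4
    end = offset + unit_length  # unit_length excludes its own field
    version = read_uint(data, offset, 2)
    offset += 2
    if version != 2:
        raise ValueError(f"unexpected .debug_aranges version {version}")
    info_offset = read_uint(data, offset, 4)
    offset += 4
    addr_size, seg_size = data[offset], data[offset + 1]
    offset += 2
    # Pad so the first tuple starts at a multiple of the tuple size,
    # measured from the start of the set.
    tuple_size = seg_size + 2 * addr_size
    header_size = offset - start
    if header_size % tuple_size:
        offset += tuple_size - header_size % tuple_size
    ranges = []
    while offset + tuple_size <= end:
        segment = read_uint(data, offset, seg_size)
        address = read_uint(data, offset + seg_size, addr_size)
        length = read_uint(data, offset + seg_size + addr_size, addr_size)
        offset += tuple_size
        if segment == 0 and address == 0 and length == 0:
            break  # all-zero tuple terminates the set
        ranges.append((segment, address, length))
    return info_offset, ranges


def build_aranges_set(info_offset, addr_size, seg_size, ranges):
    """Encode a set in the same layout, for exercising the parser."""
    body = (2).to_bytes(2, "little") + info_offset.to_bytes(4, "little")
    body += bytes([addr_size, seg_size])
    tuple_size = seg_size + 2 * addr_size
    header_size = 4 + len(body)
    if header_size % tuple_size:
        body += bytes(tuple_size - header_size % tuple_size)
    for segment, address, length in list(ranges) + [(0, 0, 0)]:
        body += segment.to_bytes(seg_size, "little")
        body += address.to_bytes(addr_size, "little")
        body += length.to_bytes(addr_size, "little")
    return len(body).to_bytes(4, "little") + body


if __name__ == "__main__":
    blob = build_aranges_set(0x2A, 2, 1, [(0, 0x100, 0x40), (1, 0x100, 0x10)])
    print(parse_aranges_set(blob))
```

With a one-byte selector, the two descriptors at address 0x100 come back distinguished by segment. With a zero selector size, a code range and a data range at the same address would be two entries with no way to tell them apart, which is the kind of ambiguity David mentions for AVR.<br>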

><br>

><br>

> >  The point of .debug_aranges is to accelerate the search for the<br>

> > appropriate CU.  Yes you can spend time trolling through .debug_info and<br>

> > .debug_abbrev, decoding the CU DIEs looking for low_pc/high_pc pairs (or<br>

> > perhaps pointers to .debug_ranges) and effectively rebuild a .debug_aranges<br>

> > section yourself, if the compiler/linker isn’t kind enough to pre-build the<br>

> > table for you.  I don’t understand why .debug_aranges should be<br>

> > discouraged; I shouldn’t think they would be huge, and consumers can avoid<br>

> > loading lots of data just to figure out what’s not worth looking at.<br>

> > Forcing all consumers to do things the slow way seems unnecessarily<br>

> > inefficient.<br>

> ><br>

> If the producer has put ranges on the CU it's not a lot of work - it's<br>

<br>

In my mind keeping .debug_aranges is definitely a win for my use cases,<br>

and as long as the linker is doing the right thing, it is a useful<br>

optimization.<br></blockquote><div><br></div><div>I'd be curious to know how much of an advantage it is, compared to using CU ranges.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

> parsing one DIE & looking for a couple of attributes. With Split DWARF the<br>

> cost of .debug_aranges becomes a bit more prominent - Sema.o from clang,<br>

> with split DWARF (v4 or v5 about the same) is about 3.5% larger with debug<br>

> aranges (not sure about the overall data). It's enough at least at Google<br>

> for us to not use them & use CU ranges for the same purpose.<br>
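Using CU ranges for this amounts to the consumer building the lookup table itself. Here is a minimal sketch. It assumes a flat address space and non-overlapping ranges. It also assumes each CU's [start, end) ranges have already been read from its DW_AT_low_pc/DW_AT_high_pc or DW_AT_ranges; the DIE decoding is omitted. build_cu_index and find_cu are hypothetical names:<br>

```python
import bisect
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]  # [start, end) in a flat address space


def build_cu_index(cus: Dict[int, List[Range]]):
    """Sort every (start, end, cu_offset) triple by start address.

    cus maps a CU's .debug_info offset to the ranges read from its
    DW_AT_low_pc/DW_AT_high_pc or DW_AT_ranges (decoding omitted here).
    Ranges are assumed not to overlap.
    """
    table = sorted(
        (start, end, cu_offset)
        for cu_offset, ranges in cus.items()
        for start, end in ranges
    )
    starts = [start for start, _, _ in table]
    return starts, table


def find_cu(index, address: int) -> Optional[int]:
    """Return the .debug_info offset of the CU covering address, or None."""
    starts, table = index
    # Index of the last range that starts at or before address.
    i = bisect.bisect_right(starts, address) - 1
    if i >= 0:
        start, end, cu_offset = table[i]
        if start <= address < end:
            return cu_offset
    return None


if __name__ == "__main__":
    index = build_cu_index({0x0: [(0x1000, 0x1200)],
                            0x80: [(0x1200, 0x1300), (0x2000, 0x2010)]})
    print(hex(find_cu(index, 0x1250)))
```

Once the table is built, each query is a binary search. What a pre-built .debug_aranges saves is the DIE decoding needed to populate the table in the first place.<br>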

<br>

<br>

><br>

> I thought I might be able to find some email history about why we turned it<br>

> off by default, but it seems we never turned it /on/ by default to begin with<br>

> & it wasn't implemented until relatively late in the game (well, what I<br>

> think of as relatively late - after I started on the project at least).<br>

><br>

I'm generally of the opinion that if you're hauling around *any*<br>

debuginfo, you should include as much useful context as possible - and<br>

because of that any extra info the debugger can use to shorten the parse<br>

time is helpful. This is of course assuming I never have to debug<br>

binaries that are on production machines (a clear distinction between<br>

debug and release builds); reducing size makes sense<br>

there.<br></blockquote><div><br>Even just for link times - reading/writing more bytes does slow things down - so some data about how much the extra data improves debugger load times/costs link time/etc would be interesting to me at least.<br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

><br>

> >  Thinking about Harvard architecture specifically, you **need** the<br>

> > segment selector only when an address could be ambiguous about whether it’s<br>

> > a code or data address.  This basically comes up **only** in<br>

> > .debug_aranges, he said thinking about it for about 30 seconds.  Within<br>

> > .debug_info you don’t need it because when you pick up the address of an<br>

> > entity, you know whether it’s for a code or data entity.  Location lists<br>

> > and range lists always point to code.  For .debug_aranges you would need<br>

> > the segment selector, but I think that’s the only place.<br>

> ><br>

Agreed.<br>
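Here is a minimal sketch of why the selector matters in .debug_aranges. It assumes the hypothetical convention of selector 0 for code and 1 for data. Descriptors are split into one sorted table per segment, so a code range and a data range that both start at 0x100 resolve to different CUs. SegmentedAranges is an illustrative name, not an LLVM or llvm-dwarfdump API:<br>

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Hypothetical segment selector values for a Harvard target's ABI.
SEG_CODE = 0
SEG_DATA = 1


class SegmentedAranges:
    """Address-to-CU lookup that keeps one sorted table per segment."""

    def __init__(self, descriptors: List[Tuple[int, int, int, int]]):
        # Each entry is (segment, address, length, cu_offset): an arange
        # descriptor plus the debug_info_offset of the set it came from.
        self._tables: Dict[int, List[Tuple[int, int, int]]] = {}
        for segment, address, length, cu_offset in descriptors:
            self._tables.setdefault(segment, []).append(
                (address, address + length, cu_offset))
        for table in self._tables.values():
            table.sort()
        self._starts = {seg: [t[0] for t in table]
                        for seg, table in self._tables.items()}

    def find_cu(self, segment: int, address: int) -> Optional[int]:
        """Return the CU offset covering (segment, address), or None."""
        table = self._tables.get(segment)
        if not table:
            return None
        i = bisect.bisect_right(self._starts[segment], address) - 1
        if i >= 0:
            start, end, cu_offset = table[i]
            if start <= address < end:
                return cu_offset
        return None


if __name__ == "__main__":
    aranges = SegmentedAranges([(SEG_CODE, 0x100, 0x40, 0x0),
                                (SEG_DATA, 0x100, 0x10, 0x80)])
    print(aranges.find_cu(SEG_CODE, 0x108), aranges.find_cu(SEG_DATA, 0x108))
```

The only change from a flat lookup is keying the tables on the segment. That fits the point above that the rest of a consumer can keep assuming a flat address space.<br>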

> ><br>


> > For an architecture with multiple code or data segments, then you’d need<br>

> > the segment selector more widely, but I should think this case wouldn’t be<br>

> > all that difficult to make work.  Even factoring in the llvm-dwarfdump<br>

> > part, it has to understand the selector only for the .debug_aranges<br>

> > section; everything else can remain as it is, pretending there’s a flat<br>

> > address space.<br>

> ><br>


> > Now, if your target is downstream, that would make upstreaming the LLVM<br>

> > support a bit dicier, because we’d not want to have that feature in the<br>

> > upstream repo if there are no targets using it.  You’d be left maintaining<br>

> > that patch on your own.  But as I described above, I don’t think it would<br>

> > be a huge deal.<br>

> ><br>

I think I agree, and for the moment maintaining a patch downstream is<br>

perfectly fine. If and when LLVM gets an upstream target that needs to<br>

use multiple segments, we can probably contribute that support.<br>

> ><br>


> > HTH,<br>

<br>

This discussion has helped me greatly. Thanks all for your advice.<br>

> ><br>

> > --paulr<br>

> ><br>


> > Luke<br>

> ><br>

> > --<br>

> > Codeplay Software Ltd.<br>

> > Company registered in England and Wales, number: 04567874<br>

> > Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF<br>

> ><br>


<br>

</blockquote></div></div>