<div dir="ltr">Paul has more or less summarised everything that I can think of. In my prototype, I ran a post-compile script to convert the compiled object into one with split-up debug info, and to add the relocations needed to patch fields such as the unit length. I naively split the debug info section at specific points, either side of every concrete (i.e. with its own text section) function and variable tag. This meant there was a sequence of "common" bits followed by one or more functions/variables, followed by more common stuff, followed by more functions/variables, and so on. This didn't produce optimal DWARF (e.g. it could leave empty namespace tags if all their contents were stripped), but it was semantically correct (aside from the fact that I forgot to patch some of the internal references). Doing it this way might be easier to implement in the compiler, but I haven't attempted it yet. The slides from the talk give a high-level overview of this approach, so if someone else wants to pick up this work, go for it!<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 30 Sept 2021 at 18:25, <<a href="mailto:paul.robinson@sony.com">paul.robinson@sony.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">


<div style="overflow-wrap: break-word;" lang="EN-US">

<div class="gmail-m_8251035233951696114WordSection1">

<p class="MsoNormal">Hmmm…<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">The primary requirement, to make this work without gobs of relocations, is to minimize references that could “move” if a function is deleted, and of course references directly to a function itself. 

<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">A reference could “move” in a case something like this:<u></u><u></u></p>

<p class="MsoNormal">               function A<u></u><u></u></p>

<p class="MsoNormal">               type T<u></u><u></u></p>

<p class="MsoNormal">               function B(T)<u></u><u></u></p>

<p class="MsoNormal">References are offsets from the base of the unit, so if function A is removed, the offset of type T will change, and so the reference from function B would have to be updated.  We can sidestep this if we guarantee that type T appears in

 the unit before any function that might be removed.<u></u><u></u></p>
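<p class="MsoNormal">To make the "moving" reference concrete, here is a minimal Python sketch. The DIE sizes and the HEAD_SIZE value are invented for illustration, and unit_offsets and without are hypothetical helpers, not anything in LLVM. It computes the unit-relative offset of type T before and after function A is dropped, for both orderings.</p>

<pre>
```python
# Hypothetical sizes in bytes; real DIE sizes depend on abbreviations and attributes.
HEAD_SIZE = 32  # assumed: unit header plus the compile-unit DIE itself

def unit_offsets(dies):
    """Map each DIE name to its offset from the base of the unit."""
    offsets = {}
    position = HEAD_SIZE
    for name, size in dies:
        offsets[name] = position
        position += size
    return offsets

def without(dies, dropped):
    """Return the DIE list with one named DIE removed."""
    return [(name, size) for name, size in dies if name != dropped]

function_first = [("function A", 40), ("type T", 12), ("function B", 30)]
type_first = [("type T", 12), ("function A", 40), ("function B", 30)]

for label, dies in (("function first", function_first), ("type first", type_first)):
    before = unit_offsets(dies)["type T"]
    after = unit_offsets(without(dies, "function A"))["type T"]
    print(label, "- type T offset before:", before, "after:", after)
```
</pre>

<p class="MsoNormal">With the function first, the offset of type T changes when A is removed, so B's reference would need patching; with the type first, it stays put.</p>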

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Offhand there are two places where a function reference happens: references from concrete inlined subprograms to the abstract function, and the call-site stuff.  Hand-wave away the call-site stuff, and we’re left with the inlining stuff. 

 In this case I’d think a reasonable plan would be to treat the abstract function instance like a type, and put it before any concrete functions.<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Overall, then, we’d end up needing to split the DWARF into three parts.<u></u><u></u></p>

<p class="MsoNormal">First, you have the unit header, top-level DIE, and all your type information that isn’t selectively removed by the linker (or already emitted separately as type units).  This part would also have the abstract instances of inlined functions. 

 This part is always emitted.  You need to arrange to have it end up being first in the post-linker output.<u></u><u></u></p>

<p class="MsoNormal">Second, you have your per-function constructs.  These ought to be self-contained, except for references to types and abstract functions, which are all in the first part, so those references can remain constant offsets from the top of the

 compile unit.  Because these need to be self-contained, any namespace wrappers would need to be repeated per function.  And to get the dead-stripping done correctly, each DWARF fragment would be in the same COMDAT as the function’s .text section.<u></u><u></u></p>

<p class="MsoNormal">Third, you need the final closing NULL (terminating the list of children of the compile-unit DIE) which also has a label so the final size of the compile unit can be computed correctly (this size lives in the compile-unit header).<u></u><u></u></p>
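<p class="MsoNormal">The three-part layout can be sketched roughly as follows. This is only an illustration under stated assumptions: the Fragment class, layout_unit, and all the sizes are hypothetical, and a 32-bit DWARF unit (4-byte length field) is assumed. It shows that the surviving fragments are simply concatenated after the always-present first part, and that the unit length follows from the position of the closing NULL.</p>

<pre>
```python
from dataclasses import dataclass

@dataclass
class Fragment:
    name: str
    size: int   # bytes of DWARF in this per-function fragment
    live: bool  # False if the linker discarded the function's COMDAT group

def layout_unit(head_size, fragments, length_field_size=4):
    """Return (unit_length, {fragment name: offset}) for the linked unit."""
    offsets = {}
    position = head_size  # part one is always present and always first
    for fragment in fragments:
        if fragment.live:  # part two: only surviving fragments take space
            offsets[fragment.name] = position
            position += fragment.size
    position += 1  # part three: one null byte ending the CU DIE's children
    # unit_length does not count the length field itself (4 bytes in 32-bit DWARF)
    return position - length_field_size, offsets

fragments = [Fragment("f", 40, True), Fragment("g", 60, False), Fragment("h", 20, True)]
unit_length, offsets = layout_unit(100, fragments)
print(unit_length, offsets)
```
</pre>

<p class="MsoNormal">Anything in the first part (types, abstract instances) keeps the same offset regardless of which fragments are dropped, which is why references into it need no patching.</p>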

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Currently in LLVM, DWARF gets emitted pretty much on-demand, meaning types and functions (concrete and abstract) can be intermixed willy-nilly.  It’s likely to require a great deal of effort to rework that into the types-versus-functions organization.</p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">This of course is talking only about the .debug_info section, and there are lots of other sections with per-function contributions.  Those are trickier, but also tend to be much smaller, so it might be reasonable to just hand-wave those

 away as not worth the extra effort.<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">James might have other observations or recollections from doing the actual experiment.<u></u><u></u></p>

<p class="MsoNormal">--paulr<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<div style="border-color:currentcolor currentcolor currentcolor blue;border-style:none none none solid;border-width:medium medium medium 1.5pt;padding:0in 0in 0in 4pt">

<div>

<div style="border-color:rgb(225,225,225) currentcolor currentcolor;border-style:solid none none;border-width:1pt medium medium;padding:3pt 0in 0in">

<p class="MsoNormal"><b>From:</b> Youssefi, Anna <<a href="mailto:a-youssefi@ti.com" target="_blank">a-youssefi@ti.com</a>> <br>

<b>Sent:</b> Thursday, September 30, 2021 10:50 AM<br>

<b>To:</b> Robinson, Paul <<a href="mailto:paul.robinson@sony.com" target="_blank">paul.robinson@sony.com</a>>; <a href="mailto:jh7370.2008@my.bristol.ac.uk" target="_blank">jh7370.2008@my.bristol.ac.uk</a>; <a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a><br>

<b>Cc:</b> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

<b>Subject:</b> RE: [llvm-dev] unified debug information despite function/data sections flags<u></u><u></u></p>

</div>

</div>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">We are emitting our own DWARF extensions because our object file editor and a utility script use these for generating a call graph with stack sizes.  We are not deriving stack sizes from DWARF but rather emitting a vendor-specific attribute in the subprogram DIE with the MachineFrameInfo::getStackSize() value, which appears to be the same value used for LLVM’s own stack size section.</p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">We are also using our own linker, rather than lld.  Our linker already removes unreferenced subsections, and in the case of our proprietary compiler, the DWARF information is already separated by function, so it also gets removed if it pertains to an unreferenced function subsection.  So we only have this problem with our LLVM-based front end, where the debug information is combined.</p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">I can see Todd Snider just re-asked my question.  I believe this was already answered as being problematic due to hard-coded addresses and size overhead?<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Thanks,<u></u><u></u></p>

<p class="MsoNormal">Anna<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<div>

<div style="border-color:rgb(225,225,225) currentcolor currentcolor;border-style:solid none none;border-width:1pt medium medium;padding:3pt 0in 0in">

<p class="MsoNormal"><b>From:</b> <a href="mailto:paul.robinson@sony.com" target="_blank">paul.robinson@sony.com</a> <<a href="mailto:paul.robinson@sony.com" target="_blank">paul.robinson@sony.com</a>>

<br>

<b>Sent:</b> Thursday, September 30, 2021 8:31 AM<br>

<b>To:</b> <a href="mailto:jh7370.2008@my.bristol.ac.uk" target="_blank">jh7370.2008@my.bristol.ac.uk</a>;

<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a><br>

<b>Cc:</b> Youssefi, Anna <<a href="mailto:a-youssefi@ti.com" target="_blank">a-youssefi@ti.com</a>>;

<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

<b>Subject:</b> [EXTERNAL] RE: [llvm-dev] unified debug information despite function/data sections flags<u></u><u></u></p>

</div>

</div>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">I agree with James about using `-fstack-size-section` to get static stack size information.  Deriving that info from DWARF seems like a lot of work; I imagine you’d have to parse all of the locations within a function, looking for frame

 offsets.  Even then the result would be incomplete because it would describe only the stack slots used by declared variables.  Temporaries and even spill slots probably would not be accounted for.<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Regarding partitioning DWARF, just for completeness I’ll say that we did also (at least briefly) look at using DWARF partial-units, but the size overhead seemed like it would not be a net win.<u></u><u></u></p>

<p class="MsoNormal">--paulr<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<div style="border-color:currentcolor currentcolor currentcolor blue;border-style:none none none solid;border-width:medium medium medium 1.5pt;padding:0in 0in 0in 4pt">

<div>

<div style="border-color:rgb(225,225,225) currentcolor currentcolor;border-style:solid none none;border-width:1pt medium medium;padding:3pt 0in 0in">

<p class="MsoNormal"><b>From:</b> llvm-dev <<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>>

<b>On Behalf Of </b>James Henderson via llvm-dev<br>

<b>Sent:</b> Thursday, September 30, 2021 3:44 AM<br>

<b>To:</b> David Blaikie <<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>><br>

<b>Cc:</b> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>; Youssefi, Anna <<a href="mailto:a-youssefi@ti.com" target="_blank">a-youssefi@ti.com</a>><br>

<b>Subject:</b> Re: [llvm-dev] unified debug information despite function/data sections flags<u></u><u></u></p>

</div>

</div>

<p class="MsoNormal"><u></u> <u></u></p>

<div>

<div>

<p class="MsoNormal">Yep, I took a look at this last year/early this year, but never really came up with a fully functioning prototype that was actually efficient enough, and have since switched teams, so haven't had the time to work on it further.<u></u><u></u></p>

</div>

<div>

<p class="MsoNormal"><u></u> <u></u></p>

</div>

<div>

<p class="MsoNormal">You can see my lightning talk from last year on the topic here:

<a href="https://www.youtube.com/watch?v=0y6TlfFhCsU" target="_blank">

https://www.youtube.com/watch?v=0y6TlfFhCsU</a>, and a mailing thread where I discussed it further here:

<a href="https://lists.llvm.org/pipermail/llvm-dev/2020-November/146469.html" target="_blank">

https://lists.llvm.org/pipermail/llvm-dev/2020-November/146469.html</a>. The main issue I ran into was the number of hard-coded relative references within DWARF. Every single one of these needs to be updated at link time, if any of the data is dropped, or the

 DWARF will end up invalid. To do this, I had to add relocations to the DWARF which patched the relevant fields at link time, based on the final computed offset, but this had a serious performance cost in the linker (not to mention any potential cost in the

 assembler). This approach is certainly possible for the most part, at least for .debug_line and .debug_info (it's not necessarily clear whether it can be done with some of the other DWARF sections, although the benefits in most of them aren't particularly

 clear), but the difficulty is getting it to be fast.<u></u><u></u></p>
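<p class="MsoNormal">For a sense of what that link-time patching involves, here is a rough Python sketch. It is not the prototype's code; build_offset_map, patch_reference, and the chunk tuples are hypothetical names for illustration. Each reference has to be translated from its old offset to the offset of the same byte after dead chunks have been dropped.</p>

<pre>
```python
def build_offset_map(chunks):
    """chunks: (old_offset, size, kept) tuples in section order.
    Returns {old_offset: new_offset} for every kept chunk."""
    mapping = {}
    new_offset = 0
    for old_offset, size, kept in chunks:
        if kept:
            mapping[old_offset] = new_offset
            new_offset += size
    return mapping

def patch_reference(old_target, chunks, mapping):
    """Translate one reference that pointed at old_target (a byte inside a kept chunk)."""
    for old_offset, size, kept in chunks:
        if old_target in range(old_offset, old_offset + size):
            if not kept:
                raise ValueError("reference into a dropped chunk")
            return mapping[old_offset] + (old_target - old_offset)
    raise ValueError("target outside the section")

chunks = [(0, 50, True), (50, 40, False), (90, 30, True)]
mapping = build_offset_map(chunks)
print(patch_reference(95, chunks, mapping))
```
</pre>

<p class="MsoNormal">Doing a lookup like this for every relative reference in the debug sections is where the link-time cost comes from.</p>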

</div>

<div>

<p class="MsoNormal"><u></u> <u></u></p>

</div>

<div>

<p class="MsoNormal">I'd be happy to discuss this further, and provide feedback on other ideas, if you have any, but I currently have no plans to continue this work myself.</p>

</div>

<div>

<p class="MsoNormal"><u></u> <u></u></p>

</div>

<div>

<p class="MsoNormal">By the way, if you are using the DWARF for stack usage analysis, have you considered the .stack_sizes section (emitted with -fstack-size-section)? It contains the stack size of every function in the output, and can be dumped using llvm-readobj. It is split up so that the linker can strip bits that reference dead data, so you should only end up with the actually useful information in the output.</p>
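<p class="MsoNormal">If you would rather read the section yourself, a minimal Python sketch follows. It assumes the layout described in the LLVM documentation, i.e. each entry is a pointer-sized function address followed by the stack size as an unsigned LEB128; parse_stack_sizes and read_uleb128 are hypothetical helper names, and an 8-byte little-endian address is assumed by default.</p>

<pre>
```python
def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    value = 0
    scale = 1
    while True:
        byte = data[pos]
        pos += 1
        value += (byte % 128) * scale  # low 7 bits carry the payload
        scale *= 128
        if byte // 128 == 0:           # high bit clear: this was the last byte
            return value, pos

def parse_stack_sizes(data, address_size=8, byteorder="little"):
    """Return a list of (function address, stack size) pairs."""
    entries = []
    pos = 0
    while pos != len(data):
        address = int.from_bytes(data[pos:pos + address_size], byteorder)
        pos += address_size
        size, pos = read_uleb128(data, pos)
        entries.append((address, size))
    return entries

# Two made-up entries: 24 bytes of stack at 0x401000, 1000 bytes at 0x401040.
raw = (0x401000).to_bytes(8, "little") + bytes([0x18])
raw += (0x401040).to_bytes(8, "little") + bytes([0xE8, 0x07])
print(parse_stack_sizes(raw))
```
</pre>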

</div>

<div>

<p class="MsoNormal"><u></u> <u></u></p>

</div>

<div>

<p class="MsoNormal">James<u></u><u></u></p>

</div>

<div>

<p class="MsoNormal"><u></u> <u></u></p>

</div>

</div>

<p class="MsoNormal"><u></u> <u></u></p>

<div>

<div>

<p class="MsoNormal">On Thu, 30 Sept 2021 at 07:51, David Blaikie <<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>> wrote:<u></u><u></u></p>

</div>

<blockquote style="border-color:currentcolor currentcolor currentcolor rgb(204,204,204);border-style:none none none solid;border-width:medium medium medium 1pt;padding:0in 0in 0in 6pt;margin:5pt 0in 5pt 4.8pt">

<div>

<div>

<p class="MsoNormal" style="margin-bottom:12pt">You can differentiate dead function descriptions from others on most platforms by checking if the low_pc == 0. If 0 is a valid instruction address on your architecture, you can use an lld feature to set a more

 authoritative/unambiguous tombstone value for dead code addresses, passing something like:<u></u><u></u></p>

<pre><i><span style="color:black"> -z 'dead-reloc-in-nonalloc=.debug_*=0xffffffffffffffff'</span></i></pre>

<pre><i><span style="color:black"> -z 'dead-reloc-in-nonalloc=.debug_ranges=0xfffffffffffffffe'</span></i></pre>

<pre><i><span style="color:black"> -z 'dead-reloc-in-nonalloc=.debug_loc=0xfffffffffffffffe'</span></i></pre>

</div>

<p class="MsoNormal" style="margin-bottom:12pt">to the linker.<br>

<br>

As for reducing debug info size by omitting debug info descriptions of dead code: Apple/MachO's dsymutil does this, and I believe Alexey Lapshin is working on getting similar behavior into lld (or possibly into a post-link tool).<br>

<br>

There's also the possibility of using comdats to make the linker's job easier - I think there might be ways to structure the DWARF into chunks that could be deduplicated and dropped naturally by a linker's existing comdat support, but I haven't fully prototyped

 it. I think there was a thread a while back with JHenderson and others discussing this possibility further.<br>

<br>

- Dave<u></u><u></u></p>
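<p class="MsoNormal">As a sketch of how a stack usage script might use those tombstones: the Python below drops subprogram records whose low_pc is a tombstone. The record layout (dicts with "name" and "low_pc") and the live_subprograms helper are hypothetical, 64-bit addresses are assumed, and 0 should be removed from the set on targets where it is a valid code address.</p>

<pre>
```python
# Assumed tombstone values for 64-bit targets; adjust to match the linker flags used.
TOMBSTONES = frozenset({0, 0xFFFFFFFFFFFFFFFE, 0xFFFFFFFFFFFFFFFF})

def live_subprograms(subprograms, tombstones=TOMBSTONES):
    """Keep only the subprogram records whose low_pc is not a tombstone."""
    return [sp for sp in subprograms if sp["low_pc"] not in tombstones]

records = [
    {"name": "main", "low_pc": 0x401000},
    {"name": "unused_helper", "low_pc": 0xFFFFFFFFFFFFFFFF},
    {"name": "old_init", "low_pc": 0},
]
print([sp["name"] for sp in live_subprograms(records)])
```
</pre>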

<div>

<div>

<p class="MsoNormal">On Wed, Sep 29, 2021 at 12:50 PM Youssefi, Anna via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<u></u><u></u></p>

</div>

<blockquote style="border-color:currentcolor currentcolor currentcolor rgb(204,204,204);border-style:none none none solid;border-width:medium medium medium 1pt;padding:0in 0in 0in 6pt;margin:5pt 0in 5pt 4.8pt">

<div>

<div>

<p class="MsoNormal">Hi,<u></u><u></u></p>

<p class="MsoNormal"> <u></u><u></u></p>

<p class="MsoNormal">I was wondering if there are any plans to separate debug information into correspondingly distinct sections when the compiler flags -ffunction-sections and/or -fdata-sections are used.  If an unreferenced function is removed from the link, it makes no sense for its associated debug information to still be included.  As we rely on the debug information for stack usage analysis, we wind up displaying stack usage statistics for unreferenced functions that were eliminated from the link whenever debug information for any referenced function is in the same debug section.  It seems that others have run into this problem previously, so I wanted to check whether there are any plans to change the behavior.</p>

<p class="MsoNormal"> <u></u><u></u></p>

<p class="MsoNormal">Thanks,<u></u><u></u></p>

<p class="MsoNormal">Anna Youssefi<u></u><u></u></p>

<p class="MsoNormal">Texas Instruments, Codegen group<u></u><u></u></p>

<p class="MsoNormal"> <u></u><u></u></p>

<p class="MsoNormal"> <u></u><u></u></p>

</div>

</div>

<p class="MsoNormal">_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a></p>

</blockquote>

</div>

</div>

</blockquote>

</div>

</div>

</div>

</div>

</div>


</blockquote></div>