[llvm-dev] unified debug information despite function/data sections flags

Thu Sep 30 11:04:52 PDT 2021

Yeah, grouping things would help, for sure. You might still want to treat
the abstract subprogram as removable (though it'd come at a cost - maybe a
tuning parameter depending on how much object size/linker relocation
handling you want to pay for smaller DWARF). Also, DWARFv5 indexed strings
would probably make it harder to strip dead strings without a DWARF-aware
tool, since there wouldn't be direct relocations to the strings anymore.
(Well, there will be in the .debug_str_offsets section, so you'd have to
figure out how to prune /that/ in order to then prune the strings. That
might make things larger, because each set of strings referenced by a chunk
of a CU would need its own string indexes and couldn't share them with
others, maybe? Or maybe they could be shared, with even more relocations...)
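
A rough sketch of why the indexed form makes pruning harder, as a toy model in Python rather than a DWARF parser. The function name and the data shapes are hypothetical. DIEs carry indexes into a per-unit offsets table instead of direct string references, so dropping the strings of a dead function means rebuilding the table and rewriting every surviving index:

```python
def prune_strings(strings, die_indexes, live_dies):
    """Drop strings that only dead DIEs reference, then renumber.

    strings     -- list of strings; position i plays the role of a strx index
    die_indexes -- dict mapping a DIE name to the indexes it uses
    live_dies   -- set of DIE names that survive the link
    """
    # Collect every index still referenced by a surviving DIE.
    used = sorted({i for die in live_dies for i in die_indexes[die]})
    # Old index -> new index in the compacted table.
    remap = {old: new for new, old in enumerate(used)}
    new_strings = [strings[i] for i in used]
    # Every surviving DIE has to be rewritten with the new indexes.
    new_indexes = {die: [remap[i] for i in die_indexes[die]]
                   for die in sorted(live_dies)}
    return new_strings, new_indexes
```

The point of the sketch is the last step: nothing in the surviving DIEs changed, yet all of their string indexes did, which is the extra fix-up work a non-DWARF-aware linker cannot do through plain relocations.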

On Thu, Sep 30, 2021 at 10:25 AM <paul.robinson at sony.com> wrote:

> Hmmm…
>
>
>
> The primary requirement, to make this work without gobs of relocations, is
> to minimize references that could “move” if a function is deleted, and of
> course references directly to a function itself.
>
>
>
> A reference could “move” in a case something like this:
>
>                function A
>
>                type T
>
>                function B(T)
>
> References are offsets from the base of the unit, so if function A is
> removed, the offset of type T will change, and so the reference from
> function B would have to be updated.  We can sidestep this if we guarantee
> that type T appears in the unit before any function that might be removed.
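
The "moving" reference can be shown with a small Python sketch. The entry sizes are made up for illustration and the unit header is ignored; `layout` is a hypothetical helper, not anything from LLVM:

```python
def layout(entries):
    """Assign each (name, size) entry its byte offset from the unit base."""
    offsets, cursor = {}, 0
    for name, size in entries:
        offsets[name] = cursor
        cursor += size
    return offsets


def without(entries, name):
    """Return the entries that remain after the linker drops one of them."""
    return [entry for entry in entries if entry[0] != name]


# Order from the example: function A, type T, function B(T).
intermixed = [("function A", 40), ("type T", 12), ("function B", 30)]
# Alternative order: the type comes before any removable function.
types_first = [("type T", 12), ("function A", 40), ("function B", 30)]

moved = (layout(intermixed)["type T"],
         layout(without(intermixed, "function A"))["type T"])
stable = (layout(types_first)["type T"],
          layout(without(types_first, "function A"))["type T"])
```

With the intermixed order, `moved` holds two different offsets for type T, so the reference in function B would need patching. With the types-first order, `stable` holds the same offset twice and the reference can stay a constant.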
>
>
>
> Offhand there are two places where a function reference happens:
> references from concrete inlined subprograms to the abstract function, and
> the call-site stuff.  Hand-wave away the call-site stuff, and we’re left
> with the inlining stuff.  In this case I’d think a reasonable plan would be
> to treat the abstract function instance like a type, and put it before any
> concrete functions.
>
>
>
> Overall, then, we’d end up needing to split the DWARF into three parts.
>
> First, you have the unit header, top-level DIE, and all your type
> information that isn’t selectively removed by the linker (or already
> emitted separately as type units).  This part would also have the abstract
> instances of inlined functions.  This part is always emitted.  You need to
> arrange to have it end up being first in the post-linker output.
>
> Second, you have your per-function constructs.  These ought to be
> self-contained, except for references to types and abstract functions,
> which are all in the first part, so those references can remain constant
> offsets from the top of the compile unit.  Because these need to be
> self-contained, any namespace wrappers would need to be repeated per
> function.  And to get the dead-stripping done correctly, each DWARF
> fragment would be in the same COMDAT as the function’s .text section.
>
> Third, you need the final closing NULL (terminating the list of children
> of the compile-unit DIE) which also has a label so the final size of the
> compile unit can be computed correctly (this size lives in the compile-unit
> header).
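
A minimal Python sketch of the three-part arrangement, assuming 32-bit DWARF (a 4-byte length field that is not counted in the unit length) and a one-byte terminating NULL entry. The function name, parameters, and sizes are hypothetical:

```python
def link_unit(part_one_size, fragments, live, length_field_size=4):
    """Place the surviving per-function fragments and compute the unit length.

    part_one_size -- bytes of header, top-level DIE, types, and abstract
                     instances (always emitted, includes the length field)
    fragments     -- list of (function name, size) per-function fragments
    live          -- set of function names the linker kept
    """
    cursor = part_one_size          # part two starts right after part one
    placed = {}
    for name, size in fragments:
        if name in live:            # dead fragments vanish with their COMDAT
            placed[name] = cursor
            cursor += size
    end_label = cursor + 1          # part three: the closing NULL byte
    # The length in the header excludes the length field itself.
    unit_length = end_label - length_field_size
    return placed, unit_length
```

Offsets into part one never change in this model, whichever fragments are dropped; only the fragment positions and the final length depend on what the linker keeps.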
>
>
>
> Currently in LLVM, DWARF gets emitted pretty much on-demand, meaning types
> and functions (concrete and abstract) can be intermixed willy-nilly.  It’s
> likely to require a great deal of effort to rework that into the
> types-versus-functions organization.
>
>
>
> This of course is talking only about the .debug_info section, and there
> are lots of other sections with per-function contributions.  Those are
> trickier, but also tend to be much smaller, so it might be reasonable to
> just hand-wave those away as not worth the extra effort.
>
>
>
> James might have other observations or recollections from doing the actual
> experiment.
>
> --paulr
>
>
>
> *From:* Youssefi, Anna <a-youssefi at ti.com>
> *Sent:* Thursday, September 30, 2021 10:50 AM
> *To:* Robinson, Paul <paul.robinson at sony.com>;
> jh7370.2008 at my.bristol.ac.uk; dblaikie at gmail.com
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* RE: [llvm-dev] unified debug information despite function/data
> sections flags
>
>
>
> We are emitting our own DWARF extensions because our object file editor
> and a utility script use these for generating a call graph with stack
> sizes.  We are not deriving stack sizes from DWARF but rather emitting a
> vendor-specific attribute in the subprogram DIE with the MachineFrameInfo
> getStackSize() value, which appears to be the same value used for LLVM’s
> own stack size section.
>
>
>
> We are also using our own linker, rather than lld.  Our linker already
> removes unreferenced subsections, and in the case of our proprietary
> compiler, the DWARF information is already separated by function so it also
> gets removed if it pertains to an unreferenced function subsection.  So we
> are only having this problem with our LLVM-based front end because the
> debug information is combined.
>
>
>
> I can see Todd Snider just re-asked my question.  I believe this was
> already answered as being problematic due to hard-coded addresses and size
> overhead?
>
>
>
> Thanks,
>
> Anna
>
>
>
> *From:* paul.robinson at sony.com <paul.robinson at sony.com>
> *Sent:* Thursday, September 30, 2021 8:31 AM
> *To:* jh7370.2008 at my.bristol.ac.uk; dblaikie at gmail.com
> *Cc:* Youssefi, Anna <a-youssefi at ti.com>; llvm-dev at lists.llvm.org
> *Subject:* [EXTERNAL] RE: [llvm-dev] unified debug information despite
> function/data sections flags
>
>
>
> I agree with James about using `-fstack-size-section` to get static stack
> size information.  Deriving that info from DWARF seems like a lot of work;
> I imagine you’d have to parse all of the locations within a function,
> looking for frame offsets.  Even then the result would be incomplete
> because it would describe only the stack slots used by declared variables.
> Temporaries and even spill slots probably would not be accounted for.
>
>
>
> Regarding partitioning DWARF, just for completeness I’ll say that we did
> also (at least briefly) look at using DWARF partial-units, but the size
> overhead seemed like it would not be a net win.
>
> --paulr
>
>
>
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of *James
> Henderson via llvm-dev
> *Sent:* Thursday, September 30, 2021 3:44 AM
> *To:* David Blaikie <dblaikie at gmail.com>
> *Cc:* llvm-dev at lists.llvm.org; Youssefi, Anna <a-youssefi at ti.com>
> *Subject:* Re: [llvm-dev] unified debug information despite function/data
> sections flags
>
>
>
> Yep, I took a look at this last year/early this year, but never really
> came up with a fully functioning prototype that was actually efficient
> enough, and have since switched teams, so haven't had the time to work on
> it further.
>
>
>
> You can see my lightning talk from last year on the topic here:
> https://www.youtube.com/watch?v=0y6TlfFhCsU,
> and a mailing thread where I discussed it further here:
> https://lists.llvm.org/pipermail/llvm-dev/2020-November/146469.html.
> The main issue I ran into was the number of hard-coded relative references
> within DWARF. Every single one of these needs to be updated at link time,
> if any of the data is dropped, or the DWARF will end up invalid. To do
> this, I had to add relocations to the DWARF which patched the relevant
> fields at link time, based on the final computed offset, but this had a
> serious performance cost in the linker (not to mention any potential cost
> in the assembler). This approach is certainly possible for the most part,
> at least for .debug_line and .debug_info (it's not necessarily clear
> whether it can be done with some of the other DWARF sections, although the
> benefits in most of them aren't particularly clear), but the difficulty is
> getting it to be fast.
>
>
>
> I'd be happy to discuss this further, and provide feedback on other
> ideas, if you have any, but I currently have no plans to continue this
> work myself.
>
>
>
> By the way, if you are using the DWARF for stack usage analysis, have you
> considered the .stack_sizes section? The -fstack-size-section option emits
> a section that contains the stack size of every function in the output,
> and it can be dumped using llvm-readobj. It is split up so that the linker
> can strip the parts that reference dead data, so you should only end up
> with the actually useful information in the output.
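
For anyone who wants to read the section from a script instead of llvm-readobj, here is a minimal Python sketch. It assumes the layout as I understand it from the LLVM documentation: each entry is a function address followed by the stack size as an unsigned LEB128 value. It also assumes the bytes come from a linked image (addresses already relocated), and the helper names are hypothetical:

```python
import struct


def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:         # high bit clear marks the last byte
            return result, pos
        shift += 7


def parse_stack_sizes(data, addr_size=8, little_endian=True):
    """Return a list of (function address, stack size) pairs."""
    fmt = ("<" if little_endian else ">") + {4: "I", 8: "Q"}[addr_size]
    entries, pos = [], 0
    while pos < len(data):
        (addr,) = struct.unpack_from(fmt, data, pos)
        pos += addr_size
        size, pos = read_uleb128(data, pos)
        entries.append((addr, size))
    return entries
```

Extracting the raw section bytes from the ELF file is left out; any ELF reader can supply them.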
>
>
>
> James
>
>
>
>
>
> On Thu, 30 Sept 2021 at 07:51, David Blaikie <dblaikie at gmail.com> wrote:
>
> You can differentiate dead function descriptions from others on most
> platforms by checking if the low_pc == 0. If 0 is a valid instruction
> address on your architecture, you can use an lld feature to set a more
> authoritative/unambiguous tombstone value for dead code addresses, passing
> something like:
>
>   -z 'dead-reloc-in-nonalloc=.debug_ranges=0xfffffffffffffffe'
>
>   -z 'dead-reloc-in-nonalloc=.debug_loc=0xfffffffffffffffe'
>
>   -z 'dead-reloc-in-nonalloc=.debug_*=0xffffffffffffffff'
>
> to the linker.
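
On the consumer side, the check could look like the following Python sketch. It assumes 64-bit addresses and the tombstone values from the flags above; the function name and the `zero_is_valid_address` parameter are hypothetical:

```python
# Tombstone values a linker may write for addresses of discarded code,
# matching the -z dead-reloc-in-nonalloc values shown above.
EXPLICIT_TOMBSTONES = (0xFFFFFFFFFFFFFFFE, 0xFFFFFFFFFFFFFFFF)


def is_dead_low_pc(low_pc, zero_is_valid_address=False):
    """Guess whether a subprogram's low_pc marks a function dropped at link time."""
    if low_pc == 0:
        # Zero only signals dead code where no real function can live at 0.
        return not zero_is_valid_address
    return low_pc in EXPLICIT_TOMBSTONES
```

A stack-usage script could skip every subprogram DIE for which this returns true, which avoids reporting functions that were eliminated from the link.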
>
> As for reducing debug info size by omitting debug info descriptions of
> dead code - Apple/MachO's dsymutil does this, and I believe Alexey Lapshin
> is working on getting similar behavior into lld (or possibly into a
> post-link tool).
>
> There's also the possibility of using comdats to make the linker's job
> easier - I think there might be ways to structure the DWARF into chunks
> that could be deduplicated and dropped naturally by a linker's existing
> comdat support, but I haven't fully prototyped it. I think there was a
> thread a while back with JHenderson and others discussing this possibility
> further.
>
> - Dave
>
> On Wed, Sep 29, 2021 at 12:50 PM Youssefi, Anna via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
>
>
> I was wondering if there are any plans to separate debug information into
> distinct sections accordingly when the compiler flags -ffunction-sections
> and/or -fdata-sections are used.  If an unreferenced function is removed
> from the link, it makes no sense for its associated debug information to
> still be included.  As we rely on the debug information for stack usage
> analysis, we wind up displaying stack usage statistics for unreferenced
> functions that were eliminated from the link if debug information for any
> other referenced functions is in the same debug section.  It seems that
> others have run into this problem previously so I wanted to check whether
> there are any plans to change the behavior.
>
>
>
> Thanks,
>
> Anna Youssefi
>
> Texas Instruments, Codegen group
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>