[llvm-dev] Increasing address pool reuse/reducing .o file size in DWARFv5
David Blaikie via llvm-dev
llvm-dev at lists.llvm.org
Mon Jan 13 09:20:35 PST 2020
On Mon, Jan 13, 2020 at 9:03 AM Vedant Kumar <vedant_kumar at apple.com> wrote:
> I think I get it now, thanks for explaining!
>
> On Jan 12, 2020, at 11:44 AM, David Blaikie via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
>
> On Fri, Jan 10, 2020 at 12:57 PM Vedant Kumar <vedant_kumar at apple.com>
> wrote:
>
>> I don't totally follow the proposed encoding change & would appreciate a
>> small example.
>>
>> Is the idea to replace e.g. an 'AT_low_pc (<direct address>) + relocation
>> for <direct address>' with an 'AT_low_pc (<indirection into a pool of
>> addresses> + offset)',
>>
>
> With Split DWARF or with DWARFv5 in LLVM at the moment, all addresses are
> indirected already. So it's:
>
> Replace "AT_low_pc (<indirection into a pool of addresses>)" with an
> "AT_low_pc (<indirection into a pool of addresses> + offset)".
>
>
>> s.t. the cost of a relocation for the address is paid down the more it's
>> used?
>>
>
> Right - specifically to reduce the pool of addresses down to, ideally, one
> address per section/indivisible chunk of machine code (per subsection in
> MachO, for instance) (whereas currently there are many addresses per
> section)
>
>
>> How do you figure the offset out?
>>
>
> Label difference - same as is done for DW_AT_high_pc today in DWARFv4 and
> DWARFv5 in LLVM. high_pc currently uses the low_pc address to be relative
> to; in this proposed situation, we'd use a symbol that's in the first bit
> of debug info in the section (or subsection in MachO). So the low_pc of the
> subprogram/function, for instance, or if there are two functions in the
> same section with debug info for both, the low_pc of the first of those
> functions, etc...
>
>
> If the label difference in a low_pc attribute is relative to the start of
> a section, could a linker orderfile pass break the dwarf unless it updates
> the offset?
>
Nah - terminologically, ELF sections are indivisible - more akin to MachO
subsections. ELF files can have multiple sections with the same name (as is
used for comdat sections for inline functions, and for -ffunction-sections,
which is roughly equivalent to MachO's "subsections via symbols", as I
understand it), or can use ".text.suffix" naming to give each separate .text
section its own name - but the linker strips the suffixes and concatenates
all these together into the final linked .text section. Since the label
difference never crosses a section boundary, reordering sections doesn't
invalidate it.
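
To make the reordering point concrete, here is a minimal sketch in Python. It is a toy model, not anything from LLVM or a real linker: the section names, sizes, symbol names, and the link() helper are all made up for illustration. It lays out indivisible sections in two different orders and checks that an offset between two labels in the same section is unchanged.

```python
# Toy model (hypothetical names and sizes): each section is an indivisible
# chunk, and each symbol is recorded as an offset within its section.
sections = {
    ".text.f": {"size": 0x40, "symbols": {"f": 0x00, "f_call_ret": 0x18}},
    ".text.g": {"size": 0x30, "symbols": {"g": 0x00}},
}

def link(order, base=0x1000):
    """Lay sections out in the given order; return final symbol addresses."""
    addrs = {}
    cursor = base
    for name in order:
        sec = sections[name]
        for sym, off in sec["symbols"].items():
            addrs[sym] = cursor + off
        cursor += sec["size"]
    return addrs

a = link([".text.f", ".text.g"])  # default order
b = link([".text.g", ".text.f"])  # order changed, e.g. by an order file

# The absolute address of f moves, but the label difference does not.
print(hex(a["f"]), hex(b["f"]))
print(hex(a["f_call_ret"] - a["f"]), hex(b["f_call_ret"] - b["f"]))
```

The model only holds while both labels live in the same indivisible chunk; splitting a function into fragments would need a base per fragment.
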
> Ditto, I suppose, for an intra-function offset when something like
> propeller is used to reorder basic blocks (I’m thinking of
> At_call_return_pc now).
>
Yeah - currently the "base address" for each section is determined by the
first function with debug info being emitted in that section
(https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1787)
- with PROPELLER we'd need to add similar code when function fragments are
emitted. I'm planning to check the PROPELLER work-in-progress tree soon and
do another sanity pass over the debug info emitted to check this is working
as intended. That's in part because this base address selection, coupled
with DWARFv5 and maybe with the changes I'm suggesting in this thread (which
I'll commit under flags "soon" - might take me a week or two judging by my
review/bug/investigation load right now... *fingers crossed*), might make
PROPELLER less expensive in terms of debug info size, or more expensive
relative to the significant improvements this provides.
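
As a rough illustration of the base address selection described above, here is a small Python model. It is a sketch under assumptions, not the DwarfDebug.cpp logic: the function list, section names, offsets, and both helper functions are hypothetical. It picks the first function with debug info in each section as that section's base, then expresses every low_pc as (address pool index, offset from base).

```python
# Hypothetical input: (function, section, offset in section, has debug info).
functions = [
    ("a", ".text", 0x00, False),
    ("b", ".text", 0x20, True),
    ("c", ".text", 0x50, True),
    ("inl", ".text.inl", 0x00, True),
]

def choose_bases(funcs):
    """Pick, per section, the first function that has debug info."""
    bases = {}
    for name, section, offset, has_debug in funcs:
        if has_debug and section not in bases:
            bases[section] = (name, offset)
    return bases

def encode_low_pcs(funcs, bases):
    """Return the address pool and a (pool index, offset) pair per function."""
    pool = [name for name, _ in bases.values()]
    index = {section: i for i, section in enumerate(bases)}
    low_pcs = {}
    for name, section, offset, has_debug in funcs:
        if has_debug:
            low_pcs[name] = (index[section], offset - bases[section][1])
    return pool, low_pcs

bases = choose_bases(functions)
pool, low_pcs = encode_low_pcs(functions, bases)
print(pool)     # one pool entry (and one relocation) per section
print(low_pcs)  # every other address is a label difference from its base
```

In this model three functions with debug info need only two pool entries; with one entry per low_pc they would need three. Function fragments would each need their own entry in the input list, with their own section.
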
Because MachO debug info distribution works differently and, if I
understand correctly, doesn't need relocations in many cases due to
DWARF-aware parsing/linking (and if it does use relocations, I've no
knowledge of when/how and how big they are compared to the ELF relocations
I've been measuring), it's quite possible MachO would have different
tradeoffs in this space.
> Apologies if this has been answered elsewhere, I suppose there must be a
> solution for this for At_high_pc to work.
>
> vedant
>
>
>
>
>>
>> thanks,
>> vedant
>>
>> On Jan 8, 2020, at 1:33 PM, David Blaikie via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Sounds good all round - I'll commit these two modes, and maybe even the
>> third (given Sony's interest & possible interest in changing their consumer
>> to handle it) of a custom form to eke out the last few bytes from the more
>> direct addr+offset encoding.
>>
>> I'll follow up here with flag names and revision numbers once they're in.
>>
>> On Wed, Jan 8, 2020 at 1:26 PM Robinson, Paul <paul.robinson at sony.com>
>> wrote:
>>
>>> On some previous occasion that introduced additional indirection
>>> (don't remember the details) my debugger people groused about the
>>> additional performance cost of chasing down data in a different
>>> object-file section. So we (Sony) might be happier with low_pc as
>>> expressions, than with a ranges-always solution.
>>>
>>> But hard to say without data, and getting both modes in at least
>>> as a temporary thing sounds like a good plan.
>>> --paulr
>>>
>>>
>>> > -----Original Message-----
>>> > From: aprantl at apple.com <aprantl at apple.com>
>>> > Sent: Wednesday, January 8, 2020 1:49 PM
>>> > To: David Blaikie <dblaikie at gmail.com>
>>> > Cc: llvm-dev <llvm-dev at lists.llvm.org>; Jonas Devlieghere
>>> > <jdevlieghere at apple.com>; Robinson, Paul <paul.robinson at sony.com>;
>>> Eric
>>> > Christopher <echristo at gmail.com>; Frederic Riss <friss at apple.com>
>>> > Subject: Re: Increasing address pool reuse/reducing .o file size in
>>> > DWARFv5
>>> >
>>> > I think this sounds like a good plan for Linux. I would like to see the
>>> > numbers for Darwin (= non-split DWARF) to decide whether we should just
>>> > make that the default. Eric's suggestion of having this committed as an
>>> > option first seems like a good step in that direction. If it is an
>>> > advantage across the board we can remove the option and just make this
>>> the
>>> > default behavior.
>>> >
>>> > thanks,
>>> > adrian
>>> >
>>> > > On Dec 30, 2019, at 12:08 PM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>> > >
>>> > > tl;dr: in DWARFv5, using DW_AT_ranges even when the range is
>>> contiguous
>>> > reduces linked, uncompressed debug_addr size for optimized builds by
>>> 93%
>>> > and reduces total .o file size (with compression and split) by 15%. It
>>> > does grow .dwo file size a bit - DWARFv5, no compression, not split
>>> shows
>>> > the net effect if all bytes are equal: -O3 clang binary grows by 0.4%,
>>> -O0
>>> > clang binary shrinks by 0.1%
>>> > > Should we enable this strategy by default for DWARFv5, for
>>> DWARFv5+Split
>>> > DWARF, or not by default at all/only under a flag?
>>> > >
>>> > >
>>> > >
>>> > > So, I've brought this up a few times before - that DWARFv5 does a
>>> pretty
>>> > good job of reducing relocations (& reducing .o file size with Split
>>> > DWARF) by allowing many uses of addresses to include some kind of
>>> > address+offset (debug_rnglists and loclists allowing "base_address"
>>> then
>>> > offset_pairs (an improvement over similar functionality in DWARFv4
>>> because
>>> > the offset pairs can be uleb encoded - so they can be quite compact))
>>> > >
>>> > > But one place where DWARFv5 misses a chance to reduce relocations further is
>>> > direct addresses from debug_info, such as DW_AT_low_pc.
>>> > >
>>> > > For a while I've wondered if we could use an extension form for
>>> > addr+offset, and I prototyped this without an extension attribute, but
>>> > instead using exprloc. This has slightly higher overhead to express
>>> the...
>>> > expression. (it's 9 bytes in total, could be as few as 5 with a custom
>>> > form)
>>> > >
>>> > > But I had another idea that's more instantly deployable: Why not use
>>> > DW_AT_ranges even when the range is contiguous? That way the low_pc
>>> that
>>> > previously couldn't use an existing address pool entry + offset, could
>>> use
>>> > the rnglist support for base address.
>>> > >
>>> > > The only unnecessary address pool entries that remain that I've found
>>> > are DW_AT_low_pc for DW_TAG_labels - but there's only a handful of
>>> those
>>> > in most code. So the "ranges everywhere" strategy gets the addresses
>>> for
>>> > optimized clang down from 4758 (v4 address pool used 9923 addresses...)
>>> > to 342, with ~4 "extra" addresses for DW_TAG_labels.
>>> > >
>>> > > This could also be a bit less costly if DWARFv5 rnglists didn't use a
>>> > separate offset table (instead encoding the offsets directly in
>>> > debug_info, rather than using indexes)
>>> > >
>>> > > I have patches for both the addr+offset exprloc and for the ranges-
>>> > always, both with -mllvm flags - do people think they're both worth
>>> > committing for experimentation? Neither? Default on in some cases (like
>>> > Split DWARF)?
>>> > >
>>> > > Thanks,
>>> > > - Dave
>>>
>>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>