[llvm-dev] Increasing address pool reuse/reducing .o file size in DWARFv5
Fangrui Song via llvm-dev <llvm-dev at lists.llvm.org>
Wed Feb 10 21:34:14 PST 2021

Hi David, this looks great! I just started playing with this via llc
-minimize-addr-in-v5= and will study it in the coming days.
On 2021-02-10, David Blaikie via llvm-dev wrote:
>All 3 options are now implemented & I've tidied up a flag name (still an
>-mllvm flag - I don't think this should ever be a user-visible flag).
>
>-mllvm -minimize-addr-in-v5=Ranges
>  Uses debug_rnglists even for contiguous ranges if doing so would avoid
>adding another entry to .debug_addr. E.g., take a CU with 3 functions, two
>in the same section. The first function in each section uses low/high, and
>the CU has a rnglist that can share/reuse the low_pc of those two functions.
>A function that comes later in a section that already has another function
>in it would instead use the low_pc of the first function in the section as
>its base address, plus an offset pair, avoiding the need for a 3rd
>debug_addr entry and its associated relocation.
>
>-mllvm -minimize-addr-in-v5=Expressions
>  This uses the exprloc idea - using a non-trivial expression for a
>DW_AT_low_pc or other address classed attribute. This reduces the overhead
>compared to the 'Ranges' technique, and allows more cases - including
>DW_TAG_labels and DW_TAG_call_sites.
This option emits: DW_OP_addrx 0, DW_OP_const4u 9, DW_OP_plus.
DW_OP_const4u is a bit wasteful: it always spends 4 bytes on the addend. This
could be changed to DW_OP_addrx 0, DW_OP_plus_uconst 9. However, the current
implementation needs the size of the DWARF expression up front, and the size
of the ULEB128 addend of DW_OP_plus_uconst is not known at that point.
   .byte size_of_exprloc   # This would be dependent on the size of .uleb128
   ...
   .byte 35
   .long .Ltmp1-.Lfunc_begin0
   # it'd be nice if we can use .uleb128 .Ltmp1-.Lfunc_begin0
size_of_exprloc could instead be computed as the difference of two labels.
When .uleb128 is used, we should be careful about assembler convergence.
* GNU as hacked around the problem specifically for .gcc_except_table by
  inserting an additional .align:
  https://sourceware.org/bugzilla/show_bug.cgi?id=4029
  It works for .gcc_except_table but can be a problem for our
  .uleb128 + .byte scheme.
* LLVM MC's solution is generic.
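
To make the size trade-off concrete, here is a minimal Python sketch (not
LLVM code) that encodes both expressions and counts the bytes. The opcode
values are the ones from the DWARF v5 specification; the helper names
(uleb128, exprloc, addrx_const4u_plus, addrx_plus_uconst) are made up for
illustration, and a little-endian target is assumed for the 4-byte constant.

```python
import struct

# Opcode values from the DWARF v5 specification.
DW_OP_const4u = 0x0C
DW_OP_plus = 0x22
DW_OP_plus_uconst = 0x23  # decimal 35, i.e. the ".byte 35" above
DW_OP_addrx = 0xA1


def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def exprloc(expr):
    """Prefix an expression with its ULEB128 length, as DW_FORM_exprloc does."""
    return uleb128(len(expr)) + expr


def addrx_const4u_plus(index, offset):
    # DW_OP_addrx <index>, DW_OP_const4u <offset>, DW_OP_plus
    # (little-endian 4-byte constant is an assumption of this sketch)
    return (bytes([DW_OP_addrx]) + uleb128(index)
            + bytes([DW_OP_const4u]) + struct.pack("<I", offset)
            + bytes([DW_OP_plus]))


def addrx_plus_uconst(index, offset):
    # DW_OP_addrx <index>, DW_OP_plus_uconst <offset>
    return (bytes([DW_OP_addrx]) + uleb128(index)
            + bytes([DW_OP_plus_uconst]) + uleb128(offset))


current = exprloc(addrx_const4u_plus(0, 9))
compact = exprloc(addrx_plus_uconst(0, 9))
print(len(current), current.hex())  # 9 08a1000c0900000022
print(len(compact), compact.hex())  # 5 04a1002309
```

The compact variant grows by one byte whenever the offset needs another
7-bit group (at 128, 16384, ...), which is why its length cannot be written
as a fixed .byte before the label difference is resolved.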
>-mllvm -minimize-addr-in-v5=Form
>   Similar to Expressions, but using a custom form to make things a bit
>more compact (has the drawback that consumers who don't recognize the form
>can't parse any of the DWARF because they can't skip over the attribute due
>to not knowing its size)
This option emits a new form, DW_FORM_LLVM_addrx_offset, which is the composite
of DW_FORM_addrx and DW_FORM_data4. This is superior to Expressions because the
bytes for the exprloc size and the plus operation are saved.
As with Expressions, there is a question of whether DW_FORM_udata would be
better than DW_FORM_data4 for the offset.
It could save 3 bytes compared with DW_OP_plus_uconst.
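
A rough size comparison, as a Python sketch under stated assumptions: the
"udata" variant of the form is hypothetical (the implemented form uses
data4), the function names are invented here, and only attribute value
bytes are counted (no abbreviation or relocation cost).

```python
def uleb128_size(value):
    """Number of bytes needed to encode value as unsigned LEB128."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def form_size_data4(index):
    # DW_FORM_addrx index (ULEB128) followed by a fixed 4-byte offset.
    return uleb128_size(index) + 4


def form_size_udata(index, offset):
    # Hypothetical variant: DW_FORM_addrx index followed by a ULEB128 offset.
    return uleb128_size(index) + uleb128_size(offset)


def exprloc_size_plus_uconst(index, offset):
    # exprloc length + DW_OP_addrx + index + DW_OP_plus_uconst + offset
    body = 1 + uleb128_size(index) + 1 + uleb128_size(offset)
    return uleb128_size(body) + body


for offset in (9, 0x1234, 0x123456):
    print(hex(offset),
          form_size_data4(0),
          form_size_udata(0, offset),
          exprloc_size_plus_uconst(0, offset))
```

For index 0 and offset 9 this gives 5 bytes for the data4 form, 2 bytes for
the udata variant and 5 bytes for the exprloc, so the udata variant is 3
bytes smaller than either alternative for small offsets.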
>
>For comparisons, a few different build modes using 'Ranges':
>
>I should say all these builds are with compressed debug info enabled (in
>object files) and type units. The asan build uses compressed debug info in
>the linked binary and only gmlt.
>
>But the main takeaway is this seems probably (to me) worth turning on for
>Split DWARF - it does mean the final build assets (exe+dwp) are slightly
>larger (1.28%), but the benefit in object and executable size seems
>probably generically worthwhile.
>
>I plan to roll =Ranges out inside Google for cases that use Split DWARF,
>see if it sticks, and if so, change upstream to enable the feature by
>default under Split DWARF.
>
>The other two modes generally make things better/reduce the tradeoff
>cost:
>So with the custom form, we can even get to a total savings in both
>intermediate (.o/.dwo) and linked (exe/dwp) files, so it might even be
>applicable to non-split DWARF. (though, again, the tradeoffs will look
>somewhat different without compression enabled and maybe without type units
>might swing it one way or another a bit (probably not much though))
>
>I'd love to have the Form version supported in lldb and enabled by default
>when tuning/targeting lldb, but not sure I have the lldb expertise/time to
>implement that just yet.
>
>Anyone have thoughts/ideas/interest in collaborating on any of this?
>
>On Tue, Jan 5, 2021 at 4:43 PM David Blaikie <dblaikie at gmail.com> wrote:
>
>> Coming back around to this...
>>
>>
>> https://github.com/llvm/llvm-project/commit/ad18b075fd63935148b460f9c6b4dce130c56b15
>> Added the "always use ranges" option, currently off-by-default, usable with
>> -gdwarf-5 -mllvm -always-use-ranges-in-v5=Enable (as the name implies, this
>> has no effect on DWARFv4 and below, because there's no benefit there). I
>> have plans to make this the default behavior for Split DWARF since moving
>> bytes from .o to .dwo is valuable even if it breaks pretty even - enough to
>> justify this even though it's a wash or maybe a slight cost to linked
>> binary size (compared to unlinked object size).
>>
>> I did come across a couple of lldb bugs related to using ranges on
>> subprograms ("Ranges everywhere" can use ranges on subprograms where the
>> subprogram is in the same section as another subprogram), sent fixes for
>> them in: https://reviews.llvm.org/D94063 and
>> https://reviews.llvm.org/D94064 - if anyone has a chance to look at
>> those, it'd be most appreciated.
>>
>> Once those lldb fixes are in, I'll make the change to enable this feature
>> by default when using Split DWARF unless anyone's got objections to that.
>>
>> & in the meantime I'm also working on patches for the other two
>> candidates - novel DWARF expressions and an LLVM extension form.
>>
>> On Mon, Jan 13, 2020 at 2:15 PM David Blaikie <dblaikie at gmail.com> wrote:
>>
>>>
>>>
>>> On Mon, Jan 13, 2020 at 1:39 PM Vedant Kumar <vedant_kumar at apple.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Jan 13, 2020, at 9:20 AM, David Blaikie via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>
>>>>
>>>> On Mon, Jan 13, 2020 at 9:03 AM Vedant Kumar <vedant_kumar at apple.com>
>>>> wrote:
>>>>
>>>>> I think I get it now, thanks for explaining!
>>>>>
>>>>> On Jan 12, 2020, at 11:44 AM, David Blaikie via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>> 
>>>>>
>>>>>
>>>>> On Fri, Jan 10, 2020 at 12:57 PM Vedant Kumar <vedant_kumar at apple.com>
>>>>> wrote:
>>>>>
>>>>>> I don't totally follow the proposed encoding change & would appreciate
>>>>>> a small example.
>>>>>>
>>>>>> Is the idea to replace e.g. an 'AT_low_pc (<direct address>) +
>>>>>> relocation for <direct address>' with an 'AT_low_pc (<indirection into a
>>>>>> pool of addresses> + offset)',
>>>>>>
>>>>>
>>>>> With Split DWARF or with DWARFv5 in LLVM at the moment, all addresses
>>>>> are indirected already. So it's:
>>>>>
>>>>> Replace "AT_low_pc (<indirection into a pool of addresses>)" with an
>>>>> "AT_low_pc (<indirection into a pool of addresses> + offset)".
>>>>>
>>>>>
>>>>>> s.t. the cost of a relocation for the address is paid down the more
>>>>>> it's used?
>>>>>>
>>>>>
>>>>> Right - specifically to reduce the pool of addresses down to, ideally,
>>>>> one address per section/indivisible chunk of machine code (per subsection
>>>>> in MachO, for instance) (whereas currently there are many addresses per
>>>>> section)
>>>>>
>>>>>
>>>>>> How do you figure the offset out?
>>>>>>
>>>>>
>>>>> Label difference - same as is done for DW_AT_high_pc today in DWARFv4
>>>>> and DWARFv5 in LLVM. high_pc is currently relative to the low_pc
>>>>> address; in this proposed situation, we'd use a symbol that's in the
>>>>> first bit of debug info in the section (or subsection in MachO). So the
>>>>> low_pc of the subprogram/function, for instance, or if there are two
>>>>> functions in the same section with debug info for both, the low_pc of the
>>>>> first of those functions, etc...
>>>>>
>>>>>
>>>>> If the label difference in a low_pc attribute is relative to the start
>>>>> of a section, could a linker orderfile pass break the dwarf unless it
>>>>> updates the offset?
>>>>>
>>>>
>>>> Nah - terminologically, ELF sections are indivisible - more akin to
>>>> MachO subsections. ELF files can have multiple sections with the same name
>>>> (as is used for comdat sections for inline functions, and for
>>>> -ffunction-sections, which is roughly equivalent to MachO's "subsections
>>>> via symbols", as I understand it), or can use ".text.suffix" naming to give
>>>> each separate .text section its own name - but the linker strips the
>>>> suffixes and concatenates all these together into the final linked .text
>>>> section.
>>>>
>>>>
>>>> I see, so an ELF linker may reorder sections relative to each other, but
>>>> not the contents of a section. (That matches up with what I've read
>>>> elsewhere - you'd use -ffunction-sections to reorder function symbols,
>>>> IIRC.)
>>>>
>>>
>>> Right.
>>>
>>>
>>>> And in this proposal to increase address pool reuse, label differences
>>>> in a MachO would be relative to the subsection.
>>>>
>>>
>>> Even before my proposal, there are already many cases where rnglists and
>>> loclists in DWARFv5 (& location lists in DWARFv4) will use selectively
>>> chosen base addresses and symbol differences as often as possible (insofar
>>> as I could do that when working/experimenting with ELF).
>>>
>>> So without function sections, for instance - rnglists for sub-function
>>> ranges (ignoring PROPELLER for now/in this part of the discussion).
>>>
>>> Perhaps an example would be helpful. Here's LLVM's current behavior with
>>> DWARFv5 and ELF, without function sections:
>>>
>>> int f1();
>>> void f2() {
>>>   if (int i = f1()) {
>>>     f1();
>>>   }
>>> }
>>> void f3() {
>>>   if (f1()) {
>>>     int i = f1();
>>>   }
>>> }
>>> __attribute__((section(".other"))) void f4() {
>>> }
>>>
>>> In this code there are only two ELF sections (".text" contains the
>>> definitions of f2 and f3, ".other" contains the definition of f4) and so we
>>> /should/ be able to only have 2 relocations in the debug info.
>>>
>>> (I'm exploiting something of a bug/quirk in Clang/LLVM's debug info that
>>> causes, even at -O0, the lexical_block for the 'if' to have a hole in it,
>>> where the call to f1 is, so it has ranges rather than low/high pc)
>>>
>>> In DWARFv4 this example would've used 10 relocations. (on the CU ranges,
>>> there would be begin/end for the ".text" range covering f2 and f3, and
>>> begin/end for the ".other" range covering f4, then the range list for the
>>> "if" lexical_block would contain another 2 pairs (4 addresses/relocations),
>>> one relocation for f2's low_pc, one for f3's 'if' lexical_block).
>>>
>>> In DWARFv5, we see the following:
>>>
>>> 0x00000014: [DW_RLE_base_addressx]:  0x0000000000000000
>>> 0x00000016: [DW_RLE_offset_pair  ]:  0x0000000000000008, 0x0000000000000014
>>> 0x00000019: [DW_RLE_offset_pair  ]:  0x000000000000001a, 0x000000000000001f
>>> 0x0000001c: [DW_RLE_end_of_list  ]
>>> 0x0000001d: [DW_RLE_startx_length]:  0x0000000000000000, 0x0000000000000036
>>> 0x00000020: [DW_RLE_startx_length]:  0x0000000000000002, 0x0000000000000006
>>> 0x00000023: [DW_RLE_end_of_list  ]
>>>
>>> The first range list is for the 'if' scope, the second is for the CU.
>>> Both are able to efficiently select encodings and base addresses.
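
To see how compact these two lists are, here is a small Python sketch that
re-encodes them by hand. The DW_RLE_* values are from the DWARF v5
specification; encode_rnglist is an illustrative helper, not an LLVM API,
and it only handles entry kinds whose operands are all ULEB128.

```python
# DW_RLE_* entry kinds from the DWARF v5 specification.
DW_RLE_end_of_list = 0x00
DW_RLE_base_addressx = 0x01
DW_RLE_startx_length = 0x03
DW_RLE_offset_pair = 0x04


def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_rnglist(entries):
    """Encode (kind, operand, ...) tuples; every operand here is ULEB128."""
    out = bytearray()
    for kind, *operands in entries:
        out.append(kind)
        for operand in operands:
            out += uleb128(operand)
    return bytes(out)


if_scope = encode_rnglist([
    (DW_RLE_base_addressx, 0),
    (DW_RLE_offset_pair, 0x08, 0x14),
    (DW_RLE_offset_pair, 0x1A, 0x1F),
    (DW_RLE_end_of_list,),
])
cu = encode_rnglist([
    (DW_RLE_startx_length, 0, 0x36),
    (DW_RLE_startx_length, 2, 0x06),
    (DW_RLE_end_of_list,),
])
print(len(if_scope), len(cu))  # 9 7

# DWARFv4 .debug_ranges on a 64-bit target: two 8-byte addresses per range
# plus a 16-byte (0, 0) terminator for the same two-range 'if' scope.
v4_if_scope = (2 + 1) * 2 * 8
print(v4_if_scope)  # 48
```

The 9 and 7 byte lengths match the offsets in the dump (0x14 to 0x1d, and
0x1d to 0x24), against 48 bytes plus 4 relocations for the 'if' scope in
the DWARFv4 encoding described earlier.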
>>>
>>> But the debug_addr has 4 addresses in it - the address at index 1 (not
>>> used in the rnglists shown above - we see index 0 and index 2 are used
>>> there) is for the low_pc of f3's subprogram, and the address at index 2 is
>>> for the low_pc of f3's if block/scope.
>>>
>>> That's the address/relocation that would be... addressed by the change
>>> I'm proposing. One way to avoid that relocation would be to encode f3's
>>> address range using a rnglist - this is fully backwards compatible, and
>>> would produce a rnglist like this:
>>>
>>> [DW_RLE_base_addressx]:  0x0000000000000000
>>> [DW_RLE_offset_pair  ]:  0x0000000000000030, 0x0000000000000036
>>> [DW_RLE_end_of_list  ]
>>>
>>> Similarly, f3's if block could use a rangelist like:
>>>
>>> [DW_RLE_base_addressx]:  0x0000000000000000
>>> [DW_RLE_offset_pair  ]:  0x0000000000000046, 0x0000000000000054
>>> [DW_RLE_end_of_list  ]
>>>
>>> As you can imagine, there are quite a few ranges (especially once you get
>>> inlining) that use low/high_pc, and could benefit from the reduction in
>>> relocations by using this strategy. Though it isn't optimal (the range list
>>> encoding isn't intended to be good for this use case) in terms of size cost
>>> - hence the possibility of using DWARF expressions for address class
>>> attributes, or a custom form that would more directly encode the <indirect
>>> address> + <offset>.
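
A toy model of the two pooling strategies, as a Python sketch. This is not
how LLVM's address pool is implemented; the function names and the scope
offsets are invented for illustration, and scopes are assumed to be listed
in address order within each section so that offsets are non-negative.

```python
def pool_per_low_pc(scopes):
    """Roughly today's behaviour: each distinct low_pc gets a pool entry."""
    pool = []
    for _name, section, start, _end in scopes:
        if (section, start) not in pool:
            pool.append((section, start))
    return pool


def pool_per_section(scopes):
    """Proposed: the first scope in a section is the base, the rest are offsets."""
    bases = {}
    encoded = []
    for name, section, start, end in scopes:
        base = bases.setdefault(section, start)
        encoded.append((name, section, start - base, end - base))
    return bases, encoded


# (name, section, start, end); these offsets are made up for the example.
scopes = [
    ("f2", ".text", 0x00, 0x30),
    ("f3", ".text", 0x30, 0x60),
    ("f3 if-block", ".text", 0x46, 0x54),
    ("f4", ".other", 0x00, 0x06),
]

print(len(pool_per_low_pc(scopes)))  # 4
bases, encoded = pool_per_section(scopes)
print(len(bases))  # 2
for entry in encoded:
    print(entry)
```

With one base per section the pool shrinks from 4 entries to 2, matching
the "only 2 relocations" goal for the f2/f3/f4 example.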
>>>
>>>> In Propeller, is basic block reordering done after a .o is emitted?
>>>>
>>>
>>> Yes.
>>>
>>>
>>>> If so, I suppose I don't yet see how the proposed scheme is resilient to
>>>> this reordering.
>>>>
>>>
>>> With PROPELLER any function that is fragmented into reorderable sections
>>> must necessarily use ranges to describe the function's address range - but,
>>> again, choosing base addresses strategically & using relative references
>>> whenever possible, would help reduce the cost of PROPELLER's debug info.
>>>
>>>
>>>> OTOH if block reordering is done just before the label difference is
>>>> evaluated, then there shouldn't be any issue.
>>>>
>>>>
>>>> Ditto, I suppose, for an intra-function offset when something like
>>>>> propeller is used to reorder basic blocks (I’m thinking of
>>>>> At_call_return_pc now).
>>>>>
>>>>
>>>> Yeah - currently the "base address" for each section is determined by
>>>> the first function with debug info being emitted in that section (
>>>> https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1787 )
>>>> - with PROPELLER we'd need to add similar code when function fragments are
>>>> emitted. (I'm planning to check the PROPELLER work in progress tree soon
>>>> and do another sanity pass over the debug info emitted to check this is
>>>> working as intended - in part because this base address selection, coupled
>>>> with DWARFv5 and maybe with the changes I'm suggesting in this thread (&
>>>> will commit under flags "soon" (might take me a week or two judging by my
>>>> review/bug/investigation load right now... *fingers crossed*)) might make
>>>> PROPELLER less expensive in terms of debug info size, or more expensive
>>>> relative to the significant improvements this provides)
>>>>
>>>>
>>>> Thanks for investigating!
>>>>
>>>> Owing to the way MachO debug info distribution works differently & if I
>>>> understand correctly doesn't need relocations in many cases due to
>>>> DWARF-aware parsing/linking (& if it does use relocations, I've no
>>>> knowledge of when/how and how big they are compared to the ELF relocations
>>>> I've been measuring) it's quite possible MachO would have different
>>>> tradeoffs in this space.
>>>>
>>>>
>>>> A linked .dSYM (analogous to an ELF .dwp, IIUC) doesn't contain
>>>> relocations for AT_low_pc or AT_call_return_pc in the simple examples I
>>>> tried out. We do emit relocations for those attributes in MachO object
>>>> files (there isn't something analogous to a .dwo on MachO, the debug info
>>>> just goes into a different set of sections in the .o). My understanding
>>>> (based on the definition of `macho_relocation_info` in the ld64 sources) is
>>>> that MachO relocations are 8 bytes in size. It looks like ELF rel/rela
>>>> relocations are 16/24 bytes in size, but I'm not sure why (perhaps they're
>>>> more extensible / encode more information).
>>>>
>>>
>>> OK *nod* with the smaller encoding it may be less of a pressing issue for
>>> you & the tradeoff may be different.
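
On the relocation sizes: a small Python sketch that reproduces the 16/24/8
byte figures from the record layouts. The field lists are the standard
Elf64_Rel/Elf64_Rela layouts and the Mach-O relocation_info layout as I
understand them; addr_entry_cost is a made-up helper for this example.

```python
import struct

# Elf64_Rel: r_offset (u64), r_info (u64)
ELF64_REL = struct.calcsize("<QQ")
# Elf64_Rela: r_offset (u64), r_info (u64), r_addend (i64)
ELF64_RELA = struct.calcsize("<QQq")
# Mach-O relocation_info: r_address (i32) plus one 32-bit word of bitfields
# (r_symbolnum:24, r_pcrel:1, r_length:2, r_extern:1, r_type:4)
MACHO_RELOC = struct.calcsize("<iI")

print(ELF64_REL, ELF64_RELA, MACHO_RELOC)  # 16 24 8


def addr_entry_cost(reloc_size, address_size=8):
    """Bytes one .debug_addr entry costs in a .o: the address plus its relocation."""
    return address_size + reloc_size


print(addr_entry_cost(ELF64_RELA), addr_entry_cost(MACHO_RELOC))  # 32 16
```

So the ELF records are larger because every field is 64 bits wide, and
rela additionally stores an explicit addend; each avoided pool entry saves
about twice as much in an ELF64 rela object as in a MachO one.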
>>>
>>>
>>>> Would a vanilla DWARFv4 .dwp (without your patches applied) contain a
>>>> relocation for each 'AT_low_pc (<direct address>)'?
>>>>
>>>
>>> DWP files contain no direct addresses - they are all indirect through the
>>> address pool. But, yes, for a DWARFv4 Split DWARF build, low_pcs don't have
>>> an opportunity to reuse a strategically chosen base address - they have to
>>> use an addrx form & the debug_addr section would have that specific address
>>> with a relocation for it.
>>>
>>>
>>>>
>>>> vedant
>>>>
>>>>
>>>>
>>>>> Apologies if this has been answered elsewhere, I suppose there must be
>>>>> a solution for this for At_high_pc to work.
>>>>>
>>>>> vedant
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> thanks,
>>>>>> vedant
>>>>>>
>>>>>> On Jan 8, 2020, at 1:33 PM, David Blaikie via llvm-dev <
>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>> Sounds good all round - I'll commit these two modes, and maybe even
>>>>>> the third (given Sony's interest & possible interest in changing their
>>>>>> consumer to handle it) of a custom form to eke out the last few bytes from
>>>>>> the more direct addr+offset encoding.
>>>>>>
>>>>>> I'll follow up here with flag names and revision numbers once they're
>>>>>> in.
>>>>>>
>>>>>> On Wed, Jan 8, 2020 at 1:26 PM Robinson, Paul <paul.robinson at sony.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On some previous occasion that introduced additional indirection
>>>>>>> (don't remember the details) my debugger people groused about the
>>>>>>> additional performance cost of chasing down data in a different
>>>>>>> object-file section.  So we (Sony) might be happier with low_pc as
>>>>>>> expressions, than with a ranges-always solution.
>>>>>>>
>>>>>>> But hard to say without data, and getting both modes in at least
>>>>>>> as a temporary thing sounds like a good plan.
>>>>>>> --paulr
>>>>>>>
>>>>>>>
>>>>>>> > -----Original Message-----
>>>>>>> > From: aprantl at apple.com <aprantl at apple.com>
>>>>>>> > Sent: Wednesday, January 8, 2020 1:49 PM
>>>>>>> > To: David Blaikie <dblaikie at gmail.com>
>>>>>>> > Cc: llvm-dev <llvm-dev at lists.llvm.org>; Jonas Devlieghere
>>>>>>> > <jdevlieghere at apple.com>; Robinson, Paul <paul.robinson at sony.com>;
>>>>>>> > Eric Christopher <echristo at gmail.com>; Frederic Riss <friss at apple.com>
>>>>>>> > Subject: Re: Increasing address pool reuse/reducing .o file size in
>>>>>>> > DWARFv5
>>>>>>> >
>>>>>>> > I think this sounds like a good plan for Linux. I would like to see the
>>>>>>> > numbers for Darwin (= non-split DWARF) to decide whether we should just
>>>>>>> > make that the default. Eric's suggestion of having this committed as an
>>>>>>> > option first seems like a good step in that direction. If it is an
>>>>>>> > advantage across the board we can remove the option and just make this
>>>>>>> > the default behavior.
>>>>>>> >
>>>>>>> > thanks,
>>>>>>> > adrian
>>>>>>> >
>>>>>>> > > On Dec 30, 2019, at 12:08 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>>>> > >
>>>>>>> > > tl;dr: in DWARFv5, using DW_AT_ranges even when the range is contiguous
>>>>>>> > > reduces linked, uncompressed debug_addr size for optimized builds by 93%
>>>>>>> > > and reduces total .o file size (with compression and split) by 15%. It
>>>>>>> > > does grow .dwo file size a bit - DWARFv5, no compression, not split shows
>>>>>>> > > the net effect if all bytes are equal: -O3 clang binary grows by 0.4%, -O0
>>>>>>> > > clang binary shrinks by 0.1%
>>>>>>> > > Should we enable this strategy by default for DWARFv5, for DWARFv5+Split
>>>>>>> > > DWARF, or not by default at all/only under a flag?
>>>>>>> > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > So, I've brought this up a few times before - that DWARFv5 does a pretty
>>>>>>> > > good job of reducing relocations (& reducing .o file size with Split
>>>>>>> > > DWARF) by allowing many uses of addresses to include some kind of
>>>>>>> > > address+offset (debug_rnglists and loclists allowing "base_address" then
>>>>>>> > > offset_pairs (an improvement over similar functionality in DWARFv4 because
>>>>>>> > > the offset pairs can be uleb encoded - so they can be quite compact)).
>>>>>>> > >
>>>>>>> > > But one place where DWARFv5 misses a chance to reduce relocations further
>>>>>>> > > is direct addresses from debug_info, such as DW_AT_low_pc.
>>>>>>> > >
>>>>>>> > > For a while I've wondered if we could use an extension form for
>>>>>>> > > addr+offset, and I prototyped this without an extension attribute, but
>>>>>>> > > instead using exprloc. This has slightly higher overhead to express the...
>>>>>>> > > expression. (it's 9 bytes in total, could be as few as 5 with a custom
>>>>>>> > > form)
>>>>>>> > >
>>>>>>> > > But I had another idea that's more instantly deployable: Why not use
>>>>>>> > > DW_AT_ranges even when the range is contiguous? That way the low_pc that
>>>>>>> > > previously couldn't use an existing address pool entry + offset, could use
>>>>>>> > > the rnglist support for base address.
>>>>>>> > >
>>>>>>> > > The only unnecessary address pool entries that remain that I've found
>>>>>>> > > are DW_AT_low_pc for DW_TAG_labels - but there's only a handful of those
>>>>>>> > > in most code. So the "ranges everywhere" strategy gets the addresses for
>>>>>>> > > optimized clang down from 4758 (v4 address pool used 9923 addresses... )
>>>>>>> > > to 342, with about 4 "extra" addresses for DW_TAG_labels.
>>>>>>> > >
>>>>>>> > > This could also be a bit less costly if DWARFv5 rnglists didn't use a
>>>>>>> > > separate offset table (instead encoding the offsets directly in
>>>>>>> > > debug_info, rather than using indexes)
>>>>>>> > >
>>>>>>> > > I have patches for both the addr+offset exprloc and for the
>>>>>>> > > ranges-always, both with -mllvm flags - do people think they're both worth
>>>>>>> > > committing for experimentation? Neither? Default on in some cases (like
>>>>>>> > > Split DWARF)?
>>>>>>> > >
>>>>>>> > > Thanks,
>>>>>>> > > - Dave
>>>>>>>
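
As a quick cross-check of the numbers quoted above, a few lines of Python
(reduction_percent is just a helper for this example; the three counts are
the ones given in the original message):

```python
def reduction_percent(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


v4_pool = 9923      # DWARFv4 address pool entries
v5_pool = 4758      # DWARFv5 address pool entries
ranges_pool = 342   # DWARFv5 with "ranges everywhere"

print(round(reduction_percent(v5_pool, ranges_pool), 1))  # 92.8
print(round(reduction_percent(v4_pool, ranges_pool), 1))
```

Going from 4758 to 342 entries is a reduction of about 92.8%, which looks
consistent with the 93% debug_addr figure in the tl;dr.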
>>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev