[llvm-dev] [ThinLTO] Reducing imported debug metadata

Teresa Johnson via llvm-dev llvm-dev at lists.llvm.org
Fri Dec 9 06:13:26 PST 2016


On Thu, Dec 8, 2016 at 8:59 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:

>
> On Dec 7, 2016, at 11:02 AM, Teresa Johnson <tejohnson at google.com> wrote:
>
> +pcc to answer question about global variable list on CU
>
> Thanks for the comments, responses below.
>
> Teresa
>
> On Wed, Dec 7, 2016 at 10:36 AM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
>
>>
>> > On 2016-Dec-07, at 10:14, Teresa Johnson <tejohnson at google.com> wrote:
>> >
>> > A couple weeks ago I sat down with David Blaikie to figure out what we
>> could trim when importing debug metadata during function importing. I have
>> a prototype that reduces the resulting .o sizes significantly with ThinLTO
>> -g. However, I ran into some issues when cleaning up part of this in
>> preparation for a patch. Since Mehdi indicated that he and Adrian were
>> going to be looking at this as well shortly, I wanted to send out a summary
>> of what David Blaikie and I discussed, what I have right now, and the
>> issues I am hitting along with possible solutions.
>> >
>> > There was a related older patch where I did some of the same things
>> (D16440: [ThinLTO] Link in only necessary DICompileUnit operands), but that
>> was made obsolete by a number of changes, such as the reversal of
>> GlobalVariable/DIGlobalVariable edges, and a number of debug metadata
>> changes done by Duncan (and Adrian I think?). So I started from scratch
>> this time.
>> >
>> > Here's what I came up with after discussing what to import with David.
>> Specifically, how to handle fields when importing a DICompileUnit:
>>
>> Do you have a profile of which (if any) of these is causing the debug
>> info bloat?
>>
>> I worry/suspect that much of the bloat could be in the DILocation chains
>> (line-table), which are generally non-mergeable.
>>
>
> Here is a breakdown of the debug section sizes aggregated across all .o
> files for a build of Chromium with -g2. These are both with ThinLTO, but
> the first column is with all function importing disabled:
>
>   No importing Normal importing (aka "Head" in the below graph)
> .debug_abbrev 32013702 60570915
> .debug_info 1714741154 8724603593 <(872)%20460-3593>
> .debug_line 108033535 440090737
> .debug_loc 178426363 264907421
> .debug_macinfo 17820     17820
> .debug_pubnames 251081673 661630495
> .debug_pubtypes 284086653 1867377375
> .debug_ranges 65470016 106560864
> .debug_str 3304122338 <(330)%20412-2338> 19013084756 <(901)%20308-4756>
>
> <image.png>
>
> I'm not sure if this answers your question above about DILocation chains
> though - would that be directly translated into .debug_loc size?
>
>
>
> The final binary size is one proxy to identify the “cost of debug info”.
> DILocation may not be the main contributor in the final binary, I’m not
> sure. However another proxy is the memory consumption. Back march before we
> worked on changing the edges in the call graph, my measurements showed that
> DILocation was by far the largest memory consumer.
>

I haven't measured the memory reduction in the compiler from my changes,
but the impact on the final gold link is huge.

>
>
>
>> > a) List of enum types
>> >  - Not needed in the importing module: change to nullptr
>>
>> Seems reasonable for -flto=thin (incorrect for -flto=full, though).
>>
>
> Right, all this is for ThinLTO - I haven't thought about Full LTO as I am
> concerned about ThinLTO importing-specific bloat.
>
>
>>
>> > b) List of macros
>> >  - Not needed in the importing module: change to nullptr
>>
>> We don't have macros by default, right?  You need some -gmacros flag I
>> hope?  I remember objecting to the massive debug info bloat when they were
>> introduced, but was convinced by an argument along the lines of
>> "it's-never-going-to-be-on-by-default".
>>
>
> No I haven't seen a case where this is non-null in my ThinLTO testing, but
> it seemed safe to do for completeness.
>
>
>> > c) List of global variables
>> > - Only needed if we have imported the corresponding global variable
>> definition. Since we currently don't import any global variable definitions
>> (we should assert that there aren't any in the import list so this doesn't
>> get stale), we can change this to nullptr.
>>
>> Why does this still exist?  Can't we delete this list in all modes, since
>> GlobalValues reference their own DIGlobalVariables directly?
>>
>
> Peter, can you answer this question? I looked back at the old thread and
> ISTR that there were some cases where the reference from the CU was needed
> but don't remember the details.
>
>
>>
>> > d) List of imported entities
>> > - For now need to import those that have a DILocalScope (and drop
>> others from the imported entities list on the imported CU). David had some
>> ideas on restructuring the way these are referenced, but for now we
>> conservatively need to keep those that may be from functions, any that are
>> from functions not actually imported will not be emitted into the object
>> file anyway.
>>
>> I'd rather just restructure these if we can.
>>
>
> That sounds like a good long-term solution. For for the short term what I
> did was incredibly simple. If it has a significant impact (haven't measured
> this one independently), I think the short-term fix would be good in the
> meantime. David can probably provide more details on what needs to be done
> to fix this longer term.
>
>
>
> My main focus is a bit different from your here: while I care about the
> size of the debug info emitted, I even care more about the impact on the
> link-time and the memory consumption. For this reason, any fix in the
> IRLinker is not totally satisfactory: we already read the bitcode and
> loaded the IR.
>

This makes a big impact in memory consumption and time for gold, which
reads the debug info. Unlike on OS X where the linker ignores it because
the debug info is read by dsymutil.

To get to the end of it we need to clearly identify the useless stuff and
> avoid to load it out of the bitcode in the first place. If it is not loaded
> from bitcode in the first place, the IR Linker doesn’t even risk to link it!
>

Definitely that is even better, but this is a near-term blocker for us on
the native link side.


>
> Now if you have an IRLinker patch that would 1) land before the 4.0
> branch, and 2) not be too large / too intrusive, then I’d say let’s do this
> as a temporary stop gap while we work on a better solution for LLVM 5.0.
>

Ok great, I can post the patch that handles most of the changes I mentioned
below today (it's ready and very simple, just need some tests).

The biggest improvement though comes from dropping the imported
DICompositeTypes to decls. For that the two solutions I have come up with
right now are:
1)  Detect these in the BitcodeReader and only create a decl instead of a
def - implemented and quite simple
2) Add the ability to force convert to a decl on the source module before
invoking any value mapping and metadata linking (e.g. in the IRLinker
constructor) - implemented but is too aggressive as I need to figure out
which are shared with the dest module
3) Somehow intervene when the metadata is mapped during IR linking to
detect these and create a decl when mapping - this would require adding
some kind of callback to the value mapper code I think.

Approach 1 is definitely the simplest (and has the side benefit of never
creating the def in the parsed source bitcode module in the first place). I
can post a separate patch for that to consider.

Teresa


>> Mehdi
>
>
>
>
>
>
>> > e) List of retained types
>> > - Leave as-is but import DICompositeTypes as type declarations. This
>> one David thought would need to be under an option because of lldb's
>> assumption that the DWARF match the Clang AST (I think I am stating that
>> right, correct me if not!).
>>
>> How many of these exist?  I expect this list to be almost always empty.
>> If it's not, why not?  (Did I fail to push the commit to Clang that stopped
>> adding all C++ types here??)  Can we somehow prune the list?
>>
>
> It seems that there are quite a lot. Just doing this caused a huge
> reduction on sizes. E.g. on Chromium, importing DICompositeType as type
> decls alone dropped the aggregate .o size increase with -g2 over plain -O2
> from 5.2x (current ThinLTO) to 1.7x.
>
>
>> I'm concerned that this is not a particularly fast thing to do.
>>
>
> In what way?
>
>
>>
>> I'm also worried that this would cause major debug info quality
>> degradation on Darwin (defer to Adrian on that...).  I'd rather leave it
>> unoptimized unless we're sure we need to do something here.
>>
>
> Maybe this is the issue David was referring to about potentially needing
> to put this under an option to disable for lldb. For us apparently it is
> not an issue.
>
>
>> Also, why do you need this for -flto=thin?  Why not just nullptr?
>>
>
> Meaning never import retained types? I think we need the type decls at
> least - David?
>
>
>>
>> > Implementation:
>>
>> Generally, I'd rather see the direction of explicitly building new
>> DICompileUnits and linking in the correct/necessary arguments, like the
>> IRLinker explicitly builds new llvm::Functions.  Ideally, we'd only have to
>> explicitly deal with DICompileUnit (one reason that I'm concerned about (e)
>> above), which we can find through !llvm.dbg.cu.
>>
>> > a)-d) (enums/macros/global variables/imported entities):
>> >
>> > For a-d I have a simple solution in the IRLinker. At the very start of
>> IRLinking in a module, if it is being linked in for function importing, I
>> invoke a function that does the handling (i.e. from the IRLinker
>> constructor).
>> >
>> > This routine handles a-c by pre-populating the ValueMap's MDMap entry
>> for those Metadata* (e.g. the RawEnumTypes Metadata*) with nullptr, so
>> metadata mapping automatically makes them nullptr on the imported
>> DICompileUnit.
>> >
>> > For d, this routine walks the imported entities on the source CU and
>> builds a SmallVector of those with local scopes. It then invokes
>> replaceImportedEntities on the source CU to replace it with the new list
>> (essentially dropping those with non-local scopes), and again the
>> subsequent metadata mapping just works.
>> >
>> > e) (retained types):
>> >
>> > For e, changing the imported DICompisiteType to type declarations, I am
>> having some issues. I prototyped this by doing it in the BitcodeReader. I
>> passed in a new flag indicating that we are parsing a module for function
>> importing. Then when parsing a METADATA_COMPOSITE_TYPE, if we are importing
>> a module I change the calls to buildODRType and GET_OR_DISTINCT to instead
>> create a type declaration:
>> >  - pass in flags Flags|DINode::FlagFwdDecl
>> >  - pass in nullptr for BaseType, Elements, VtableHolder, TemplateParams
>> >  - pass in 0 for OffsetInBits
>> >
>> > Note that buildODRType will not mutate any existing DICompisiteType for
>> that identifier found in the DITypeMap to a type declaration (FlagFwdDecl)
>> though. This is good since we are sharing the DITypeMap with the original
>> (destination) module, and we don't want to change any type definitions on
>> the existing destination (importing) module to be type declarations when
>> the same type is also in the source module we are about to import.
>> >
>> > When cleaning this up, I initially tried mutating the types on the
>> source module at the start of IRLinking of imported modules. I did this by
>> adding a new DICompositeType function to force convert an existing type
>> definition to a type declaration (since the bitcode reader already would
>> have created a type definition when parsing the imported source module, and
>> as I noted above it wasn't allowed to be mutated to a declaration after
>> that point). However, this is wrong since it ended up changing type
>> definitions that were also used by the original destination/importing
>> module to type declarations if they were also used in the source module
>> (and therefore shared a DITypeMap entry). To get the forced mutation to a
>> type decl to work here, I would somehow need to detect when a composite
>> type on the source module was *not* also used by the original destination
>> module. The only way I came up with off the top of my head was to also do a
>> DebugInfoFinder::processModule on the dest module, and subtract the
>> resulting types() from those I found when finding types on the source
>> module I'm about to import, and only force-convert the remaining ones to
>> type decls.
>> >
>> >
>> > Interested in feedback on any of the above, and in particular on the
>> cleanest way to create type declarations for imported types.
>> >
>> > Thanks!
>> > Teresa
>> >
>> > --
>> > Teresa Johnson |       Software Engineer |     tejohnson at google.com |
>> 408-460-2413
>>
>>
>
>
> --
> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
> 408-460-2413 <(408)%20460-2413>
>
>
>


-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413
<(408)%20460-2413>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161209/faf21c81/attachment.html>


More information about the llvm-dev mailing list