[llvm-dev] [ThinLTO] Reducing imported debug metadata

Teresa Johnson via llvm-dev llvm-dev at lists.llvm.org
Fri Dec 9 15:41:40 PST 2016


On Fri, Dec 9, 2016 at 6:13 AM, Teresa Johnson <tejohnson at google.com> wrote:

>
>
> On Thu, Dec 8, 2016 at 8:59 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>>
>> On Dec 7, 2016, at 11:02 AM, Teresa Johnson <tejohnson at google.com> wrote:
>>
>> +pcc to answer question about global variable list on CU
>>
>> Thanks for the comments, responses below.
>>
>> Teresa
>>
>> On Wed, Dec 7, 2016 at 10:36 AM, Duncan P. N. Exon Smith <
>> dexonsmith at apple.com> wrote:
>>
>>>
>>> > On 2016-Dec-07, at 10:14, Teresa Johnson <tejohnson at google.com> wrote:
>>> >
>>> > A couple weeks ago I sat down with David Blaikie to figure out what we
>>> could trim when importing debug metadata during function importing. I have
>>> a prototype that reduces the resulting .o sizes significantly with ThinLTO
>>> -g. However, I ran into some issues when cleaning up part of this in
>>> preparation for a patch. Since Mehdi indicated that he and Adrian were
>>> going to be looking at this as well shortly, I wanted to send out a summary
>>> of what David Blaikie and I discussed, what I have right now, and the
>>> issues I am hitting along with possible solutions.
>>> >
>>> > There was a related older patch where I did some of the same things
>>> (D16440: [ThinLTO] Link in only necessary DICompileUnit operands), but that
>>> was made obsolete by a number of changes, such as the reversal of
>>> GlobalVariable/DIGlobalVariable edges, and a number of debug metadata
>>> changes done by Duncan (and Adrian I think?). So I started from scratch
>>> this time.
>>> >
>>> > Here's what I came up with after discussing what to import with David.
>>> Specifically, how to handle fields when importing a DICompileUnit:
>>>
>>> Do you have a profile of which (if any) of these is causing the debug
>>> info bloat?
>>>
>>> I worry/suspect that much of the bloat could be in the DILocation chains
>>> (line-table), which are generally non-mergeable.
>>>
>>
>> Here is a breakdown of the debug section sizes aggregated across all .o
>> files for a build of Chromium with -g2. These are both with ThinLTO, but
>> the first column is with all function importing disabled:
>>
>>   No importing Normal importing (aka "Head" in the below graph)
>> .debug_abbrev 32013702 60570915
>> .debug_info 1714741154 8724603593 <(872)%20460-3593>
>> .debug_line 108033535 440090737
>> .debug_loc 178426363 264907421
>> .debug_macinfo 17820     17820
>> .debug_pubnames 251081673 661630495
>> .debug_pubtypes 284086653 1867377375
>> .debug_ranges 65470016 106560864
>> .debug_str 3304122338 <(330)%20412-2338> 19013084756 <(901)%20308-4756>
>>
>> <image.png>
>>
>> I'm not sure if this answers your question above about DILocation chains
>> though - would that be directly translated into .debug_loc size?
>>
>>
>>
>> The final binary size is one proxy to identify the “cost of debug info”.
>> DILocation may not be the main contributor in the final binary, I’m not
>> sure. However another proxy is the memory consumption. Back march before we
>> worked on changing the edges in the call graph, my measurements showed that
>> DILocation was by far the largest memory consumer.
>>
>
> I haven't measured the memory reduction in the compiler from my changes,
> but the impact on the final gold link is huge.
>
>>
>>
>>
>>> > a) List of enum types
>>> >  - Not needed in the importing module: change to nullptr
>>>
>>> Seems reasonable for -flto=thin (incorrect for -flto=full, though).
>>>
>>
>> Right, all this is for ThinLTO - I haven't thought about Full LTO as I am
>> concerned about ThinLTO importing-specific bloat.
>>
>>
>>>
>>> > b) List of macros
>>> >  - Not needed in the importing module: change to nullptr
>>>
>>> We don't have macros by default, right?  You need some -gmacros flag I
>>> hope?  I remember objecting to the massive debug info bloat when they were
>>> introduced, but was convinced by an argument along the lines of
>>> "it's-never-going-to-be-on-by-default".
>>>
>>
>> No I haven't seen a case where this is non-null in my ThinLTO testing,
>> but it seemed safe to do for completeness.
>>
>>
>>> > c) List of global variables
>>> > - Only needed if we have imported the corresponding global variable
>>> definition. Since we currently don't import any global variable definitions
>>> (we should assert that there aren't any in the import list so this doesn't
>>> get stale), we can change this to nullptr.
>>>
>>> Why does this still exist?  Can't we delete this list in all modes,
>>> since GlobalValues reference their own DIGlobalVariables directly?
>>>
>>
>> Peter, can you answer this question? I looked back at the old thread and
>> ISTR that there were some cases where the reference from the CU was needed
>> but don't remember the details.
>>
>>
>>>
>>> > d) List of imported entities
>>> > - For now need to import those that have a DILocalScope (and drop
>>> others from the imported entities list on the imported CU). David had some
>>> ideas on restructuring the way these are referenced, but for now we
>>> conservatively need to keep those that may be from functions, any that are
>>> from functions not actually imported will not be emitted into the object
>>> file anyway.
>>>
>>> I'd rather just restructure these if we can.
>>>
>>
>> That sounds like a good long-term solution. For for the short term what I
>> did was incredibly simple. If it has a significant impact (haven't measured
>> this one independently), I think the short-term fix would be good in the
>> meantime. David can probably provide more details on what needs to be done
>> to fix this longer term.
>>
>>
>>
>> My main focus is a bit different from your here: while I care about the
>> size of the debug info emitted, I even care more about the impact on the
>> link-time and the memory consumption. For this reason, any fix in the
>> IRLinker is not totally satisfactory: we already read the bitcode and
>> loaded the IR.
>>
>
> This makes a big impact in memory consumption and time for gold, which
> reads the debug info. Unlike on OS X where the linker ignores it because
> the debug info is read by dsymutil.
>
> To get to the end of it we need to clearly identify the useless stuff and
>> avoid to load it out of the bitcode in the first place. If it is not loaded
>> from bitcode in the first place, the IR Linker doesn’t even risk to link it!
>>
>
> Definitely that is even better, but this is a near-term blocker for us on
> the native link side.
>
>
>>
>> Now if you have an IRLinker patch that would 1) land before the 4.0
>> branch, and 2) not be too large / too intrusive, then I’d say let’s do this
>> as a temporary stop gap while we work on a better solution for LLVM 5.0.
>>
>
> Ok great, I can post the patch that handles most of the changes I
> mentioned below today (it's ready and very simple, just need some tests).
>

D27635
Thanks,
Teresa


>
> The biggest improvement though comes from dropping the imported
> DICompositeTypes to decls. For that the two solutions I have come up with
> right now are:
> 1)  Detect these in the BitcodeReader and only create a decl instead of a
> def - implemented and quite simple
> 2) Add the ability to force convert to a decl on the source module before
> invoking any value mapping and metadata linking (e.g. in the IRLinker
> constructor) - implemented but is too aggressive as I need to figure out
> which are shared with the dest module
> 3) Somehow intervene when the metadata is mapped during IR linking to
> detect these and create a decl when mapping - this would require adding
> some kind of callback to the value mapper code I think.
>
> Approach 1 is definitely the simplest (and has the side benefit of never
> creating the def in the parsed source bitcode module in the first place). I
> can post a separate patch for that to consider.
>
> Teresa
>
>
>>>> Mehdi
>>
>>
>>
>>
>>
>>
>>> > e) List of retained types
>>> > - Leave as-is but import DICompositeTypes as type declarations. This
>>> one David thought would need to be under an option because of lldb's
>>> assumption that the DWARF match the Clang AST (I think I am stating that
>>> right, correct me if not!).
>>>
>>> How many of these exist?  I expect this list to be almost always empty.
>>> If it's not, why not?  (Did I fail to push the commit to Clang that stopped
>>> adding all C++ types here??)  Can we somehow prune the list?
>>>
>>
>> It seems that there are quite a lot. Just doing this caused a huge
>> reduction on sizes. E.g. on Chromium, importing DICompositeType as type
>> decls alone dropped the aggregate .o size increase with -g2 over plain -O2
>> from 5.2x (current ThinLTO) to 1.7x.
>>
>>
>>> I'm concerned that this is not a particularly fast thing to do.
>>>
>>
>> In what way?
>>
>>
>>>
>>> I'm also worried that this would cause major debug info quality
>>> degradation on Darwin (defer to Adrian on that...).  I'd rather leave it
>>> unoptimized unless we're sure we need to do something here.
>>>
>>
>> Maybe this is the issue David was referring to about potentially needing
>> to put this under an option to disable for lldb. For us apparently it is
>> not an issue.
>>
>>
>>> Also, why do you need this for -flto=thin?  Why not just nullptr?
>>>
>>
>> Meaning never import retained types? I think we need the type decls at
>> least - David?
>>
>>
>>>
>>> > Implementation:
>>>
>>> Generally, I'd rather see the direction of explicitly building new
>>> DICompileUnits and linking in the correct/necessary arguments, like the
>>> IRLinker explicitly builds new llvm::Functions.  Ideally, we'd only have to
>>> explicitly deal with DICompileUnit (one reason that I'm concerned about (e)
>>> above), which we can find through !llvm.dbg.cu.
>>>
>>> > a)-d) (enums/macros/global variables/imported entities):
>>> >
>>> > For a-d I have a simple solution in the IRLinker. At the very start of
>>> IRLinking in a module, if it is being linked in for function importing, I
>>> invoke a function that does the handling (i.e. from the IRLinker
>>> constructor).
>>> >
>>> > This routine handles a-c by pre-populating the ValueMap's MDMap entry
>>> for those Metadata* (e.g. the RawEnumTypes Metadata*) with nullptr, so
>>> metadata mapping automatically makes them nullptr on the imported
>>> DICompileUnit.
>>> >
>>> > For d, this routine walks the imported entities on the source CU and
>>> builds a SmallVector of those with local scopes. It then invokes
>>> replaceImportedEntities on the source CU to replace it with the new list
>>> (essentially dropping those with non-local scopes), and again the
>>> subsequent metadata mapping just works.
>>> >
>>> > e) (retained types):
>>> >
>>> > For e, changing the imported DICompisiteType to type declarations, I
>>> am having some issues. I prototyped this by doing it in the BitcodeReader.
>>> I passed in a new flag indicating that we are parsing a module for function
>>> importing. Then when parsing a METADATA_COMPOSITE_TYPE, if we are importing
>>> a module I change the calls to buildODRType and GET_OR_DISTINCT to instead
>>> create a type declaration:
>>> >  - pass in flags Flags|DINode::FlagFwdDecl
>>> >  - pass in nullptr for BaseType, Elements, VtableHolder, TemplateParams
>>> >  - pass in 0 for OffsetInBits
>>> >
>>> > Note that buildODRType will not mutate any existing DICompisiteType
>>> for that identifier found in the DITypeMap to a type declaration
>>> (FlagFwdDecl) though. This is good since we are sharing the DITypeMap with
>>> the original (destination) module, and we don't want to change any type
>>> definitions on the existing destination (importing) module to be type
>>> declarations when the same type is also in the source module we are about
>>> to import.
>>> >
>>> > When cleaning this up, I initially tried mutating the types on the
>>> source module at the start of IRLinking of imported modules. I did this by
>>> adding a new DICompositeType function to force convert an existing type
>>> definition to a type declaration (since the bitcode reader already would
>>> have created a type definition when parsing the imported source module, and
>>> as I noted above it wasn't allowed to be mutated to a declaration after
>>> that point). However, this is wrong since it ended up changing type
>>> definitions that were also used by the original destination/importing
>>> module to type declarations if they were also used in the source module
>>> (and therefore shared a DITypeMap entry). To get the forced mutation to a
>>> type decl to work here, I would somehow need to detect when a composite
>>> type on the source module was *not* also used by the original destination
>>> module. The only way I came up with off the top of my head was to also do a
>>> DebugInfoFinder::processModule on the dest module, and subtract the
>>> resulting types() from those I found when finding types on the source
>>> module I'm about to import, and only force-convert the remaining ones to
>>> type decls.
>>> >
>>> >
>>> > Interested in feedback on any of the above, and in particular on the
>>> cleanest way to create type declarations for imported types.
>>> >
>>> > Thanks!
>>> > Teresa
>>> >
>>> > --
>>> > Teresa Johnson |       Software Engineer |     tejohnson at google.com |
>>>  408-460-2413
>>>
>>>
>>
>>
>> --
>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>> 408-460-2413 <(408)%20460-2413>
>>
>>
>>
>
>
> --
> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
> 408-460-2413 <(408)%20460-2413>
>



-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161209/ac366519/attachment.html>


More information about the llvm-dev mailing list