<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Wed, Dec 7, 2016 at 11:07 AM Teresa Johnson <<a href="mailto:tejohnson@google.com">tejohnson@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg">David and I crossed wires! Some more comments below. Thanks, Teresa<div class="gmail_extra gmail_msg"><br class="gmail_msg"><div class="gmail_quote gmail_msg"></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg">On Wed, Dec 7, 2016 at 11:02 AM, David Blaikie <span dir="ltr" class="gmail_msg"><<a href="mailto:dblaikie@gmail.com" class="gmail_msg" target="_blank">dblaikie@gmail.com</a>></span> wrote:<br class="gmail_msg"><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg"><br class="gmail_msg"><br class="gmail_msg"><div class="gmail_quote gmail_msg"><span class="gmail_msg"><div dir="ltr" class="gmail_msg">On Wed, Dec 7, 2016 at 10:36 AM Duncan P. N. Exon Smith <<a href="mailto:dexonsmith@apple.com" class="gmail_msg" target="_blank">dexonsmith@apple.com</a>> wrote:<br class="gmail_msg"></div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> On 2016-Dec-07, at 10:14, Teresa Johnson <<a href="mailto:tejohnson@google.com" class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg" target="_blank">tejohnson@google.com</a>> wrote:<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> A couple weeks ago I sat down with David Blaikie to figure out what we could trim when importing debug metadata during function importing. I have a prototype that reduces the resulting .o sizes significantly with ThinLTO -g. However, I ran into some issues when cleaning up part of this in preparation for a patch. Since Mehdi indicated that he and Adrian were going to be looking at this as well shortly, I wanted to send out a summary of what David Blaikie and I discussed, what I have right now, and the issues I am hitting along with possible solutions.<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> There was a related older patch where I did some of the same things (D16440: [ThinLTO] Link in only necessary DICompileUnit operands), but that was made obsolete by a number of changes, such as the reversal of GlobalVariable/DIGlobalVariable edges, and a number of debug metadata changes done by Duncan (and Adrian I think?). So I started from scratch this time.<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> Here's what I came up with after discussing what to import with David. Specifically, how to handle fields when importing a DICompileUnit:<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
Do you have a profile of which (if any) of these is causing the debug info bloat?<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
I worry/suspect that much of the bloat could be in the DILocation chains (line-table), which are generally non-mergeable.<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> a) List of enum types<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> - Not needed in the importing module: change to nullptr<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
Seems reasonable for -flto=thin (incorrect for -flto=full, though).<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> b) List of macros<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> - Not needed in the importing module: change to nullptr<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
We don't have macros by default, right? You need some -gmacros flag I hope? I remember objecting to the massive debug info bloat when they were introduced, but was convinced by an argument along the lines of "it's-never-going-to-be-on-by-default".<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg"></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></span><div class="gmail_msg">Yes, it's off by default, but still good to think about whether we could do something similar for it than other things here.</div><span class="gmail_msg"><div class="gmail_msg"> </div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> c) List of global variables<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> - Only needed if we have imported the corresponding global variable definition. Since we currently don't import any global variable definitions (we should assert that there aren't any in the import list so this doesn't get stale), we can change this to nullptr.<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
Why does this still exist? Can't we delete this list in all modes, since GlobalValues reference their own DIGlobalVariables directly?<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg"></blockquote></span><div class="gmail_msg"><br class="gmail_msg">Yeah, I forget why - I think there's a reason, though.<br class="gmail_msg"> </div><span class="gmail_msg"><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> d) List of imported entities<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> - For now need to import those that have a DILocalScope (and drop others from the imported entities list on the imported CU). David had some ideas on restructuring the way these are referenced, but for now we conservatively need to keep those that may be from functions, any that are from functions not actually imported will not be emitted into the object file anyway.<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
I'd rather just restructure these if we can.<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg"></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></span><div class="gmail_msg">In the simplest form, it probably means adding a new list to DISubprograms (of all the imported entities in that function) - that way at least the function local imported entities would be dropped naturally when the subprogram was dropped (or not imported).</div><span class="gmail_msg"><div class="gmail_msg"> </div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> e) List of retained types<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> - Leave as-is but import DICompositeTypes as type declarations. This one David thought would need to be under an option because of lldb's assumption that the DWARF match the Clang AST (I think I am stating that right, correct me if not!).<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
How many of these exist? I expect this list to be almost always empty. If it's not, why not? (Did I fail to push the commit to Clang that stopped adding all C++ types here??) Can we somehow prune the list?<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg"></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></span><div class="gmail_msg">Ah, you're right - I was probably operating under the old model that everything was in retained types. It's good to know/see that's not the case. I wonder what, if anything, we put in retained types these days. (there certainly used to be ways - like cases where a type is used without any variable: "((foo*)void_ptr)->x = 3;" - we could probably help in those cases by pushing a retained types list down onto subprograms too, so the uses were attached to subprograms - but that means potentially a lot of duplication (if a type is 'retained' in many functions))</div></div></div></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg">Actually, now that I think about it, it isn't just the composite types hanging off the retained types list that are an issue. When I first just tried mapping the retained types list to nullptr on the imported CU, it had a much smaller impact on the sizes. It is the composite types referenced elsewhere that get pulled in anyway that are still causing bloat (I think from the imported functions' DISubprograms via the type lists reached from there).</div></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg"><div class="gmail_quote gmail_msg"><span class="gmail_msg"><div class="gmail_msg"> </div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I'm concerned that this is not a particularly fast thing to do.<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
I'm also worried that this would cause major debug info quality degradation on Darwin (defer to Adrian on that...). I'd rather leave it unoptimized unless we're sure we need to do something here.<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg"></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></span><div class="gmail_msg">Turning definitions into declarations like that is incompatible with lldb - it'd need to be turned off (Teresa mentioned putting it under a flag for this reason) for any lldb platform.</div><span class="gmail_msg"><div class="gmail_msg"> </div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Also, why do you need this for -flto=thin? Why not just nullptr?<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg"></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></span><div class="gmail_msg">Legit point - same as enums, etc, they'll be emitted somewhere, no need to retain them everywhere, even as declarations.<br class="gmail_msg"></div></div></div></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg">Ok, simple enough to also get this mapped in as nullptr on the imported CU.</div></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg"> </div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg"><br class="gmail_msg">The proper definition->declaration demotion would have to happen by walking all the debug info looking for types... :/</div></div></div></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg">Right, that's what I did to get the big benefit - using the DebugInfoFinder.</div></div></div></div></blockquote><div><br></div><div>Ah, OK - yep, that makes more sense. Thanks for clarifying/mentioning!<br><br>- Dave</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Teresa</div></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg"> </div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg"><div class="gmail_quote gmail_msg"><span class="gmail_msg"><div class="gmail_msg"> </div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> Implementation:<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
Generally, I'd rather see the direction of explicitly building new DICompileUnits and linking in the correct/necessary arguments, like the IRLinker explicitly builds new llvm::Functions. Ideally, we'd only have to explicitly deal with DICompileUnit (one reason that I'm concerned about (e) above), which we can find through !<a href="http://llvm.dbg.cu" rel="noreferrer" class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg" target="_blank">llvm.dbg.cu</a>.<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> a)-d) (enums/macros/global variables/imported entities):<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> For a-d I have a simple solution in the IRLinker. At the very start of IRLinking in a module, if it is being linked in for function importing, I invoke a function that does the handling (i.e. from the IRLinker constructor).<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> This routine handles a-c by pre-populating the ValueMap's MDMap entry for those Metadata* (e.g. the RawEnumTypes Metadata*) with nullptr, so metadata mapping automatically makes them nullptr on the imported DICompileUnit.<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> For d, this routine walks the imported entities on the source CU and builds a SmallVector of those with local scopes. It then invokes replaceImportedEntities on the source CU to replace it with the new list (essentially dropping those with non-local scopes), and again the subsequent metadata mapping just works.<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> e) (retained types):<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> For e, changing the imported DICompisiteType to type declarations, I am having some issues. I prototyped this by doing it in the BitcodeReader. I passed in a new flag indicating that we are parsing a module for function importing. Then when parsing a METADATA_COMPOSITE_TYPE, if we are importing a module I change the calls to buildODRType and GET_OR_DISTINCT to instead create a type declaration:<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> - pass in flags Flags|DINode::FlagFwdDecl<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> - pass in nullptr for BaseType, Elements, VtableHolder, TemplateParams<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> - pass in 0 for OffsetInBits<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> Note that buildODRType will not mutate any existing DICompisiteType for that identifier found in the DITypeMap to a type declaration (FlagFwdDecl) though. This is good since we are sharing the DITypeMap with the original (destination) module, and we don't want to change any type definitions on the existing destination (importing) module to be type declarations when the same type is also in the source module we are about to import.<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> When cleaning this up, I initially tried mutating the types on the source module at the start of IRLinking of imported modules. I did this by adding a new DICompositeType function to force convert an existing type definition to a type declaration (since the bitcode reader already would have created a type definition when parsing the imported source module, and as I noted above it wasn't allowed to be mutated to a declaration after that point). However, this is wrong since it ended up changing type definitions that were also used by the original destination/importing module to type declarations if they were also used in the source module (and therefore shared a DITypeMap entry). To get the forced mutation to a type decl to work here, I would somehow need to detect when a composite type on the source module was *not* also used by the original destination module. The only way I came up with off the top of my head was to also do a DebugInfoFinder::processModule on the dest module, and subtract the resulting types() from those I found when finding types on the source module I'm about to import, and only force-convert the remaining ones to type decls.<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> Interested in feedback on any of the above, and in particular on the cleanest way to create type declarations for imported types.<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> Thanks!<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> Teresa<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> --<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
> Teresa Johnson | Software Engineer | <a href="mailto:tejohnson@google.com" class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg" target="_blank">tejohnson@google.com</a> | <a href="tel:(408)%20460-2413" value="+14084602413" class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg" target="_blank">408-460-2413</a><br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
<br class="m_-8475365039461335010m_-7155458074996916294gmail_msg gmail_msg">
</blockquote></span></div></div>
</blockquote></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><br class="gmail_msg"><br clear="all" class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div>-- <br class="gmail_msg"><div class="m_-8475365039461335010gmail_signature gmail_msg" data-smartmail="gmail_signature"><span style="font-family:Times;font-size:medium" class="gmail_msg"><table cellspacing="0" cellpadding="0" class="gmail_msg"><tbody class="gmail_msg"><tr style="color:rgb(85,85,85);font-family:sans-serif;font-size:small" class="gmail_msg"><td nowrap style="border-top-style:solid;border-top-color:rgb(213,15,37);border-top-width:2px" class="gmail_msg">Teresa Johnson |</td><td nowrap style="border-top-style:solid;border-top-color:rgb(51,105,232);border-top-width:2px" class="gmail_msg"> Software Engineer |</td><td nowrap style="border-top-style:solid;border-top-color:rgb(0,153,57);border-top-width:2px" class="gmail_msg"> <a href="mailto:tejohnson@google.com" class="gmail_msg" target="_blank">tejohnson@google.com</a> |</td><td nowrap style="border-top-style:solid;border-top-color:rgb(238,178,17);border-top-width:2px" class="gmail_msg"> <a href="tel:(408)%20460-2413" value="+14084602413" class="gmail_msg" target="_blank">408-460-2413</a></td></tr></tbody></table></span></div>
</div></div></blockquote></div></div>