<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Thanks a lot for the numbers! That certainly helps, even with a small sample; it was not at all clear to me how to get this data.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">--paulr<o:p></o:p></span></p>
<p class="MsoNormal"><a name="_MailEndCompose"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></a></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> David Blaikie [mailto:dblaikie@gmail.com]
<br>
<b>Sent:</b> Thursday, March 31, 2016 9:52 PM<br>
<b>To:</b> Eric Christopher<br>
<b>Cc:</b> Robinson, Paul; Clang Dev; llvm-dev<br>
<b>Subject:</b> Re: [cfe-dev] RFC: Up front type information generation in clang and llvm<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p><br>
On Mar 31, 2016 7:11 PM, "David Blaikie" <<a href="mailto:dblaikie@gmail.com">dblaikie@gmail.com</a>> wrote:<br>
><br>
><br>
><br>
> On Tue, Mar 29, 2016 at 11:50 PM, Eric Christopher via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> wrote:<br>
>><br>
>><br>
>><br>
>> On Tue, Mar 29, 2016 at 11:20 PM Robinson, Paul <<a href="mailto:Paul_Robinson@playstation.sony.com">Paul_Robinson@playstation.sony.com</a>> wrote:<br>
>>><br>
>>> Skipping a serialization and doing something clever about LTO uniquing sounds awesome. I'm guessing you achieve this by extracting types out of DI metadata and packaging them as lumps-o-DWARF that the back-end can then paste together? Reading between
the lines a bit here.<br>
>><br>
>><br>
>> Pretty much, yes.<br>
>> <br>
>>><br>
>>> Can you share data about how much "pure" types dominate the size of debug info? Or at least the current metadata scheme? (Channeling Sean Silva here: show me the data!) Does this hold for C as well as C++?<br>
>><br>
>> They're huge. It's ridiculous. Take a look at the size of the metadata and then the size of the stuff we put in there versus dwarf.<br>
><br>
><br>
> Because numbers are nice to have, I modified Clang to generate every type as 'int' (patch attached - I may've screwed some things up) and then compiled llvm-tblgen's object files with -flto (I would've used all of clang, but I don't have the LTO plugin set up, so I couldn't get past tblgen).<br>
><br>
> Without debug info: 77 MB of bitcode files<br>
> With debug info: 24 MB<o:p></o:p></p>
<p>Oh, and I got these ^ numbers jumbled up. 77 with, 24 without.<o:p></o:p></p>
<p>> With debug info, but no types: 46 MB<br>
><br>
> so... 59% of the debug info is pure type descriptions, for this particular test at least. These are the pure ones, the same things we put in type units; I didn't even remove the injected declarations (so if you compile example programs with this, you'll find that the DW_TAG_base_type for "int" has a child for every member function declaration that's defined (even used inline functions) in this translation unit). Clang would be a larger/more representative sample.<br>
><br>
> I confirmed that, both with and without types, there were the same number (48542) of subprogram definitions, and that without types there were no instances of DICompositeType (both of these were confirmed with xargs/llvm-dis/grep, nothing fancy).<br>
><br>
><br>
> <br>
>><br>
>><br>
>> And yes, it also trivially holds for C.<br>
>> <br>
>>><br>
>>> Not much discussion of data objects and code objects (other than concrete subprograms), is that because they basically aren't changing? Still defined in the metadata and still managed/emitted by the back-end?<br>
>><br>
>><br>
>> Yep. A way of looking at it is more that it is related to things in the IR and so needs IR to represent it.<br>
>> <br>
>>><br>
>>> Please say something about types (which you're thinking of as a front-end thing) defined within scopes (which it looks like you're thinking of as a back-end thing). Not seeing how to get the scoping right.<br>
>>><br>
>>> <br>
>><br>
>><br>
>> The basic idea is that non-defining declarations hold the types and serve as the abstract origin for the concrete function. Honestly, I wish they were type unitable at the moment, but that might be something to look into. That's the current plan, at least. This will make some debug info a little bit larger, but only for things like nested types where we need to throw in an extra declaration (i.e. the same sorts of places that type units make things larger).<br>
>><br>
>> At any rate, the first thing is to get the APIs split anyhow.<br>
>><br>
>> -eric<br>
>> <br>
>>><br>
>>> Thanks!<br>
>>><br>
>>> --paulr<br>
>>><br>
>>> <br>
>>><br>
>>> From: cfe-dev [mailto:<a href="mailto:cfe-dev-bounces@lists.llvm.org">cfe-dev-bounces@lists.llvm.org</a>] On Behalf Of Eric Christopher via cfe-dev<br>
>>> Sent: Tuesday, March 29, 2016 6:01 PM<br>
>>> To: Clang Dev; llvm-dev<br>
>>> Subject: [cfe-dev] RFC: Up front type information generation in clang and llvm<br>
>>><br>
>>> <br>
>>><br>
>>> Hi All,<br>
>>><br>
>>> <br>
>>><br>
>>> This is something that's been talked about for some time and it's probably time to propose it.<br>
>>><br>
>>> <br>
>>><br>
>>> The "We" in this document is everyone on the cc line plus me.<br>
>>><br>
>>> <br>
>>><br>
>>> Please go ahead and take a look.<br>
>>><br>
>>> <br>
>>><br>
>>> Thanks!<br>
>>><br>
>>> <br>
>>><br>
>>> -eric<br>
>>><br>
>>> <br>
>>><br>
>>> <br>
>>><br>
>>> Objective (and TL;DR)<br>
>>><br>
>>> =================<br>
>>><br>
>>> <br>
>>><br>
>>> Migrate debug type information generation from the backends to the front end.<br>
>>><br>
>>> <br>
>>><br>
>>> This will enable:<br>
>>><br>
>>> 1. Separation of concerns and maintainability: LLVM shouldn’t have to know about C preprocessor macros, Obj-C properties, or extensive details about debug information binary formats.<br>
>>><br>
>>> 2. Performance: Skipping a serialization should speed up normal compilations.<br>
>>><br>
>>> 3. Memory usage: The DI metadata structures are smaller than they were, but are still fairly large and pointer heavy.<br>
>>><br>
>>> <br>
>>><br>
>>> Motivation<br>
>>><br>
>>> ========<br>
>>><br>
>>> <br>
>>><br>
>>> Currently, types in LLVM debug info are described by the DIType class hierarchy. This hierarchy evolved organically from a more flexible sea-of-nodes representation into what it is today - a large, only somewhat format neutral representation of debug types.
Making this more format neutral will only increase the memory use - and for no reason, as type information is static (or nearly so). Debug formats already have a memory-efficient serialization (their own binary format), so we should support a front end emitting type information with sufficient representation to allow the backend to emit debug information based on the more normal IR features: functions, scopes, variables, etc.<br>
>>><br>
>>> <br>
>>><br>
>>> Scope/Impact<br>
>>><br>
>>> ===========<br>
>>><br>
>>> <br>
>>><br>
>>> This is going to involve large scale changes across both LLVM and clang. This will also affect any out-of-tree front ends; however, we expect the impact to be on the order of a large API change rather than needing massive infrastructure changes.<br>
>>><br>
>>> <br>
>>><br>
>>> Related work<br>
>>><br>
>>> ==========<br>
>>><br>
>>> <br>
>>><br>
>>> This is related to the efforts to support CodeView in LLVM and clang as well as efforts to reduce overall memory consumption when compiling with debug information enabled; in particular efforts to prune LTO memory usage.<br>
>>><br>
>>> <br>
>>><br>
>>> <br>
>>><br>
>>> Concerns<br>
>>><br>
>>> ========<br>
>>><br>
>>> <br>
>>><br>
>>> <br>
>>><br>
>>> We need a good story for transitioning all the debug info testcases in the backend without giving up coverage and/or readability. David believes he has a plan here.<br>
>>><br>
>>> <br>
>>><br>
>>> Proposal<br>
>>><br>
>>> =======<br>
>>><br>
>>> <br>
>>><br>
>>> Short version<br>
>>><br>
>>> -----------------<br>
>>><br>
>>> <br>
>>><br>
>>> 1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line Table.<br>
>>><br>
>>> 2. Split the clang CGDebugInfo API into Types and Line Table to match.<br>
>>><br>
>>> 3. Add an LLVM DWARF emission library similar to the existing CodeView one.<br>
>>><br>
>>> 4. Migrate the Types API into a clang internal API taking clang AST structures and use the LLVM binary emission libraries to produce type information.<br>
>>><br>
>>> 5. Remove the old binary emission from LLVM.<br>
>>><br>
>>> <br>
>>><br>
>>> <br>
>>><br>
>>> Questions/Thoughts/Elaboration<br>
>>><br>
>>> -------------------------------------------<br>
>>><br>
>>> <br>
>>><br>
>>> Splitting the DIBuilder API<br>
>>><br>
>>> ~~~~~~~~~~~~~~~~~~~~<br>
>>><br>
>>> Will DISubprogram be part of both?<br>
>>><br>
>>> * We should split it in two: Full declarations with type and a slimmed down version with an abstract origin.<br>
>>><br>
>>> <br>
>>><br>
>>> How will we reference types in the DWARF blob?<br>
>>><br>
>>> * ODR types can be referenced by name<br>
>>><br>
>>> * Non-ODR types can be referenced by full DWARF hash<br>
>>><br>
>>> * Each type can be a pair(tuple) of identifier (DITypeRef today) and blob.<br>
>>><br>
>>> * For versions before DWARF4 we can emit each type as a unit (but not a DWARF Type Unit) and use references and module relocations for the offsets. (See below)<br>
>>><br>
>>> <br>
>>><br>
>>> How will we handle references in DWARF2 or global relocations for non-type template parameters?<br>
>>><br>
>>> * We can use a “relocation” metadata as part of the format.<br>
>>><br>
>>> * Representable as a tuple of the DIType and the offset within the DIBlob at which to write the final relocation/offset for the reference at emission time.<br>
>>><br>
>>> <br>
>>><br>
>>> Why break up the types at all?<br>
>>><br>
>>> * To enable non-debug format aware linking and type uniquing for LTO that won’t be huge in size. We break up the types so we don’t need to parse debug information to link two modules together efficiently.<br>
>>><br>
>>> <br>
>>><br>
>>> Any other concerns there?<br>
>>><br>
>>> * Debug information without type units might be slightly larger in this scheme due to parents being duplicated (declarations and abstract origin, not full parents). It may be possible to extend dsymutil/etc to merge all siblings into a common parent.
Open question for better ways to solve this.<br>
>>><br>
>>> <br>
>>><br>
>>> How should we handle DWARF5/Apple Accelerator Tables?<br>
>>><br>
>>> * Thoughts:<br>
>>><br>
>>> * We can parse the dwarf in the back end and generate them.<br>
>>><br>
>>> * We can emit in the front end for the base case of non-LTO (with help from the backend for relocation aspects).<br>
>>><br>
>>> * We can use dsymutil on LTO debug information to generate them.<br>
>>><br>
>>> <br>
>>><br>
>>> Why isn’t this a more detailed spec?<br>
>>><br>
>>> * Mostly because we’ve thought about the issues, but we can’t plan for everything during implementation.<br>
>>><br>
>>> <br>
>>><br>
>>> <br>
>>><br>
>>> Future work<br>
>>><br>
>>> ----------------<br>
>>><br>
>>> <br>
>>><br>
>>> Not contained as part of this, but an obvious future direction is that the Module linker could grow support for debug aware linking. Then we can have all of the type information for a single translation unit in a single blob and use the debug aware linking
to handle merging types.<br>
><o:p></o:p></p>
</div>
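<p class="MsoNormal">For reference, the type-description share quoted in the thread can be reproduced from the three bitcode sizes reported for llvm-tblgen, using the corrected figures (77 MB with debug info, 24 MB without, 46 MB with debug info but no types). The following is only a sketch of that arithmetic; the variable names are illustrative and not taken from any patch. With the rounded MB values it gives about 58.5%, in line with the quoted 59%, which was presumably computed from unrounded sizes.</p>
<pre>
```python
# Bitcode sizes in MB for llvm-tblgen's object files built with -flto,
# as reported in the thread (after the "77 with, 24 without" correction).
with_debug_info = 77
without_debug_info = 24
debug_info_no_types = 46

# Everything that debug info adds on top of the plain bitcode.
debug_info_total = with_debug_info - without_debug_info

# The part that disappears when every type is emitted as 'int'.
pure_types = with_debug_info - debug_info_no_types

# Share of the debug info itself, and share of the whole bitcode.
type_share_of_debug_info = pure_types / debug_info_total
type_share_of_all_bitcode = pure_types / with_debug_info

print(f"types / debug info:  {type_share_of_debug_info:.1%}")
print(f"types / all bitcode: {type_share_of_all_bitcode:.1%}")
```
</pre>
<p class="MsoNormal">The first ratio is the one the thread refers to: types as a fraction of the debug info, not of the total bitcode size. The second ratio is shown only for contrast.</p>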
</div>
</body>
</html>