<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Mar 2, 2019 at 2:56 PM Adrian Prantl <<a href="mailto:aprantl@apple.com">aprantl@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><div><br><blockquote type="cite"><div>On Feb 25, 2019, at 10:21 AM, Zachary Turner via lldb-dev <<a href="mailto:lldb-dev@lists.llvm.org" target="_blank">lldb-dev@lists.llvm.org</a>> wrote:</div><br class="m_-5097363829552112159Apple-interchange-newline"><div><div dir="ltr">Hi all,<div><br></div><div>We've got some internal efforts in progress, and one of those would benefit from debug info parsing being out of process (independently of whether or not the rest of LLDB is out of process).</div><div><br></div><div>There are a couple of advantages to this, which I'll enumerate here:</div><div><ul><li>It addresses one source of instability in LLDB that is known to be problematic -- specifically, that debug info can be bad, and handling this can often be difficult and bring down the entire debug session.  While other efforts have been made to address stability by moving things out of process, they have not been upstreamed, and even if they had been, I think we would still want this anyway, for reasons that follow.</li></ul></div></div></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div>Where do you draw the line between debug info and the in-process part of LLDB? I'm asking because I have never seen the mechanical parsing of DWARF be a source of instability; most crashes in LLDB happen when reconstructing Clang ASTs because we're breaking some subtle and badly enforced invariants in Clang's Sema. Perhaps parsing PDBs is less stable? 
If you do mean at the AST level, then I agree with the sentiment that it is a common source of crashes, but I don't see a good way of moving that component out of process. Serializing ASTs or types in general is a hard problem, and I'd find the idea of inventing yet another serialization format for types that we would have to develop, test, and maintain quite scary.</div></div></div></blockquote><div>If anything, I think parsing PDBs is more stable.  There is close to zero flexibility in how types and symbols can be represented in PDB / CodeView, and on top of that, there are very few producers.  Combined, this means we can assume almost everything about the structure of the records.  </div><div><br></div><div>Yes, the crashes *happen* at the AST level (most of them anyway, though not all; there are definitely examples of crashing in the actual parsing code), but the fact that there is so much flexibility in how records can be specified in DWARF exacerbates the problem by complicating the parsing code, which is then poorly tested because it has so many different code paths.  </div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div><div dir="ltr"><div><ul><li>It becomes testable as an independent component, because you can just send requests to it, dump the results, and see if they make sense.  
Currently, there is almost zero test coverage of this aspect of LLDB apart from what you can get after going through many levels of indirection by spinning up a full debug session and doing things that indirectly result in symbol queries.</li></ul></div></div></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div>You are right that the type system's debug info ingestion and AST reconstruction are primarily tested end-to-end.</div></div></div></blockquote><div>Do you consider this something worth addressing by testing the debug info ingestion in isolation?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><div><br><blockquote type="cite"><div><div dir="ltr"><div><div>The big win here, at least from my point of view, is the second one.  Traditional symbol servers operate by copying entire symbol files (dSYM, DWP, PDB) from some machine to the debugger host.  These can be very large -- we've seen 12+ GB in some cases -- so copying them ranges from "slow bandwidth hog" to "complete non-starter" depending on the debugger host and network. </div></div></div></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div>12 GB sounds suspiciously large. Do you know how this breaks down between line tables, types, and debug locations? If it's types, are you deduplicating them? For comparison, the debug info of LLDB (which contains two compilers and a debugger) compresses to under 500 MB, but perhaps the binaries you are working with are really just that much larger.</div></div></div></blockquote><div>They really are that large.  
</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><div><br><blockquote type="cite"><div><div dir="ltr"><div><div> In this kind of scenario, one could theoretically run the debug info process on the same NAS, cloud instance, or other machine that hosts the symbol server.  Then, rather than copying over an entire symbol file, it would respond only to the query you issued -- if you asked for a type, it would return just a packet describing the type you requested.</div></div><div><br></div><div>The API itself would be stateless (so that you could make queries for multiple targets in any order) as well as asynchronous (so that responses could arrive out of order).  Blocking could be implemented on the LLDB side, but having the server be asynchronous means multiple clients could connect to the same server instance.  This raises interesting possibilities.  For example, one can imagine thousands of developers connecting to an internal symbol server on the network and being able to debug remote processes or core dumps over slow network connections or on machines with very little storage (e.g. Chromebooks).</div></div></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div>You *could* just run LLDB remotely ;-)</div><div><br></div>That all sounds cool, but in my opinion you are leaving out the really important part: what is the abstraction level of the API going to be?</div><div><br></div><div>To be blunt, I'm against inventing yet another serialization format for *types*, not just because of the considerable engineering effort it will take to get this right, but also because of the maintenance burden it would impose. 
We already have to support loading types from DWARF, PDB, Clang modules, the Objective-C runtime, Swift modules, and probably more sources, and all of these operate at different levels of abstraction to some degree. Adding another source or abstraction layer into the mix needs to be really well thought out and justified.</div></div></blockquote><div>Let's ignore whether the format can be serialized and instead focus on the abstraction level of the API.  Personally, I think the format should be higher level than DWARF DIEs but lower level than an AST.  By making it higher level than DWARF DIEs, we could use the same abstraction to represent PDB types and symbols as well, and by making it lower level than ASTs, we could support non-Clang TypeSystems.  This way, you have one API that gives you "something" you can trust, which works with any underlying debug info format, and one codepath that builds the AST from it, regardless of which debug info format it came from or which programming language it describes.</div><div><br></div><div>In a way, this is like taking apart the DWARFASTParserClang / SymbolFileDWARF and PDBASTBuilder / SymbolFileNativePDB pairs and instead having a library called DebugInfoParser, plus a single ASTParser class that calls DIParser->ParseTypes() and then builds an AST from the result without knowing which format it originated from.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><br><blockquote type="cite"><div><div dir="ltr"><div>On the LLDB side, all of this is hidden behind the SymbolFile interface, so most of LLDB doesn't have to change at all.   
While this is in development, we could have a SymbolFileRemote and keep the existing local codepath as the default until it is robust and complete enough for us to switch the default.<br></div><div><br></div></div></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div>The SymbolFile interface ultimately vends compiler types, so now I'm really curious what kind of data you are planning to send over the wire.</div></div></div></blockquote><div><br></div><div>So again, let's ignore "the wire" for the sake of this discussion.  SymbolFile does vend compiler types, but that doesn't mean we can't have a single "master" SymbolFile implementation that a) calls into DebugInfoParser (which need not be out of process) and then b) uses the results of these library calls to construct an AST.</div></div></div>
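<div><br></div><div>The split proposed above, where a format-specific DebugInfoParser produces neutral records and a single consumer builds types from them, can be sketched in C++. This is a minimal illustration under stated assumptions, not LLDB code: TypeRecord, FieldRecord, FakeDwarfParser, FakePdbParser, and BuildDeclarations are hypothetical names; DebugInfoParser and ParseTypes() are only the names floated in this thread; and the consumer renders C declarations where a real implementation would create Clang AST nodes (or nodes for another TypeSystem).</div>

```cpp
// Sketch only: none of this is existing LLDB API. DebugInfoParser/ParseTypes
// are the names proposed in the thread; everything else is hypothetical.
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Format-neutral records: higher level than raw DWARF DIEs or CodeView
// records, but with no dependency on clang::ASTContext.
struct FieldRecord {
  std::string name;
  std::string type_name;  // name of the field's (already resolved) type
  std::uint64_t offset_bits;
};

struct TypeRecord {
  enum class Kind { Struct, Union };
  Kind kind;
  std::string name;
  std::uint64_t size_bytes;
  std::vector<FieldRecord> fields;
};

// One implementation of this interface per debug info format.
class DebugInfoParser {
 public:
  virtual ~DebugInfoParser() = default;
  virtual std::vector<TypeRecord> ParseTypes() = 0;
};

// Stand-in for a DWARF backend; a real one would walk DW_TAG_structure_type
// DIEs and their DW_TAG_member children.
class FakeDwarfParser : public DebugInfoParser {
 public:
  std::vector<TypeRecord> ParseTypes() override {
    return {{TypeRecord::Kind::Struct, "Point", 8,
             {{"x", "int", 0}, {"y", "int", 32}}}};
  }
};

// Stand-in for a PDB backend; a real one would walk LF_STRUCTURE and
// LF_FIELDLIST records. It builds the same record a different way.
class FakePdbParser : public DebugInfoParser {
 public:
  std::vector<TypeRecord> ParseTypes() override {
    TypeRecord point{TypeRecord::Kind::Struct, "Point", 8, {}};
    point.fields.push_back({"x", "int", 0});
    point.fields.push_back({"y", "int", 32});
    return {point};
  }
};

// The single consumer: it never learns which format a record came from.
// It renders C declarations here; in LLDB this is where one shared code path
// would create AST nodes for the target TypeSystem.
std::string BuildDeclarations(DebugInfoParser &parser) {
  std::string out;
  for (const TypeRecord &type : parser.ParseTypes()) {
    out += (type.kind == TypeRecord::Kind::Struct) ? "struct " : "union ";
    out += type.name + " {";
    for (const FieldRecord &field : type.fields)
      out += " " + field.type_name + " " + field.name + ";";
    out += " };\n";
  }
  return out;
}
```

<div>Because the consumer only sees TypeRecord values, it can be exercised without spinning up a debug session: run it over each backend's records and compare the dumped text. With the sample data, both fake parsers yield the same declaration of struct Point.</div>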
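<div><br></div><div>The stateless, asynchronous query model from the original proposal can likewise be sketched, again with hypothetical names (TypeQuery, TypeResponse, TypeIndex, SymbolServer, Submit) and an in-memory map standing in for a real symbol store. Each request carries the module identity as well as the type name, so the server keeps no per-client session; each query runs as its own std::async task, so replies may complete in any order, and the request_id lets a client match them up. Blocking stays on the client side: it calls get() on the future when it needs the answer. No wire format is implied, and std::launch::async needs thread support, which some toolchains only enable with -pthread.</div>

```cpp
// Sketch only: a hypothetical in-process model of the proposed query API,
// not an existing LLDB or symbol-server protocol.
#include <cassert>
#include <cstdint>
#include <future>
#include <map>
#include <string>
#include <utility>

// Hypothetical store: (module id, type name) -> serialized type description.
using TypeIndex = std::map<std::pair<std::string, std::string>, std::string>;

// A self-contained request: it names the module (e.g. by build ID) as well as
// the type, so the server needs no per-client session state.
struct TypeQuery {
  std::uint64_t request_id;
  std::string module_id;
  std::string type_name;
};

struct TypeResponse {
  std::uint64_t request_id;  // lets the client match out-of-order replies
  bool found;
  std::string description;   // e.g. a serialized, format-neutral type record
};

class SymbolServer {
 public:
  explicit SymbolServer(TypeIndex index) : index_(std::move(index)) {}

  // Each query runs as its own task, so replies may complete in any order.
  // The index is only read after construction, so concurrent lookups need no
  // locking. The server must outlive the futures it returns.
  std::future<TypeResponse> Submit(TypeQuery query) const {
    return std::async(std::launch::async, [this, query] {
      auto it = index_.find({query.module_id, query.type_name});
      if (it == index_.end())
        return TypeResponse{query.request_id, false, ""};
      return TypeResponse{query.request_id, true, it->second};
    });
  }

 private:
  TypeIndex index_;
};
```

<div>Since no query depends on an earlier one, a single server instance can interleave requests from many clients and many targets, which is what makes the shared-server scenario above plausible.</div>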