[Lldb-commits] [PATCH] D54216: [NativePDB] Improve support for reconstructing a clang AST from PDB debug info

Zachary Turner via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Wed Nov 7 11:22:06 PST 2018

zturner created this revision.
zturner added reviewers: aleksandr.urakov, labath, lemo.
Herald added subscribers: erik.pilkington, JDevlieghere, aprantl.

This is an alternative to https://reviews.llvm.org/D54053 which uses a different approach.  Both have the same goal: to improve the quality of the AST that we reconstruct when parsing the debug info.  https://reviews.llvm.org/D54053 attempts to address this by demangling the unique name of each type, and using the structure of the demangler's AST to try to reconstruct a clang AST.

However, there are some complications with this approach.  The two biggest ones are:

a) The mangling does not always provide enough information to disambiguate between two types, depending on where it occurs in the mangling.  
b) The mangling provides no way to differentiate outer classes from outer namespaces, so in `A::B::C`, we don't know if `A` and `B` are (class, class), (namespace, namespace), or (namespace, class).

b) sounds like it could be an unimportant distinction, but since LLDB works by gradually building up an AST that grows as more and more debug info is parsed, you can very quickly end up in a situation where there are ambiguities in your AST.  For example, you may decide that `B` is probably a namespace, so you create a `NamespaceDecl` for it in the AST, and then later someone instantiates a variable of type `A::B` and you have precise debug info telling you it's a class.  This will create two decls with the same name at the same scope in the AST hierarchy.  Such ambiguities slowly build up over time and lead to instability.

The approach here is based on the observation that the PDB contains information about nested classes in the parent -> child direction, just not the other way around.  That is to say, if you have code such as: `struct A { struct B {}; };`  Then the debug info record for `A` will tell you that it contains a nested type called `B`, along with an index for the full definition of `B` in the debug info.  The problem we have been facing all along is that if someone declares a variable of type `A::B`, they need the reverse mapping, and PDB doesn't offer that.

So, the simple solution employed here is to pre-process all types up front and build the reverse mapping.  This gives us perfect information about class hierarchy, and allows us to precisely determine if a part of a scope is a namespace (specifically, it will have no parent in the reverse mapping).
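To make the pre-processing step concrete, here is a minimal sketch of inverting the parent -> child links in one pass.  `TypeIndex`, `NestedType`, `ClassRecord`, `BuildParentMap` and `FindParent` are hypothetical stand-ins invented for this illustration; they are not the types or functions used in the patch or in LLVM's PDB library.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for PDB type records (assumption, not LLDB/LLVM types).
using TypeIndex = std::uint32_t;

struct NestedType {
  std::string name; // unqualified name of the nested type, e.g. "B"
  TypeIndex index;  // index of the nested type's full definition
};

struct ClassRecord {
  TypeIndex index;                // this record's own type index
  std::string name;               // qualified name, e.g. "A::B"
  std::vector<NestedType> nested; // parent -> child links stored in the record
};

// Walk every record once and invert the parent -> child links.
std::unordered_map<TypeIndex, TypeIndex>
BuildParentMap(const std::vector<ClassRecord> &records) {
  std::unordered_map<TypeIndex, TypeIndex> parent_of;
  for (const ClassRecord &record : records)
    for (const NestedType &child : record.nested)
      parent_of[child.index] = record.index;
  return parent_of;
}

// Returns a pointer to the enclosing class's index, or nullptr when no
// record lists `type` as a nested type.
const TypeIndex *
FindParent(const std::unordered_map<TypeIndex, TypeIndex> &parent_of,
           TypeIndex type) {
  auto it = parent_of.find(type);
  return it == parent_of.end() ? nullptr : &it->second;
}
```

For `struct A { struct B {}; };`, the record for `A` lists `B` as nested, so `FindParent` on `B`'s index yields `A`'s index.  A type for which `FindParent` returns `nullptr` has no enclosing class, so any remaining scope components in its name would be treated as namespaces.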

But we can even re-purpose this pre-processing step for other things down the line.  For example, we may wish to find all types named `Foo`, but maybe `Foo` is a template and the instantiation is `Foo<int>`.  We could use this pre-processing step to build this kind of hash table, among many other things.
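As a rough sketch of what such a hash table could look like (this is not part of the patch), the same pass could key each type by its name with a trailing template argument list removed.  `StripTemplateArgs` and `BuildBaseNameIndex` are hypothetical names, and the bracket matching is deliberately naive: it assumes the argument list contains no unbalanced `<` or `>`, such as a comparison operator in a non-type argument.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using TypeIndex = std::uint32_t; // hypothetical stand-in for a PDB type index

// Strips one trailing template argument list: "Foo<int>" -> "Foo".
// Names that do not end in '>' are returned unchanged.
std::string StripTemplateArgs(const std::string &name) {
  if (name.empty() || name.back() != '>')
    return name;
  int depth = 0;
  // Scan backwards to find the '<' that matches the final '>'.
  for (std::size_t i = name.size(); i-- > 0;) {
    if (name[i] == '>')
      ++depth;
    else if (name[i] == '<' && --depth == 0)
      return name.substr(0, i);
  }
  return name; // unbalanced brackets: leave the name alone
}

// Maps each base name to the indices of all types sharing that base name.
std::unordered_map<std::string, std::vector<TypeIndex>>
BuildBaseNameIndex(
    const std::vector<std::pair<std::string, TypeIndex>> &types) {
  std::unordered_map<std::string, std::vector<TypeIndex>> index;
  for (const auto &entry : types)
    index[StripTemplateArgs(entry.first)].push_back(entry.second);
  return index;
}
```

A lookup for `Foo` in the resulting table would then return both `Foo<int>` and `Foo<Bar<int>>`, without scanning every type record again.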

Note that the idea of demangling a type and using the structured demangler AST is not totally abandoned.  For example, if you have a template instantiation named `Foo<int>`, the patch here will simply create a class with the name `Foo<int>`.  In other words, we make no attempt to parse template parameters and create the appropriate instantiations in the AST.  That is one place where the demangler's structured AST may still turn out to be useful.

We also do not yet handle scoped classes (i.e. classes that are defined inside the body of a function).  But we can handle those later.

Note that I started adding a new kind of test, an AST test.  I even retrofitted existing tests with AST testing functionality.  I think this is a useful testing strategy to ensure we are generating correct ASTs from debug info.



-------------- next part --------------
A non-text attachment was scrubbed...
Name: D54216.172989.patch
Type: text/x-patch
Size: 29288 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/lldb-commits/attachments/20181107/2862e6b1/attachment-0001.bin>