[Lldb-commits] [PATCH] D52461: [PDB] Introduce `MSVCUndecoratedNameParser`

Tue Oct 30 20:39:09 PDT 2018

zturner added a comment.

In https://reviews.llvm.org/D52461#1280527, @aleksandr.urakov wrote:

> Update the diff according to the discussion, making it possible to parse MSVC demangled names with `CPlusPlusLanguage`. The old PDB plugin still uses `MSVCUndecoratedNameParser` directly because:
>
> - we are sure that the name in PDB is an MSVC name;
> - it has a more convenient interface, especially for restoring namespaces from the parsed name.

So I came up with an interesting solution to this while working on the native PDB plugin. It can't be used with the old PDB plugin, but since it works flawlessly in the native one, depending on how urgent your need is, maybe you can put off this work until you're ready to move over to the native PDB plugin?

Basically, the idea is that the raw PDB contains a mangled type name for every type. You can see this by dumping the types with `llvm-pdbutil`, as follows (I just picked a random PDB from my build directory).

  D:\src\llvmbuild\ninja-x64>bin\llvm-pdbutil.exe dump -types bin\sancov.pdb | grep -A 2 LF_STRUCT | more
      0x1001 | LF_STRUCTURE [size = 88] ``anonymous-namespace'::RawCoverage`
               unique name: `.?AURawCoverage@?A0xa74cdb40@@`
               vtable: <no type>, base list: <no type>, field list: <no type>
  --
      0x100A | LF_STRUCTURE [size = 212] `std::default_delete<std::set<unsigned __int64,std::less<unsigned __int64>,std::allocator<unsigned __int64> > >`
               unique name: `.?AU?$default_delete@V?$set@_KU?$less@_K@std@@V?$allocator@_K@2@@std@@@std@@`
               vtable: <no type>, base list: <no type>, field list: <no type>
  --
      0x102B | LF_STRUCTURE [size = 88] ``anonymous-namespace'::FileHeader`
               unique name: `.?AUFileHeader@?A0xa74cdb40@@`
               vtable: <no type>, base list: <no type>, field list: <no type>
  --
      0x1031 | LF_STRUCTURE [size = 112] `std::default_delete<llvm::MemoryBuffer>`
               unique name: `.?AU?$default_delete@VMemoryBuffer@llvm@@@std@@`
               vtable: <no type>, base list: <no type>, field list: <no type>
  --
      0x1081 | LF_STRUCTURE [size = 304] `llvm::AlignedCharArrayUnion<std::unique_ptr<llvm::MemoryBuffer,std::default_delete<llvm::MemoryBuffer> >,char,char,char,char,char,char,char,char,char>`
               unique name: `.?AU?$AlignedCharArrayUnion@V?$unique_ptr@VMemoryBuffer@llvm@@U?$default_delete@VMemoryBuffer@llvm@@@std@@@std@@DDDDDDDDD@llvm@@`
               vtable: <no type>, base list: <no type>, field list: <no type>
  --
      0x1082 | LF_STRUCTURE [size = 176] `llvm::AlignedCharArrayUnion<std::error_code,char,char,char,char,char,char,char,char,char>`
               unique name: `.?AU?$AlignedCharArrayUnion@Verror_code@std@@DDDDDDDDD@llvm@@`
               vtable: <no type>, base list: <no type>, field list: <no type>

So the interesting part here is the "unique name" field. It is not accessible via the DIA SDK, but it gives us complete, rich information about the type that is otherwise unavailable. We don't even have to guess, because we can just demangle the name. And coincidentally, I recently finished writing a Microsoft ABI demangler, which is now in LLVM. :) This `.?AU` syntax is non-standard, but it was easy to figure out, and I hacked up our demangle library to support this prefix (it's not checked in yet). Basically, everything that comes after it exactly matches a mangled type.
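As a rough illustration of the prefix handling, here is a minimal C++17 sketch (not the code in LLVM's demangler) that peels the `.?A` prefix and the tag code off a unique name, leaving the mangled qualified name. `splitUniqueName`, `TagKind` and `UniqueNameInfo` are hypothetical names. `U` (struct) is the only code visible in the dump above; treating `V`, `T` and `W4` as class, union and enum is an assumption based on the usual MSVC tag codes.

```cpp
#include <optional>
#include <string_view>

// Hypothetical: the tag kinds a PDB "unique name" is assumed to carry.
enum class TagKind { Struct, Class, Union, Enum };

struct UniqueNameInfo {
  TagKind Kind;
  std::string_view Mangled; // mangled qualified name after the tag code
};

// Splits e.g. ".?AURawCoverage@?A0xa74cdb40@@" into Struct +
// "RawCoverage@?A0xa74cdb40@@". Returns std::nullopt if the ".?A" prefix
// is missing or the tag code is not one of the assumed codes.
std::optional<UniqueNameInfo> splitUniqueName(std::string_view Name) {
  constexpr std::string_view Prefix = ".?A";
  if (Name.substr(0, Prefix.size()) != Prefix)
    return std::nullopt;
  Name.remove_prefix(Prefix.size());
  if (Name.empty())
    return std::nullopt;
  char Code = Name.front();
  Name.remove_prefix(1);
  switch (Code) {
  case 'U':
    return UniqueNameInfo{TagKind::Struct, Name};
  case 'V':
    return UniqueNameInfo{TagKind::Class, Name};
  case 'T':
    return UniqueNameInfo{TagKind::Union, Name};
  case 'W':
    // Assumption: enums carry one extra digit for the underlying type,
    // e.g. "W4" for int, which is skipped here.
    if (Name.empty())
      return std::nullopt;
    Name.remove_prefix(1);
    return UniqueNameInfo{TagKind::Enum, Name};
  default:
    return std::nullopt;
  }
}
```

Returning a `std::string_view` into the original buffer avoids copying, which suits a PDB where the string table outlives the parse.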

So, to give an example: instead of teaching `CPlusPlusNameParser` to handle ``anonymous namespace'::RawCoverage`, we simply demangle `.?AURawCoverage@?A0xa74cdb40@@` and get back a vector of 2 strings, ``anonymous namespace'` and `RawCoverage`. And the benefits go well beyond that. Since PDB doesn't contain rich information about template parameters, until now all we could do was create an entry in the AST saying "there's a type with this enormously long name that contains angle brackets and other junk". With this technique, we could actually create legitimate template decls in the AST, the way it's supposed to be done.
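To make the "vector of 2 strings" concrete, here is a deliberately tiny C++17 sketch of splitting the mangled qualified name (the part after `.?AU`) into scope components. `splitQualifiedName` is hypothetical and is not the API of LLVM's Microsoft demangler. It only understands plain identifiers and `?A0x...` anonymous namespaces, and gives up on templates (`?$`), back-references and everything else a real demangler handles.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, heavily simplified: splits "RawCoverage@?A0xa74cdb40@@"
// into {"`anonymous namespace'", "RawCoverage"} (outermost scope first).
// Returns std::nullopt for any construct it does not support.
std::optional<std::vector<std::string>>
splitQualifiedName(std::string_view Mangled) {
  std::vector<std::string> Components; // collected innermost-first
  while (true) {
    if (Mangled.empty())
      return std::nullopt; // missing the final '@' terminator
    if (Mangled.front() == '@') {
      Mangled.remove_prefix(1); // a lone '@' ends the qualified name
      break;
    }
    // Every component is terminated by '@'.
    size_t End = Mangled.find('@');
    if (End == std::string_view::npos)
      return std::nullopt;
    std::string_view Piece = Mangled.substr(0, End);
    Mangled.remove_prefix(End + 1);
    if (Piece.substr(0, 2) == "?A")
      Components.push_back("`anonymous namespace'");
    else if (Piece.front() == '?' || (Piece.front() >= '0' && Piece.front() <= '9'))
      return std::nullopt; // templates, back-references, etc. unsupported
    else
      Components.emplace_back(Piece);
  }
  if (!Mangled.empty() || Components.empty())
    return std::nullopt; // trailing data or no name at all
  // MSVC lists scopes innermost-first; reverse to outermost-first.
  std::reverse(Components.begin(), Components.end());
  return Components;
}
```

A template name such as `?$default_delete@VMemoryBuffer@llvm@@@std@@` is rejected here. Handling it means recursively demangling each template argument as a type, which is exactly the extra information that would let us build real template decls.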

There is obviously a lot of complexity in doing it this way, but I think in the long term we'll get a richer experience by parsing the mangled name than by parsing the demangled name. But it's only possible with the native plugin.

What do you think?

https://reviews.llvm.org/D52461