[llvm-dev] [MS] Partial PDB (/DEBUG:FASTLINK) parsing support in LLVM

Zachary Turner via llvm-dev llvm-dev at lists.llvm.org
Thu Jun 8 09:43:16 PDT 2017


I didn't believe you at first that DIA SDK didn't support partial PDBs, so
I went and tried `llvm-pdbdump pretty -types foo.pdb` on a partial PDB and
it caused llvm-pdbdump to crash.  When I looked further, it turns out
IDiaSymbol::findChildren() is returning E_NOTIMPL.  Wow!  I'm a bit
surprised honestly.

I've pushed a fix for this in r304982, but all that does is make
llvm-pdbdump not crash.  It still doesn't display any types.

Luckily llvm-pdbdump has another mode (accessible via the `raw` subcommand)
that can bypass the DIA SDK and show you the underlying structure.  Here's
what I get when I try dumping types of a partial PDB.

D:\src\llvmbuild\ninja>bin\llvm-pdbdump raw -tpi-records cpptest.pdb
Type Info Stream (TPI) {
  TPI Version: 20040203
  Record count: 0
  Records [
    TypeIndexOffsets [
    ]
  ]
}

Umm, ok.  So there are *actually* no types in the PDB.

Let's try symbols.

D:\src\llvmbuild\ninja>bin\llvm-pdbdump raw -module-syms cpptest.pdb
DBI Stream {
  # snip
  Modules [
    {
      Name: test2.obj
      # snip
      Symbols [
        {
          UnknownSym {
            Kind: 0x1167
            Length: 52
          }
        }
        {
          UnknownSym {
            Kind: 0x1167
            Length: 64
          }
        }
        {
          UnknownSym {
            Kind: 0x1167
            Length: 60
          }
        }
  # thousands of similar lines snipped.

So this is a little bit more interesting.  Let's see what these records
look like:

D:\src\llvmbuild\ninja>bin\llvm-pdbdump raw -module-syms -sym-record-bytes cpptest.pdb
DBI Stream {
  # snip
  Modules [
    {
      Name: test2.obj
      # snip
      Symbols [
        {
          UnknownSym {
            Kind: 0x1167
            Length: 52
          }
          Bytes (
            0000: 30140000 04005F5F 76635F61 74747269  |0.....__vc_attri|
            0010: 62757465 733A3A65 76656E74 5F736F75  |butes::event_sou|
            0020: 72636541 74747269 62757465 00000000  |rceAttribute....|
          )
        }
        {
          UnknownSym {
            Kind: 0x1167
            Length: 64
          }
          Bytes (
            0000: 29140000 04005F5F 76635F61 74747269  |).....__vc_attri|
            0010: 62757465 733A3A65 76656E74 5F736F75  |butes::event_sou|
            0020: 72636541 74747269 62757465 3A3A6F70  |rceAttribute::op|
            0030: 74696D69 7A655F65 00000000           |timize_e....|
          )
        }
        {
          UnknownSym {
            Kind: 0x1167
            Length: 60
          }
          Bytes (
            0000: 27140000 04005F5F 76635F61 74747269  |'.....__vc_attri|
            0010: 62757465 733A3A65 76656E74 5F736F75  |butes::event_sou|
            0020: 72636541 74747269 62757465 3A3A7479  |rceAttribute::ty|
            0030: 70655F65 00000000                    |pe_e....|
          )
        }
        {
          UnknownSym {
            Kind: 0x1167
            Length: 68
          }
          Bytes (
            0000: 0C140000 04005F5F 76635F61 74747269  |......__vc_attri|
            0010: 62757465 733A3A68 656C7065 725F6174  |butes::helper_at|
            0020: 74726962 75746573 3A3A7631 5F616C74  |tributes::v1_alt|
            0030: 74797065 41747472 69627574 65000000  |typeAttribute...|
          )
        }

So, this symbol record with kind 0x1167 is pretty interesting, and clearly
related to /debug:fastlink.  Its format can be deduced as something like
this:

struct DebugFastLinkRecord {
  uint8_t Unknown[6];  // meaning not yet determined
  char Name[];         // null terminated string
  // Followed by zero bytes padding the record to a 4 byte boundary.
};

What those first 6 bytes are I can't tell you.
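To make the deduced layout concrete, here is a minimal sketch of a decoder for
one of these payloads.  It only assumes what the dumps above show: six leading
bytes of unknown meaning, then a null terminated name, then zero padding.  The
names FastLinkRecord and parseFastLinkPayload are hypothetical and are not
part of LLVM.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical holder for one decoded 0x1167 payload (not an LLVM type).
struct FastLinkRecord {
  std::vector<std::uint8_t> Unknown; // the six leading bytes, kept raw
  std::string Name;                  // name without its null terminator
};

// Decodes the bytes shown under "Bytes (" in the dump.
// Returns false if the payload is shorter than 6 bytes or the name has no
// null terminator; Out is only written on success.
bool parseFastLinkPayload(const std::vector<std::uint8_t> &Payload,
                          FastLinkRecord &Out) {
  const std::size_t UnknownSize = 6;
  if (Payload.size() < UnknownSize)
    return false;

  std::string Name;
  for (std::size_t I = UnknownSize; I < Payload.size(); ++I) {
    if (Payload[I] == 0) {
      Out.Unknown.assign(Payload.begin(), Payload.begin() + UnknownSize);
      Out.Name = Name;
      return true; // any remaining bytes are treated as padding
    }
    Name.push_back(static_cast<char>(Payload[I]));
  }
  return false; // ran off the end without finding the terminator
}
```

Keeping the six bytes raw rather than guessing at fields makes it easy to
compare them across many records once more samples are available.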

Let's see what else we can find.  Another source of interesting debug info
comes from what I refer to as "debug subsections".  In an object file,
every .debug$S section is basically just a big list of these.  In a PDB
file though, the debug subsections appear embedded inside each module's
debug stream, which is similar to a .debug$S section but with some
additional PDB-specific stuff.  You can find llvm-pdbdump's code for
parsing this in ModuleDebugStream.cpp.
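As a rough illustration of what "a big list of these" means, here is a sketch
of walking such a list.  It assumes the usual CodeView framing: each
subsection starts with a 4 byte little-endian kind and a 4 byte little-endian
data length, and the next header begins at the next 4 byte boundary.  It also
assumes the buffer starts at the first subsection header (in an object file
that means past the 4 byte signature at the start of .debug$S).  SubsectionRef
and walkSubsections are hypothetical names; this is not the code in
ModuleDebugStream.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one subsection found in the buffer.
struct SubsectionRef {
  std::uint32_t Kind;     // e.g. 0xF1 for symbols
  std::size_t DataOffset; // offset of the payload within the buffer
  std::uint32_t DataSize; // payload size in bytes, excluding padding
};

// Reads a little-endian 32 bit value; the caller guarantees 4 bytes exist.
static std::uint32_t readU32LE(const std::vector<std::uint8_t> &B,
                               std::size_t Off) {
  return static_cast<std::uint32_t>(B[Off]) |
         (static_cast<std::uint32_t>(B[Off + 1]) << 8) |
         (static_cast<std::uint32_t>(B[Off + 2]) << 16) |
         (static_cast<std::uint32_t>(B[Off + 3]) << 24);
}

// Walks the list and stops at the first truncated or malformed header.
std::vector<SubsectionRef>
walkSubsections(const std::vector<std::uint8_t> &Buf) {
  std::vector<SubsectionRef> Out;
  std::size_t Off = 0; // invariant: Off <= Buf.size()
  while (Buf.size() - Off >= 8) {
    std::uint32_t Kind = readU32LE(Buf, Off);
    std::uint32_t Size = readU32LE(Buf, Off + 4);
    std::size_t DataOff = Off + 8;
    if (Size > Buf.size() - DataOff)
      break; // length field points past the end of the buffer
    Out.push_back({Kind, DataOff, Size});

    // Advance past the payload and round up to a 4 byte boundary.
    std::size_t Next = (DataOff + Size + 3) & ~static_cast<std::size_t>(3);
    Off = Next > Buf.size() ? Buf.size() : Next;
  }
  return Out;
}
```

A dumper built on this could print every subsection whose Kind it does not
recognize.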

Anyway, the part we're interested in can be dumped using llvm-pdbdump raw
-subsections=unknown.  I say unknown because we're looking for stuff that
is unique to /debug:fastlink PDBs, so presumably any /debug:fastlink
specific data would be something we don't know about / have never seen
before.  (Note that this command line option hasn't made it upstream yet;
it's still in review.  But expect it today or tomorrow if all goes well.)

So we'll try this:

bin\llvm-pdbdump raw -subsections=unknown cpptest.pdb
DBI Stream {
  # snip
  Modules [
    {
      Name: test2.obj
      # snip
      Subsections [
        Unknown {
          Kind: 0xFD
          Data (
            0000: 00000000 00000000 00000000 00000000  |................|
            0010: 00000000 00000000 00000000 00000000  |................|
            0020: 00000000 00000000 00000000 B0240100  |.............$..|
            0030: 00000000 00000000 00000000 00000000  |................|
            0040: 00000000 B0240100 90270100 D0270100  |.....$...'...'..|
            0050: 90990100 00000000 00000000 90990100  |................|
            0060: A49C0100 00000000 00000000 A49C0100  |................|
          )
        }
      ]
    }

Neat!  What is this thing?  0xFD is 253, and looking that up in our
DebugSubsectionKind enumeration
<https://github.com/llvm-mirror/llvm/blob/master/include/llvm/DebugInfo/CodeView/CodeView.h#L317>
shows that this is a CoffSymbolRVA subsection.

The format of that subsection can very likely be understood by reading the
code in the Microsoft repo, but I haven't investigated it yet.
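Until the real format is confirmed, one exploratory way to look at the 0xFD
payload is as a flat array of little-endian 32 bit values.  That is purely an
assumption suggested by the subsection's name and by how the dump groups the
bytes, not something the format has been verified to be.  decodeWordsLE is a
hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Interprets Data as consecutive little-endian 32 bit values.
// Trailing bytes that do not fill a whole word are ignored.
std::vector<std::uint32_t>
decodeWordsLE(const std::vector<std::uint8_t> &Data) {
  std::vector<std::uint32_t> Words;
  for (std::size_t I = 0; I + 4 <= Data.size(); I += 4) {
    std::uint32_t W = static_cast<std::uint32_t>(Data[I]) |
                      (static_cast<std::uint32_t>(Data[I + 1]) << 8) |
                      (static_cast<std::uint32_t>(Data[I + 2]) << 16) |
                      (static_cast<std::uint32_t>(Data[I + 3]) << 24);
    Words.push_back(W);
  }
  return Words;
}
```

The hex dump prints bytes in file order, so the group B0240100 at offset
0x2C decodes to 0x000124B0 under this reading.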

Hopefully this is a good starting point.  llvm-pdbdump is a pretty useful
tool for investigating these types of issues, so let me know if you try it
out and have suggestions for how to improve it.

As mentioned, some of the commands I demonstrated above are not upstream
yet, but I'll try to get them in this week.

On Thu, Jun 8, 2017 at 5:07 AM Will Wilson <will at indefiant.com> wrote:

> Hi Zach (or anyone else who may have a clue),
>
> I'm currently investigating making use of LLVM for PDB parsing with a
> view to supporting partial PDBs as produced by /DEBUG:FASTLINK as the VS
> DIA SDK hasn't been updated to handle them. I know this is probably low on
> your priority list but since /DEBUG:FASTLINK is now the implied default for
> VS2017 I figure it's a good time to take a look at it.
>
> Unfortunately I'm finding very little information on the internal
> structure used by partial PDBs. It seems
> https://github.com/Microsoft/microsoft-pdb doesn't offer much either,
> unless I'm missing something...
>
> So, two questions: Are you planning to try and support partial PDBs? And
> do you have any good references for their layout?
>
> Many thanks,
> Will.
>
>