[PATCH] D33785: [CodeView] Fix alignment / padding when writing symbol records

Zachary Turner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 1 09:55:56 PDT 2017


zturner created this revision.

The rabbit hole runs deep on this one.  Apparently when writing object files, symbol records are 1-byte aligned, but when writing PDB files, symbols are 4-byte aligned.  I confirmed this through the following sequence of steps:

1. Starting with `test/tools/llvm-readobj/codeview-types.test`, make it possible to build the code by deleting the `#error` and adding definitions of the various functions and adding a `main` function so that it can link into a valid executable.
2. Use the instructions in the aforementioned file to regenerate the object file on a modern version of MSVC.
3. Use `llvm-readobj -codeview -codeview-subsection-bytes codeview-types.obj` to dump the records.  Save this output somewhere
4. Link the object file using `link /debug /out:codeview-types.exe codeview-types.obj`, which should produce `codeview-types.pdb`
5. Dump the pdb using `llvm-pdbdump raw -dbi-module-syms -sym-record-bytes codeview-types.pdb` and save the output somewhere.

When you look at the first file, you can find these two records (among plenty of others):

  ObjectName {
    Signature: 0x0
    ObjectName: D:\src\llvmbuild\ninja\codeview-types-cl.obj
  }
  SymData (
    0000: 00000000 443A5C73 72635C6C 6C766D62  |....D:\src\llvmb|
    0010: 75696C64 5C6E696E 6A615C63 6F646576  |uild\ninja\codev|
    0020: 6965772D 74797065 732D636C 2E6F626A  |iew-types-cl.obj|
    0030: 00                                   |.|
  )
  FrameProc {
    TotalFrameBytes: 0x28
    PaddingFrameBytes: 0x0
    OffsetToPadding: 0x0
    BytesOfCalleeSavedRegisters: 0x0
    OffsetOfExceptionHandler: 0x0
    SectionIdOfExceptionHandler: 0x0
    Flags [ (0x128200)
      AsynchronousExceptionHandling (0x200)
      OptimizedForSpeed (0x100000)
    ]
  }
  SymData (
    0000: 28000000 00000000 00000000 00000000  |(...............|
    0010: 00000000 00000082 1200               |..........|
  )

Note that the first record consists of 49 bytes and the second consists of 26 bytes, neither of which is a multiple of 4.  If you track down the same records in the dumped PDB file, they look like this:

  {
    ObjectName {
      Signature: 0x0
      ObjectName: D:\src\llvmbuild\ninja\codeview-types-cl.obj
    }
    Bytes (
      0000: 00000000 443A5C73 72635C6C 6C766D62  |....D:\src\llvmb|
      0010: 75696C64 5C6E696E 6A615C63 6F646576  |uild\ninja\codev|
      0020: 6965772D 74797065 732D636C 2E6F626A  |iew-types-cl.obj|
      0030: 00000000                             |....|
    )
  }
  {
    FrameProc {
      TotalFrameBytes: 0x28
      PaddingFrameBytes: 0x0
      OffsetToPadding: 0x0
      BytesOfCalleeSavedRegisters: 0x0
      OffsetOfExceptionHandler: 0x0
      SectionIdOfExceptionHandler: 0x0
      Flags [ (0x128200)
        AsynchronousExceptionHandling (0x200)
        OptimizedForSpeed (0x100000)
      ]
    }
    Bytes (
      0000: 28000000 00000000 00000000 00000000  |(...............|
      0010: 00000000 00000082 12000000           |............|
    )

and here we can see that both records have been zero-padded to 4 bytes but are otherwise identical.

Since we use the same code for reading / writing symbols to / from object files / PDB files, this is kind of annoying.  To address this, I added an alignment field to the low level `SymbolSerializer` and `SymbolDeserializer` classes and propagate this value all the way up to the top-level client, in this case `llvm-pdbdump`, `llvm-readobj`, `lld`, etc and let them specify the alignment based on whether they're handling PDB files or object files.

After this change, we can round-trip symbol records from PDB -> YAML -> PDB, and I modified the existing round-tripping test to verify that this does in fact work now..  There are still about 5 bytes that differ between the original stream's symbol records and the new stream's symbol records, but at least it doesn't crash and the records look the same when re dumped.  We can investigate the remaining byte differences in a followup patch.

Another interesting (and annoying) thing I learned is that either Microsoft compiles with `#pragma pack(2)` or `#pragma pack(1)` or the symbol record definitions in `cvrec.h` aren't authoritative.  I went off on a wild goose chase for a couple of hours after noticing that `FRAMEPROCSYM` contains a `uint16` followed by a `uint32`, and so there are 2 bytes of padding in there that we aren't accounting for.  I confirmed this with `cl /d1reportSingleClassRecordLayoutFRAMEPROCSYM` against their structure definition.  But this broke many of the existing tests in `llvm-readobj`.  It turns out this 32-bit enum is *actually* written unaligned in the official implementation, so the code was originally correct.  Also confirmed by running `cvdump -s` on an object file, and realizing that the padding to 4 bytes comes *after* the enum, not before.

Ugh.  I want my night back.


https://reviews.llvm.org/D33785

Files:
  lld/COFF/PDB.cpp
  llvm/include/llvm/DebugInfo/CodeView/CodeViewRecordIO.h
  llvm/include/llvm/DebugInfo/CodeView/SymbolDeserializer.h
  llvm/include/llvm/DebugInfo/CodeView/SymbolDumper.h
  llvm/include/llvm/DebugInfo/CodeView/SymbolRecordMapping.h
  llvm/include/llvm/DebugInfo/CodeView/SymbolSerializer.h
  llvm/include/llvm/ObjectYAML/CodeViewYAMLSymbols.h
  llvm/lib/DebugInfo/CodeView/CodeViewRecordIO.cpp
  llvm/lib/DebugInfo/CodeView/SymbolDumper.cpp
  llvm/lib/DebugInfo/CodeView/SymbolRecordMapping.cpp
  llvm/lib/DebugInfo/CodeView/SymbolSerializer.cpp
  llvm/lib/DebugInfo/PDB/Native/DbiModuleDescriptorBuilder.cpp
  llvm/lib/ObjectYAML/CodeViewYAMLSymbols.cpp
  llvm/test/DebugInfo/PDB/pdbdump-write.test
  llvm/tools/llvm-pdbdump/LLVMOutputStyle.cpp
  llvm/tools/llvm-pdbdump/llvm-pdbdump.cpp
  llvm/tools/llvm-readobj/COFFDumper.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D33785.101040.patch
Type: text/x-patch
Size: 16920 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170601/55f06e39/attachment-0001.bin>


More information about the llvm-commits mailing list