[llvm] r286491 - [PDB] Begin adding documentation for the PDB file format.
Zachary Turner via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 10 11:24:21 PST 2016
Author: zturner
Date: Thu Nov 10 13:24:21 2016
New Revision: 286491
URL: http://llvm.org/viewvc/llvm-project?rev=286491&view=rev
Log:
[PDB] Begin adding documentation for the PDB file format.
Differential Revision: https://reviews.llvm.org/D26374
Added:
llvm/trunk/docs/PDB/
llvm/trunk/docs/PDB/DbiStream.rst
llvm/trunk/docs/PDB/GlobalStream.rst
llvm/trunk/docs/PDB/HashStream.rst
llvm/trunk/docs/PDB/ModiStream.rst
llvm/trunk/docs/PDB/MsfFile.rst
llvm/trunk/docs/PDB/PdbStream.rst
llvm/trunk/docs/PDB/PublicStream.rst
llvm/trunk/docs/PDB/TpiStream.rst
llvm/trunk/docs/PDB/index.rst
Modified:
llvm/trunk/docs/index.rst
Added: llvm/trunk/docs/PDB/DbiStream.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/DbiStream.rst?rev=286491&view=auto
==============================================================================
--- llvm/trunk/docs/PDB/DbiStream.rst (added)
+++ llvm/trunk/docs/PDB/DbiStream.rst Thu Nov 10 13:24:21 2016
@@ -0,0 +1,3 @@
+=====================================
+The PDB DBI (Debug Info) Stream
+=====================================
Added: llvm/trunk/docs/PDB/GlobalStream.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/GlobalStream.rst?rev=286491&view=auto
==============================================================================
--- llvm/trunk/docs/PDB/GlobalStream.rst (added)
+++ llvm/trunk/docs/PDB/GlobalStream.rst Thu Nov 10 13:24:21 2016
@@ -0,0 +1,3 @@
+=====================================
+The PDB Global Symbol Stream
+=====================================
Added: llvm/trunk/docs/PDB/HashStream.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/HashStream.rst?rev=286491&view=auto
==============================================================================
--- llvm/trunk/docs/PDB/HashStream.rst (added)
+++ llvm/trunk/docs/PDB/HashStream.rst Thu Nov 10 13:24:21 2016
@@ -0,0 +1,3 @@
+=====================================
+The TPI & IPI Hash Streams
+=====================================
Added: llvm/trunk/docs/PDB/ModiStream.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/ModiStream.rst?rev=286491&view=auto
==============================================================================
--- llvm/trunk/docs/PDB/ModiStream.rst (added)
+++ llvm/trunk/docs/PDB/ModiStream.rst Thu Nov 10 13:24:21 2016
@@ -0,0 +1,3 @@
+=====================================
+The Module Information Stream
+=====================================
Added: llvm/trunk/docs/PDB/MsfFile.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/MsfFile.rst?rev=286491&view=auto
==============================================================================
--- llvm/trunk/docs/PDB/MsfFile.rst (added)
+++ llvm/trunk/docs/PDB/MsfFile.rst Thu Nov 10 13:24:21 2016
@@ -0,0 +1,121 @@
+=====================================
+The MSF File Format
+=====================================
+
+.. contents::
+ :local:
+
+.. _msf_superblock:
+
+The Superblock
+==============
+At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
+follows:
+
+.. code-block:: c++
+
+ struct SuperBlock {
+ char FileMagic[sizeof(Magic)];
+ ulittle32_t BlockSize;
+ ulittle32_t FreeBlockMapBlock;
+ ulittle32_t NumBlocks;
+ ulittle32_t NumDirectoryBytes;
+ ulittle32_t Unknown;
+ ulittle32_t BlockMapAddr;
+ };
+
+- **FileMagic** - Must be equal to ``"Microsoft C/C++ MSF 7.00\r\n"``
+ followed by the bytes ``1A 44 53 00 00 00``.
+- **BlockSize** - The block size of the internal file system. Valid values are
+ 512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary
+ depending on the block sizes. For the purposes of LLVM, we handle only block
+ sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
+- **FreeBlockMapBlock** - The index of a block within the file, at which begins
+ a bitfield representing the set of all blocks within the file which are "free"
+ (i.e. the data within that block is not used). This bitfield is spread across
+ the MSF file at ``BlockSize`` intervals.
+ **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``! This field
+ is designed to support incremental and atomic updates of the underlying MSF
+ file. While writing to an MSF file, if the value of this field is ``1``, you
+ can write your new modified bitfield to block 2, and vice versa. Only when
+ you commit the file to disk do you need to swap the value in the SuperBlock
+ to point to the new ``FreeBlockMapBlock``.
+- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``
+ should equal the size of the file on disk.
+- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream
+ directory contains information about each stream's size and the set of blocks
+ that it occupies. It will be described in more detail later.
+- **BlockMapAddr** - The index of a block within the MSF file. At this block is
+ an array of ``ulittle32_t``'s listing the blocks that the stream directory
+ resides on. For large MSF files, the stream directory (which describes the
+ block layout of each stream) may not fit entirely on a single block. As a
+ result, this extra layer of indirection is introduced, whereby this block
+ contains the list of blocks that the stream directory occupies, and the stream
+ directory itself can be stitched together accordingly. The number of
+ ``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.
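+
+The constraints listed above can be turned into a few sanity checks. The
+following is a minimal sketch, not LLVM's implementation: ``SuperBlockFields``,
+``isValidBlockSize``, ``numDirectoryBlocks``, ``looksValid``, and
+``alternateFreeBlockMap`` are hypothetical names, the fields are assumed to be
+already decoded from little-endian, and the ``BlockMapAddr < NumBlocks`` check
+is an assumption inferred from ``BlockMapAddr`` being a block index.
+

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, already-decoded view of the numeric SuperBlock fields.
// FileMagic is omitted; on disk these values are little-endian.
struct SuperBlockFields {
  uint32_t BlockSize;
  uint32_t FreeBlockMapBlock;
  uint32_t NumBlocks;
  uint32_t NumDirectoryBytes;
  uint32_t Unknown;
  uint32_t BlockMapAddr;
};

// True if Size is one of the block sizes listed in the field description.
inline bool isValidBlockSize(uint32_t Size) {
  return Size == 512 || Size == 1024 || Size == 2048 || Size == 4096;
}

// ceil(NumDirectoryBytes / BlockSize), computed with integer arithmetic.
inline uint32_t numDirectoryBlocks(uint32_t NumDirectoryBytes,
                                   uint32_t BlockSize) {
  assert(BlockSize != 0 && "block size must be nonzero");
  return NumDirectoryBytes / BlockSize +
         (NumDirectoryBytes % BlockSize != 0 ? 1 : 0);
}

// Checks the invariants stated above. FileSize is the on-disk size in bytes.
// The BlockMapAddr range check is an assumption, not taken from the text.
inline bool looksValid(const SuperBlockFields &SB, uint64_t FileSize) {
  return isValidBlockSize(SB.BlockSize) &&
         (SB.FreeBlockMapBlock == 1 || SB.FreeBlockMapBlock == 2) &&
         uint64_t(SB.NumBlocks) * SB.BlockSize == FileSize &&
         SB.BlockMapAddr < SB.NumBlocks;
}

// The free block map slot a writer would use for its modified copy.
inline uint32_t alternateFreeBlockMap(uint32_t Current) {
  return Current == 1 ? 2 : 1;
}
```

+
+For a file with 4KiB blocks and ``NumDirectoryBytes == 60``,
+``numDirectoryBlocks(60, 4096)`` evaluates to ``1``.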
+
+The Stream Directory
+====================
+The Stream Directory is the root of all access to the other streams in an MSF
+file. Beginning at byte 0 of the stream directory is the following structure:
+
+.. code-block:: c++
+
+ struct StreamDirectory {
+ ulittle32_t NumStreams;
+ ulittle32_t StreamSizes[NumStreams];
+ ulittle32_t StreamBlocks[NumStreams][];
+ };
+
+This structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
+Note that each of the last two arrays is of variable length, and in particular
+that ``StreamBlocks`` is jagged: each row has as many entries as the
+corresponding stream has blocks.
+
+**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4
+streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.
+
+Stream 0: ceil(1000 / 4096) = 1 block
+
+Stream 1: ceil(8000 / 4096) = 2 blocks
+
+Stream 2: ceil(16000 / 4096) = 4 blocks
+
+Stream 3: ceil(9000 / 4096) = 3 blocks
+
+In total, 10 blocks are used. Let's see what the stream directory might look
+like:
+
+.. code-block:: c++
+
+ struct StreamDirectory {
+ ulittle32_t NumStreams = 4;
+ ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
+ ulittle32_t StreamBlocks[][] = {
+ {4},
+ {5, 6},
+ {11, 9, 7, 8},
+ {10, 15, 12}
+ };
+ };
+
+In total, this occupies ``15 * 4 = 60`` bytes (1 word for ``NumStreams``, 4 for
+``StreamSizes``, and 10 for ``StreamBlocks``), so ``SuperBlock->NumDirectoryBytes``
+would equal ``60``, and the block at ``SuperBlock->BlockMapAddr`` would hold an
+array of one ``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.
+
+Note also that the streams are discontiguous and their blocks are not in
+ascending order: block 10 of stream 3 sits between blocks 9 and 11 of stream 2.
+You cannot assume anything about the layout of the blocks!
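+
+The arithmetic in this example can be checked mechanically. The sketch below
+uses hypothetical helpers (``blocksFor``, ``directoryBytes``, ``fileOffset``)
+and assumes that block ``B`` begins at file offset ``B * BlockSize``, which
+follows from the SuperBlock sitting at offset 0 and ``NumBlocks * BlockSize``
+matching the file size.
+

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of blocks needed to hold Bytes bytes: ceil(Bytes / BlockSize).
inline uint32_t blocksFor(uint32_t Bytes, uint32_t BlockSize) {
  assert(BlockSize != 0 && "block size must be nonzero");
  return Bytes / BlockSize + (Bytes % BlockSize != 0 ? 1 : 0);
}

// Size in bytes of a stream directory describing the given streams:
// one 32-bit word for NumStreams, one per stream size, and one per
// block occupied by any stream.
inline uint32_t directoryBytes(const std::vector<uint32_t> &StreamSizes,
                               uint32_t BlockSize) {
  uint32_t Words = 1 + static_cast<uint32_t>(StreamSizes.size());
  for (uint32_t Size : StreamSizes)
    Words += blocksFor(Size, BlockSize);
  return Words * 4;
}

// Maps a byte offset within a stream to a byte offset within the file.
// Blocks is that stream's row of StreamBlocks; at() throws if the offset
// lies beyond the blocks listed for the stream.
inline uint64_t fileOffset(const std::vector<uint32_t> &Blocks,
                           uint32_t StreamOffset, uint32_t BlockSize) {
  assert(BlockSize != 0 && "block size must be nonzero");
  const uint32_t Index = StreamOffset / BlockSize;
  return uint64_t(Blocks.at(Index)) * BlockSize + StreamOffset % BlockSize;
}
```

+
+With the stream sizes above, ``directoryBytes({1000, 8000, 16000, 9000}, 4096)``
+yields ``60``, and byte 5000 of stream 2 lives in block 9, at file offset
+``9 * 4096 + 904``.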
+
+Alignment and Block Boundaries
+==============================
+As may be clear by now, it is possible for a single field (whether it be a high
+level record, a long string field, or even a single ``uint16``) to begin and
+end in separate blocks. For example, if the block size is 4096 bytes, and a
+``uint16`` field begins at the last byte of the current block, then it would
+need to end on the first byte of the next block. Since blocks are not
+necessarily contiguously laid out in the file, this means that both the consumer
+and the producer of an MSF file must be prepared to split data apart
+accordingly. In the aforementioned example, the low byte of the (little-endian)
+``uint16`` would be written to the last byte of block N, and the high byte would
+be written to the first byte of block N+1, which could be tens of thousands of
+bytes later (or even earlier!) in the file, depending on what the stream
+directory says.
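+
+One way to cope with this is to translate every byte offset through the
+stream's block list. The sketch below is illustrative only: ``readByte`` and
+``readU16`` are hypothetical helpers, the whole file is assumed to be in memory,
+and the field is assumed to be little-endian (as the ``ulittle32_t`` types used
+by the MSF structures suggest), so the byte at the lower stream offset is the
+low byte.
+

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads the byte at StreamOffset by looking up which file block holds it.
// Blocks is the stream's list of block indices from the stream directory.
inline uint8_t readByte(const std::vector<uint8_t> &File,
                        const std::vector<uint32_t> &Blocks,
                        uint32_t BlockSize, uint32_t StreamOffset) {
  assert(BlockSize != 0 && "block size must be nonzero");
  const uint64_t Block = Blocks.at(StreamOffset / BlockSize);
  const uint64_t Offset = Block * BlockSize + StreamOffset % BlockSize;
  return File.at(static_cast<std::size_t>(Offset));
}

// Reads a little-endian uint16 one byte at a time, so a value whose two
// bytes fall in different (possibly non-adjacent) blocks is still
// assembled correctly.
inline uint16_t readU16(const std::vector<uint8_t> &File,
                        const std::vector<uint32_t> &Blocks,
                        uint32_t BlockSize, uint32_t StreamOffset) {
  const uint16_t Lo = readByte(File, Blocks, BlockSize, StreamOffset);
  const uint16_t Hi = readByte(File, Blocks, BlockSize, StreamOffset + 1);
  return static_cast<uint16_t>(Lo | (Hi << 8));
}
```

+
+Reading byte by byte is the simplest approach; a real reader would more likely
+copy contiguous runs within a block and only split at block boundaries.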
Added: llvm/trunk/docs/PDB/PdbStream.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/PdbStream.rst?rev=286491&view=auto
==============================================================================
--- llvm/trunk/docs/PDB/PdbStream.rst (added)
+++ llvm/trunk/docs/PDB/PdbStream.rst Thu Nov 10 13:24:21 2016
@@ -0,0 +1,3 @@
+========================================
+The PDB Info Stream (aka the PDB Stream)
+========================================
Added: llvm/trunk/docs/PDB/PublicStream.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/PublicStream.rst?rev=286491&view=auto
==============================================================================
--- llvm/trunk/docs/PDB/PublicStream.rst (added)
+++ llvm/trunk/docs/PDB/PublicStream.rst Thu Nov 10 13:24:21 2016
@@ -0,0 +1,3 @@
+=====================================
+The PDB Public Symbol Stream
+=====================================
Added: llvm/trunk/docs/PDB/TpiStream.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/TpiStream.rst?rev=286491&view=auto
==============================================================================
--- llvm/trunk/docs/PDB/TpiStream.rst (added)
+++ llvm/trunk/docs/PDB/TpiStream.rst Thu Nov 10 13:24:21 2016
@@ -0,0 +1,3 @@
+=====================================
+The PDB TPI Stream
+=====================================
Added: llvm/trunk/docs/PDB/index.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/index.rst?rev=286491&view=auto
==============================================================================
--- llvm/trunk/docs/PDB/index.rst (added)
+++ llvm/trunk/docs/PDB/index.rst Thu Nov 10 13:24:21 2016
@@ -0,0 +1,160 @@
+=====================================
+The PDB File Format
+=====================================
+
+.. contents::
+ :local:
+
+.. _pdb_intro:
+
+Introduction
+============
+
+PDB (Program Database) is a file format invented by Microsoft that contains
+debug information that can be consumed by debuggers and other tools. Since
+officially supported APIs exist on Windows for querying debug information from
+PDBs even without the user understanding the internals of the file format, a
+large ecosystem of tools has been built for Windows to consume this format. In
+order for Clang to be able to generate programs that can interoperate with these
+tools, it is necessary for us to generate PDB files ourselves.
+
+At the same time, LLVM has a long history of being able to cross-compile from
+any platform to any platform, and we wish for the same to be true here. So it
+is necessary for us to understand the PDB file format at the byte-level so that
+we can generate PDB files entirely on our own.
+
+This manual describes what we know about the PDB file format today: the layout
+of the file, the various streams contained within, the format of individual
+records within, and more.
+
+We would like to extend our heartfelt gratitude to Microsoft, without whom we
+would not be where we are today. Much of the knowledge contained within this
+manual was learned through reading code published by Microsoft on their `GitHub
+repo <https://github.com/Microsoft/microsoft-pdb>`__.
+
+.. _pdb_layout:
+
+File Layout
+===========
+
+.. toctree::
+ :hidden:
+
+ MsfFile
+ PdbStream
+ TpiStream
+ DbiStream
+ ModiStream
+ PublicStream
+ GlobalStream
+ HashStream
+
+.. _msf:
+
+The MSF Container
+-----------------
+A PDB file is really just a special case of an MSF (Multi-Stream Format) file.
+An MSF file is actually a miniature "file system within a file". It contains
+multiple streams (aka files) which can represent arbitrary data, and these
+streams are divided into blocks which may not necessarily be contiguously
+laid out within the file (aka fragmented). Additionally, the MSF contains a
+stream directory (aka MFT) which describes how the streams (files) are laid
+out within the MSF.
+
+For more information about the MSF container format, stream directory, and
+block layout, see :doc:`MsfFile`.
+
+.. _streams:
+
+Streams
+-------
+The PDB format contains a number of streams which describe various information
+such as the types, symbols, source files, and compilands (e.g. object files)
+of a program, as well as some additional streams containing hash tables that are
+used by debuggers and other tools to provide fast lookup of records and types
+by name, and various other information about how the program was compiled such
+as the specific toolchain used, and more. A summary of streams contained in a
+PDB file is as follows:
+
++--------------------+------------------------------+-------------------------------------------+
+| Name | Stream Index | Contents |
++====================+==============================+===========================================+
+| Old Directory | - Fixed Stream Index 0 | - Previous MSF Stream Directory |
++--------------------+------------------------------+-------------------------------------------+
+| PDB Stream | - Fixed Stream Index 1 | - Basic File Information |
+| | | - Fields to match EXE to this PDB |
+| | | - Map of named streams to stream indices |
++--------------------+------------------------------+-------------------------------------------+
+| TPI Stream | - Fixed Stream Index 2 | - CodeView Type Records |
+| | | - Index of TPI Hash Stream |
++--------------------+------------------------------+-------------------------------------------+
+| DBI Stream | - Fixed Stream Index 3 | - Module/Compiland Information |
+| | | - Indices of individual module streams |
+| | | - Indices of public / global streams |
+| | | - Section Contribution Information |
+| | | - Source File Information |
+| | | - FPO / PGO Data |
++--------------------+------------------------------+-------------------------------------------+
+| IPI Stream | - Fixed Stream Index 4 | - CodeView Type Records |
+| | | - Index of IPI Hash Stream |
++--------------------+------------------------------+-------------------------------------------+
+| /LinkInfo | - Contained in PDB Stream | - Unknown |
+| | Named Stream map | |
++--------------------+------------------------------+-------------------------------------------+
+| /src/headerblock | - Contained in PDB Stream | - Unknown |
+| | Named Stream map | |
++--------------------+------------------------------+-------------------------------------------+
+| /names | - Contained in PDB Stream | - PDB-wide global string table used for |
+| | Named Stream map | string de-duplication |
++--------------------+------------------------------+-------------------------------------------+
+| Module Info Stream | - Contained in DBI Stream | - CodeView Symbol Records for this module |
+| | - One for each compiland | - Line Number Information |
++--------------------+------------------------------+-------------------------------------------+
+| Public Stream | - Contained in DBI Stream | - Public (Exported) Symbol Records |
+| | | - Index of Public Hash Stream |
++--------------------+------------------------------+-------------------------------------------+
+| Global Stream | - Contained in DBI Stream | - Global Symbol Records |
+| | | - Index of Global Hash Stream |
++--------------------+------------------------------+-------------------------------------------+
+| TPI Hash Stream | - Contained in TPI Stream | - Hash table for looking up TPI records |
+| | | by name |
++--------------------+------------------------------+-------------------------------------------+
+| IPI Hash Stream | - Contained in IPI Stream | - Hash table for looking up IPI records |
+| | | by name |
++--------------------+------------------------------+-------------------------------------------+
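+
+The fixed stream indices in the table can be captured as constants. The names
+below are hypothetical and chosen for this sketch; only the numeric values and
+the display names come from the table.
+

```cpp
#include <cassert>
#include <cstdint>

// Streams that always live at the same index, per the table above.
// The enumerator names are illustrative, not LLVM's.
enum FixedStream : uint32_t {
  OldDirectoryStream = 0,
  PdbStream = 1,
  TpiStream = 2,
  DbiStream = 3,
  IpiStream = 4,
};

// True if Index refers to one of the fixed streams rather than a stream
// whose index must be looked up in the PDB, DBI, TPI, or IPI stream.
constexpr bool isFixedStream(uint32_t Index) { return Index <= IpiStream; }

// Name of a fixed stream as it appears in the table.
// Index must satisfy isFixedStream.
inline const char *fixedStreamName(uint32_t Index) {
  assert(isFixedStream(Index) && "not a fixed stream index");
  static const char *const Names[] = {"Old Directory", "PDB Stream",
                                      "TPI Stream", "DBI Stream",
                                      "IPI Stream"};
  return Names[Index];
}
```
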
+
+More information about the structure of each of these can be found on the
+following pages:
+
+:doc:`PdbStream`
+ Information about the PDB Info Stream and how it is used to match PDBs to EXEs.
+
+:doc:`TpiStream`
+ Information about the TPI stream and the CodeView records contained within.
+
+:doc:`DbiStream`
+ Information about the DBI stream and relevant substreams including the Module Substreams,
+ source file information, and CodeView symbol records contained within.
+
+:doc:`ModiStream`
+ Information about the Module Information Stream, of which there is one for each compilation
+ unit, and the format of symbols contained within.
+
+:doc:`PublicStream`
+ Information about the Public Symbol Stream.
+
+:doc:`GlobalStream`
+ Information about the Global Symbol Stream.
+
+:doc:`HashStream`
+ Information about the Hash Table stream, and how it can be used to quickly look up records
+ by name.
+
+CodeView
+========
+CodeView is another format which comes into the picture. While MSF defines
+the structure of the overall file, and PDB defines the set of streams that
+appear within the MSF file and the format of those streams, CodeView defines
+the format of **symbol and type records** that appear within specific streams.
+Refer to the pages on `CodeView Symbol Records` and `CodeView Type Records` for
+more information about the CodeView format.
Modified: llvm/trunk/docs/index.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/index.rst?rev=286491&r1=286490&r2=286491&view=diff
==============================================================================
--- llvm/trunk/docs/index.rst (original)
+++ llvm/trunk/docs/index.rst Thu Nov 10 13:24:21 2016
@@ -274,6 +274,7 @@ For API clients and LLVM developers.
Coroutines
GlobalISel
XRay
+ PDB/index
:doc:`WritingAnLLVMPass`
Information on how to write LLVM transformations and analyses.
@@ -398,6 +399,9 @@ For API clients and LLVM developers.
:doc:`XRay`
High-level documentation of how to use XRay in LLVM.
+:doc:`The Microsoft PDB File Format <PDB/index>`
+ A detailed description of the Microsoft PDB (Program Database) file format.
+
Development Process Documentation
=================================