[llvm] r359712 - Convert PDB docs to unix line endings. No other changes.
Nico Weber via llvm-commits
llvm-commits at lists.llvm.org
Wed May 1 12:15:05 PDT 2019
Author: nico
Date: Wed May 1 12:15:05 2019
New Revision: 359712
URL: http://llvm.org/viewvc/llvm-project?rev=359712&view=rev
Log:
Convert PDB docs to unix line endings. No other changes.
Modified:
llvm/trunk/docs/PDB/GlobalStream.rst
llvm/trunk/docs/PDB/HashTable.rst
llvm/trunk/docs/PDB/ModiStream.rst
llvm/trunk/docs/PDB/MsfFile.rst
llvm/trunk/docs/PDB/PublicStream.rst
llvm/trunk/docs/PDB/TpiStream.rst
llvm/trunk/docs/PDB/index.rst
Modified: llvm/trunk/docs/PDB/GlobalStream.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/GlobalStream.rst?rev=359712&r1=359711&r2=359712&view=diff
==============================================================================
--- llvm/trunk/docs/PDB/GlobalStream.rst (original)
+++ llvm/trunk/docs/PDB/GlobalStream.rst Wed May 1 12:15:05 2019
@@ -1,3 +1,3 @@
-=====================================
-The PDB Global Symbol Stream
-=====================================
+=====================================
+The PDB Global Symbol Stream
+=====================================
Modified: llvm/trunk/docs/PDB/HashTable.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/HashTable.rst?rev=359712&r1=359711&r2=359712&view=diff
==============================================================================
--- llvm/trunk/docs/PDB/HashTable.rst (original)
+++ llvm/trunk/docs/PDB/HashTable.rst Wed May 1 12:15:05 2019
@@ -1,103 +1,103 @@
-The PDB Serialized Hash Table Format
-====================================
-
-.. contents::
- :local:
-
-.. _hash_intro:
-
-Introduction
-============
-
-One of the design goals of the PDB format is to provide accelerated access to
-debug information, and for this reason there are several occasions where hash
-tables are serialized and embedded directly to the file, rather than requiring
-a consumer to read a list of values and reconstruct the hash table on the fly.
-
-The serialization format supports hash tables of arbitrarily large size and
-capacity, as well as value types and hash functions. The only supported key
-value type is a uint32. The only requirement is that the producer and consumer
-agree on the hash function. As such, the hash function can is not discussed
-further in this document, it is assumed that for a particular instance of a PDB
-file hash table, the appropriate hash function is being used.
-
-On-Disk Format
-==============
-
-.. code-block:: none
-
-  .--------------------.-- +0
-  |        Size        |
-  .--------------------.-- +4
-  |      Capacity      |
-  .--------------------.-- +8
-  | Present Bit Vector |
-  .--------------------.-- +N
-  | Deleted Bit Vector |
-  .--------------------.-- +M                  ─╮
-  |        Key         |                        │
-  .--------------------.-- +M+4                 │
-  |       Value        |                        │
-  .--------------------.-- +M+4+sizeof(Value)   │
-           ...                                  ├─ |Capacity| Bucket entries
-  .--------------------.                        │
-  |        Key         |                        │
-  .--------------------.                        │
-  |       Value        |                        │
-  .--------------------.                       ─╯
-
-- **Size** - The number of values contained in the hash table.
-
-- **Capacity** - The number of buckets in the hash table. Producers should
- maintain a load factor of no greater than ``2/3*Capacity+1``.
-
-- **Present Bit Vector** - A serialized bit vector which contains information
- about which buckets have valid values. If the bucket has a value, the
- corresponding bit will be set, and if the bucket doesn't have a value (either
- because the bucket is empty or because the value is a tombstone value) the bit
- will be unset.
-
-- **Deleted Bit Vector** - A serialized bit vector which contains information
- about which buckets have tombstone values. If the entry in this bucket is
- deleted, the bit will be set, otherwise it will be unset.
-
-- **Keys and Values** - A list of ``Capacity`` hash buckets, where the first
- entry is the key (always a uint32), and the second entry is the value. The
- state of each bucket (valid, empty, deleted) can be determined by examining
- the present and deleted bit vectors.
-
-
-.. _hash_bit_vectors:
-
-Present and Deleted Bit Vectors
-===============================
-
-The bit vectors indicating the status of each bucket are serialized as follows:
-
-.. code-block:: none
-
-  .--------------------.-- +0
-  |     Word Count     |
-  .--------------------.-- +4
-  |       Word_0       |        ─╮
-  .--------------------.-- +8    │
-  |       Word_1       |         │
-  .--------------------.-- +12   ├─ |Word Count| values
-           ...                   │
-  .--------------------.         │
-  |       Word_N       |         │
-  .--------------------.        ─╯
-
-The words, when viewed as a contiguous block of bytes, represent a bit vector with
-the following layout:
-
-.. code-block:: none
-
-    .------------.         .------------.------------.
-    |   Word_N   |   ...   |   Word_1   |   Word_0   |
-    .------------.         .------------.------------.
-    |            |         |            |            |
- +N*32      +(N-1)*32    +64          +32          +0
-
-where the k'th bit of this bit vector represents the status of the k'th bucket
-in the hash table.
+The PDB Serialized Hash Table Format
+====================================
+
+.. contents::
+ :local:
+
+.. _hash_intro:
+
+Introduction
+============
+
+One of the design goals of the PDB format is to provide accelerated access to
+debug information, and for this reason there are several occasions where hash
+tables are serialized and embedded directly to the file, rather than requiring
+a consumer to read a list of values and reconstruct the hash table on the fly.
+
+The serialization format supports hash tables of arbitrarily large size and
+capacity, as well as value types and hash functions. The only supported key
+value type is a uint32. The only requirement is that the producer and consumer
+agree on the hash function. As such, the hash function can is not discussed
+further in this document, it is assumed that for a particular instance of a PDB
+file hash table, the appropriate hash function is being used.
+
+On-Disk Format
+==============
+
+.. code-block:: none
+
+  .--------------------.-- +0
+  |        Size        |
+  .--------------------.-- +4
+  |      Capacity      |
+  .--------------------.-- +8
+  | Present Bit Vector |
+  .--------------------.-- +N
+  | Deleted Bit Vector |
+  .--------------------.-- +M                  ─╮
+  |        Key         |                        │
+  .--------------------.-- +M+4                 │
+  |       Value        |                        │
+  .--------------------.-- +M+4+sizeof(Value)   │
+           ...                                  ├─ |Capacity| Bucket entries
+  .--------------------.                        │
+  |        Key         |                        │
+  .--------------------.                        │
+  |       Value        |                        │
+  .--------------------.                       ─╯
+
+- **Size** - The number of values contained in the hash table.
+
+- **Capacity** - The number of buckets in the hash table. Producers should
+ maintain a load factor of no greater than ``2/3*Capacity+1``.
+
+- **Present Bit Vector** - A serialized bit vector which contains information
+ about which buckets have valid values. If the bucket has a value, the
+ corresponding bit will be set, and if the bucket doesn't have a value (either
+ because the bucket is empty or because the value is a tombstone value) the bit
+ will be unset.
+
+- **Deleted Bit Vector** - A serialized bit vector which contains information
+ about which buckets have tombstone values. If the entry in this bucket is
+ deleted, the bit will be set, otherwise it will be unset.
+
+- **Keys and Values** - A list of ``Capacity`` hash buckets, where the first
+ entry is the key (always a uint32), and the second entry is the value. The
+ state of each bucket (valid, empty, deleted) can be determined by examining
+ the present and deleted bit vectors.
+
+
+.. _hash_bit_vectors:
+
+Present and Deleted Bit Vectors
+===============================
+
+The bit vectors indicating the status of each bucket are serialized as follows:
+
+.. code-block:: none
+
+  .--------------------.-- +0
+  |     Word Count     |
+  .--------------------.-- +4
+  |       Word_0       |        ─╮
+  .--------------------.-- +8    │
+  |       Word_1       |         │
+  .--------------------.-- +12   ├─ |Word Count| values
+           ...                   │
+  .--------------------.         │
+  |       Word_N       |         │
+  .--------------------.        ─╯
+
+The words, when viewed as a contiguous block of bytes, represent a bit vector with
+the following layout:
+
+.. code-block:: none
+
+    .------------.         .------------.------------.
+    |   Word_N   |   ...   |   Word_1   |   Word_0   |
+    .------------.         .------------.------------.
+    |            |         |            |            |
+ +N*32      +(N-1)*32    +64          +32          +0
+
+where the k'th bit of this bit vector represents the status of the k'th bucket
+in the hash table.
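
Reader's note on HashTable.rst (not part of the commit): the bucket-state rule described above can be sketched in C++ as follows. This is a minimal illustration, not LLVM's implementation. The names testBit, bucketState and BucketState are hypothetical, the words are assumed to be already decoded from little-endian with the leading Word Count stripped, and bit k is assumed to live in word k/32 at bit position k%32 (least significant bit first), which is how the layout diagram reads.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: Words holds Word_0..Word_N of one serialized bit
// vector, already decoded to host integers (Word Count field not included).
inline bool testBit(const std::vector<uint32_t> &Words, std::size_t K) {
  std::size_t WordIndex = K / 32;
  if (WordIndex >= Words.size())
    return false; // assumption: bits beyond the serialized words are unset
  return ((Words[WordIndex] >> (K % 32)) & 1u) != 0;
}

enum class BucketState { Empty, Valid, Deleted };

// Classify bucket K from the Present and Deleted bit vectors.
inline BucketState bucketState(const std::vector<uint32_t> &Present,
                               const std::vector<uint32_t> &Deleted,
                               std::size_t K) {
  if (testBit(Present, K))
    return BucketState::Valid;   // bucket holds a live key/value pair
  if (testBit(Deleted, K))
    return BucketState::Deleted; // tombstone
  return BucketState::Empty;
}
```

A consumer would call bucketState once per bucket index in [0, Capacity) and only interpret the Key and Value fields of buckets reported as Valid.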
Modified: llvm/trunk/docs/PDB/ModiStream.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/ModiStream.rst?rev=359712&r1=359711&r2=359712&view=diff
==============================================================================
--- llvm/trunk/docs/PDB/ModiStream.rst (original)
+++ llvm/trunk/docs/PDB/ModiStream.rst Wed May 1 12:15:05 2019
@@ -1,80 +1,80 @@
-=====================================
-The Module Information Stream
-=====================================
-
-.. contents::
- :local:
-
-.. _modi_stream_intro:
-
-Introduction
-============
-
-The Module Info Stream (henceforth referred to as the Modi stream) contains
-information about a single module (object file, import library, etc that
-contributes to the binary this PDB contains debug information about. There
-is one modi stream for each module, and the mapping between modi stream index
-and module is contained in the :doc:`DBI Stream <DbiStream>`. The modi stream
-for a single module contains line information for the compiland, as well as
-all CodeView information for the symbols defined in the compiland. Finally,
-there is a "global refs" substream which is not well understood.
-
-.. _modi_stream_layout:
-
-Stream Layout
-=============
-
-A modi stream is laid out as follows:
-
-
-.. code-block:: c++
-
- struct ModiStream {
- uint32_t Signature;
- uint8_t Symbols[SymbolSize-4];
- uint8_t C11LineInfo[C11Size];
- uint8_t C13LineInfo[C13Size];
-
- uint32_t GlobalRefsSize;
- uint8_t GlobalRefs[GlobalRefsSize];
- };
-
-- **Signature** - Unknown. In practice only the value of ``4`` has been
- observed. It is hypothesized that this value corresponds to the set of
- ``CV_SIGNATURE_xx`` defines in ``cvinfo.h``, with the value of ``4``
- meaning that this module has C13 line information (as opposed to C11 line
- information). A corollary of this is that we expect to only ever see
- C13 line info, and that we do not understand the format of C11 line info.
-
-- **Symbols** - The :ref:`CodeView Symbol Substream <modi_symbol_substream>`.
- ``SymbolSize`` is equal to the value of ``SymByteSize`` for the
- corresponding module's entry in the :ref:`Module Info Substream <dbi_mod_info_substream>`
- of the :doc:`DBI Stream <DbiStream>`.
-
-- **C11LineInfo** - A block containing CodeView line information in C11
- format. ``C11Size`` is equal to the value of ``C11ByteSize`` from the
- :ref:`Module Info Substream <dbi_mod_info_substream>` of the
- :doc:`DBI Stream <DbiStream>`. If this value is ``0``, then C11 line
- information is not present. As mentioned previously, the format of
- C11 line info is not understood and we assume all line in modern PDBs
- to be in C13 format.
-
-- **C13LineInfo** - A block containing CodeView line information in C13
- format. ``C13Size`` is equal to the value of ``C13ByteSize`` from the
- :ref:`Module Info Substream <dbi_mod_info_substream>` of the
- :doc:`DBI Stream <DbiStream>`. If this value is ``0``, then C13 line
- information is not present.
-
-- **GlobalRefs** - The meaning of this substream is not understood.
-
-.. _modi_symbol_substream:
-
-The CodeView Symbol Substream
-=============================
-
-The CodeView Symbol Substream. This is an array of variable length
-records describing the functions, variables, inlining information,
-and other symbols defined in the compiland. The entire array consumes
-``SymbolSize-4`` bytes. The format of a CodeView Symbol Record (and
-thusly, an array of CodeView Symbol Records) is described in
-:doc:`CodeViewSymbols`.
+=====================================
+The Module Information Stream
+=====================================
+
+.. contents::
+ :local:
+
+.. _modi_stream_intro:
+
+Introduction
+============
+
+The Module Info Stream (henceforth referred to as the Modi stream) contains
+information about a single module (object file, import library, etc that
+contributes to the binary this PDB contains debug information about. There
+is one modi stream for each module, and the mapping between modi stream index
+and module is contained in the :doc:`DBI Stream <DbiStream>`. The modi stream
+for a single module contains line information for the compiland, as well as
+all CodeView information for the symbols defined in the compiland. Finally,
+there is a "global refs" substream which is not well understood.
+
+.. _modi_stream_layout:
+
+Stream Layout
+=============
+
+A modi stream is laid out as follows:
+
+
+.. code-block:: c++
+
+ struct ModiStream {
+ uint32_t Signature;
+ uint8_t Symbols[SymbolSize-4];
+ uint8_t C11LineInfo[C11Size];
+ uint8_t C13LineInfo[C13Size];
+
+ uint32_t GlobalRefsSize;
+ uint8_t GlobalRefs[GlobalRefsSize];
+ };
+
+- **Signature** - Unknown. In practice only the value of ``4`` has been
+ observed. It is hypothesized that this value corresponds to the set of
+ ``CV_SIGNATURE_xx`` defines in ``cvinfo.h``, with the value of ``4``
+ meaning that this module has C13 line information (as opposed to C11 line
+ information). A corollary of this is that we expect to only ever see
+ C13 line info, and that we do not understand the format of C11 line info.
+
+- **Symbols** - The :ref:`CodeView Symbol Substream <modi_symbol_substream>`.
+ ``SymbolSize`` is equal to the value of ``SymByteSize`` for the
+ corresponding module's entry in the :ref:`Module Info Substream <dbi_mod_info_substream>`
+ of the :doc:`DBI Stream <DbiStream>`.
+
+- **C11LineInfo** - A block containing CodeView line information in C11
+ format. ``C11Size`` is equal to the value of ``C11ByteSize`` from the
+ :ref:`Module Info Substream <dbi_mod_info_substream>` of the
+ :doc:`DBI Stream <DbiStream>`. If this value is ``0``, then C11 line
+ information is not present. As mentioned previously, the format of
+ C11 line info is not understood and we assume all line in modern PDBs
+ to be in C13 format.
+
+- **C13LineInfo** - A block containing CodeView line information in C13
+ format. ``C13Size`` is equal to the value of ``C13ByteSize`` from the
+ :ref:`Module Info Substream <dbi_mod_info_substream>` of the
+ :doc:`DBI Stream <DbiStream>`. If this value is ``0``, then C13 line
+ information is not present.
+
+- **GlobalRefs** - The meaning of this substream is not understood.
+
+.. _modi_symbol_substream:
+
+The CodeView Symbol Substream
+=============================
+
+The CodeView Symbol Substream. This is an array of variable length
+records describing the functions, variables, inlining information,
+and other symbols defined in the compiland. The entire array consumes
+``SymbolSize-4`` bytes. The format of a CodeView Symbol Record (and
+thusly, an array of CodeView Symbol Records) is described in
+:doc:`CodeViewSymbols`.
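
Reader's note on ModiStream.rst (not part of the commit): because the ModiStream struct above is built from variable-length arrays, the offset of each field has to be computed from the sizes recorded in the DBI stream. The sketch below shows that arithmetic. ModiLayout and computeModiLayout are hypothetical names, and the code assumes SymByteSize is at least 4 so that the Signature is present.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: byte offsets of each field within one modi stream.
struct ModiLayout {
  uint64_t SymbolsOffset;        // immediately after the 4-byte Signature
  uint64_t SymbolsSize;          // SymbolSize - 4
  uint64_t C11Offset;            // start of C11LineInfo
  uint64_t C13Offset;            // start of C13LineInfo
  uint64_t GlobalRefsSizeOffset; // the uint32_t GlobalRefsSize field
  uint64_t GlobalRefsOffset;     // first byte of GlobalRefs
};

// SymByteSize, C11ByteSize and C13ByteSize come from the module's entry in
// the DBI stream's Module Info Substream.
inline ModiLayout computeModiLayout(uint32_t SymByteSize, uint32_t C11ByteSize,
                                    uint32_t C13ByteSize) {
  assert(SymByteSize >= 4 && "sketch assumes the Signature is present");
  ModiLayout Layout;
  Layout.SymbolsOffset = 4;
  Layout.SymbolsSize = static_cast<uint64_t>(SymByteSize) - 4;
  Layout.C11Offset = SymByteSize;
  Layout.C13Offset = Layout.C11Offset + C11ByteSize;
  Layout.GlobalRefsSizeOffset = Layout.C13Offset + C13ByteSize;
  Layout.GlobalRefsOffset = Layout.GlobalRefsSizeOffset + 4;
  return Layout;
}
```

For a module with SymByteSize = 100, no C11 data and 40 bytes of C13 data, the symbol records occupy bytes 4 to 99, the C13 block starts at byte 100, and GlobalRefsSize is read at byte 140.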
Modified: llvm/trunk/docs/PDB/MsfFile.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/MsfFile.rst?rev=359712&r1=359711&r2=359712&view=diff
==============================================================================
--- llvm/trunk/docs/PDB/MsfFile.rst (original)
+++ llvm/trunk/docs/PDB/MsfFile.rst Wed May 1 12:15:05 2019
@@ -1,179 +1,179 @@
-=====================================
-The MSF File Format
-=====================================
-
-.. contents::
- :local:
-
-.. _msf_layout:
-
-File Layout
-===========
-
-The MSF file format consists of the following components:
-
-1. :ref:`msf_superblock`
-2. :ref:`msf_freeblockmap` (also know as Free Page Map, or FPM)
-3. Data
-
-Each component is stored as an indexed block, the length of which is specified
-in ``SuperBlock::BlockSize``. The file consists of 1 or more iterations of the
-following pattern (sometimes referred to as an "interval"):
-
-1. 1 block of data
-2. Free Block Map 1 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 1)
-3. Free Block Map 2 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 2)
-4. ``SuperBlock::BlockSize - 3`` blocks of data
-
-In the first interval, the first data block is used to store
-:ref:`msf_superblock`.
-
-The following diagram demonstrates the general layout of the file (\| denotes
-the end of an interval, and is for visualization purposes only):
-
-+-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
-| Block Index | 0                     | 1                | 2                | 3 - 4095 | \| | 4096 | 4097 | 4098 | 4099 - 8191 | \| | ... |
-+=============+=======================+==================+==================+==========+====+======+======+======+=============+====+=====+
-| Meaning     | :ref:`msf_superblock` | Free Block Map 1 | Free Block Map 2 | Data     | \| | Data | FPM1 | FPM2 | Data        | \| | ... |
-+-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
-
-The file may end after any block, including immediately after a FPM1.
-
-.. note::
- LLVM only supports 4096 byte blocks (sometimes referred to as the "BigMsf"
- variant), so the rest of this document will assume a block size of 4096.
-
-.. _msf_superblock:
-
-The Superblock
-==============
-At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
-follows:
-
-.. code-block:: c++
-
- struct SuperBlock {
- char FileMagic[sizeof(Magic)];
- ulittle32_t BlockSize;
- ulittle32_t FreeBlockMapBlock;
- ulittle32_t NumBlocks;
- ulittle32_t NumDirectoryBytes;
- ulittle32_t Unknown;
- ulittle32_t BlockMapAddr;
- };
-
-- **FileMagic** - Must be equal to ``"Microsoft C / C++ MSF 7.00\\r\\n"``
- followed by the bytes ``1A 44 53 00 00 00``.
-- **BlockSize** - The block size of the internal file system. Valid values are
- 512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary
- depending on the block sizes. For the purposes of LLVM, we handle only block
- sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
-- **FreeBlockMapBlock** - The index of a block within the file, at which begins
- a bitfield representing the set of all blocks within the file which are "free"
- (i.e. the data within that block is not used). See :ref:`msf_freeblockmap` for
- more information.
- **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``!
-- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``
- should equal the size of the file on disk.
-- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream
- directory contains information about each stream's size and the set of blocks
- that it occupies. It will be described in more detail later.
-- **BlockMapAddr** - The index of a block within the MSF file. At this block is
- an array of ``ulittle32_t``'s listing the blocks that the stream directory
- resides on. For large MSF files, the stream directory (which describes the
- block layout of each stream) may not fit entirely on a single block. As a
- result, this extra layer of indirection is introduced, whereby this block
- contains the list of blocks that the stream directory occupies, and the stream
- directory itself can be stitched together accordingly. The number of
- ``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.
-
-.. _msf_freeblockmap:
-
-The Free Block Map
-==================
-
-The Free Block Map (sometimes referred to as the Free Page Map, or FPM) is a
-series of blocks which contains a bit flag for every block in the file. The
-flag will be set to 0 if the block is in use, and 1 if the block is unused.
-
-Each file contains two FPMs, one of which is active at any given time. This
-feature is designed to support incremental and atomic updates of the underlying
-MSF file. While writing to an MSF file, if the active FPM is FPM1, you can
-write your new modified bitfield to FPM2, and vice versa. Only when you commit
-the file to disk do you need to swap the value in the SuperBlock to point to
-the new ``FreeBlockMapBlock``.
-
-The Free Block Maps are stored as a series of single blocks thoughout the file
-at intervals of BlockSize. Because each FPM block is of size ``BlockSize``
-bytes, it contains 8 times as many bits as an interval has blocks. This means
-that the first block of each FPM refers to the first 8 intervals of the file
-(the first 32768 blocks), the second block of each FPM refers to the next 8
-blocks, and so on. This results in far more FPM blocks being present than are
-required, but in order to maintain backwards compatibility the format must stay
-this way.
-
-The Stream Directory
-====================
-The Stream Directory is the root of all access to the other streams in an MSF
-file. Beginning at byte 0 of the stream directory is the following structure:
-
-.. code-block:: c++
-
- struct StreamDirectory {
- ulittle32_t NumStreams;
- ulittle32_t StreamSizes[NumStreams];
- ulittle32_t StreamBlocks[NumStreams][];
- };
-
-And this structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
-Note that each of the last two arrays is of variable length, and in particular
-that the second array is jagged.
-
-**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4
-streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.
-
-Stream 0: ceil(1000 / 4096) = 1 block
-
-Stream 1: ceil(8000 / 4096) = 2 blocks
-
-Stream 2: ceil(16000 / 4096) = 4 blocks
-
-Stream 3: ceil(9000 / 4096) = 3 blocks
-
-In total, 10 blocks are used. Let's see what the stream directory might look
-like:
-
-.. code-block:: c++
-
- struct StreamDirectory {
- ulittle32_t NumStreams = 4;
- ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
- ulittle32_t StreamBlocks[][] = {
- {4},
- {5, 6},
- {11, 9, 7, 8},
- {10, 15, 12}
- };
- };
-
-In total, this occupies ``15 * 4 = 60`` bytes, so ``SuperBlock->NumDirectoryBytes``
-would equal ``60``, and ``SuperBlock->BlockMapAddr`` would be an array of one
-``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.
-
-Note also that the streams are discontiguous, and that part of stream 3 is in the
-middle of part of stream 2. You cannot assume anything about the layout of the
-blocks!
-
-Alignment and Block Boundaries
-==============================
-As may be clear by now, it is possible for a single field (whether it be a high
-level record, a long string field, or even a single ``uint16``) to begin and
-end in separate blocks. For example, if the block size is 4096 bytes, and a
-``uint16`` field begins at the last byte of the current block, then it would
-need to end on the first byte of the next block. Since blocks are not
-necessarily contiguously laid out in the file, this means that both the consumer
-and the producer of an MSF file must be prepared to split data apart
-accordingly. In the aforementioned example, the high byte of the ``uint16``
-would be written to the last byte of block N, and the low byte would be written
-to the first byte of block N+1, which could be tens of thousands of bytes later
-(or even earlier!) in the file, depending on what the stream directory says.
+=====================================
+The MSF File Format
+=====================================
+
+.. contents::
+ :local:
+
+.. _msf_layout:
+
+File Layout
+===========
+
+The MSF file format consists of the following components:
+
+1. :ref:`msf_superblock`
+2. :ref:`msf_freeblockmap` (also know as Free Page Map, or FPM)
+3. Data
+
+Each component is stored as an indexed block, the length of which is specified
+in ``SuperBlock::BlockSize``. The file consists of 1 or more iterations of the
+following pattern (sometimes referred to as an "interval"):
+
+1. 1 block of data
+2. Free Block Map 1 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 1)
+3. Free Block Map 2 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 2)
+4. ``SuperBlock::BlockSize - 3`` blocks of data
+
+In the first interval, the first data block is used to store
+:ref:`msf_superblock`.
+
+The following diagram demonstrates the general layout of the file (\| denotes
+the end of an interval, and is for visualization purposes only):
+
++-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
+| Block Index | 0                     | 1                | 2                | 3 - 4095 | \| | 4096 | 4097 | 4098 | 4099 - 8191 | \| | ... |
++=============+=======================+==================+==================+==========+====+======+======+======+=============+====+=====+
+| Meaning     | :ref:`msf_superblock` | Free Block Map 1 | Free Block Map 2 | Data     | \| | Data | FPM1 | FPM2 | Data        | \| | ... |
++-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
+
+The file may end after any block, including immediately after a FPM1.
+
+.. note::
+ LLVM only supports 4096 byte blocks (sometimes referred to as the "BigMsf"
+ variant), so the rest of this document will assume a block size of 4096.
+
+.. _msf_superblock:
+
+The Superblock
+==============
+At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
+follows:
+
+.. code-block:: c++
+
+ struct SuperBlock {
+ char FileMagic[sizeof(Magic)];
+ ulittle32_t BlockSize;
+ ulittle32_t FreeBlockMapBlock;
+ ulittle32_t NumBlocks;
+ ulittle32_t NumDirectoryBytes;
+ ulittle32_t Unknown;
+ ulittle32_t BlockMapAddr;
+ };
+
+- **FileMagic** - Must be equal to ``"Microsoft C / C++ MSF 7.00\\r\\n"``
+ followed by the bytes ``1A 44 53 00 00 00``.
+- **BlockSize** - The block size of the internal file system. Valid values are
+ 512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary
+ depending on the block sizes. For the purposes of LLVM, we handle only block
+ sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
+- **FreeBlockMapBlock** - The index of a block within the file, at which begins
+ a bitfield representing the set of all blocks within the file which are "free"
+ (i.e. the data within that block is not used). See :ref:`msf_freeblockmap` for
+ more information.
+ **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``!
+- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``
+ should equal the size of the file on disk.
+- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream
+ directory contains information about each stream's size and the set of blocks
+ that it occupies. It will be described in more detail later.
+- **BlockMapAddr** - The index of a block within the MSF file. At this block is
+ an array of ``ulittle32_t``'s listing the blocks that the stream directory
+ resides on. For large MSF files, the stream directory (which describes the
+ block layout of each stream) may not fit entirely on a single block. As a
+ result, this extra layer of indirection is introduced, whereby this block
+ contains the list of blocks that the stream directory occupies, and the stream
+ directory itself can be stitched together accordingly. The number of
+ ``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.
+
+.. _msf_freeblockmap:
+
+The Free Block Map
+==================
+
+The Free Block Map (sometimes referred to as the Free Page Map, or FPM) is a
+series of blocks which contains a bit flag for every block in the file. The
+flag will be set to 0 if the block is in use, and 1 if the block is unused.
+
+Each file contains two FPMs, one of which is active at any given time. This
+feature is designed to support incremental and atomic updates of the underlying
+MSF file. While writing to an MSF file, if the active FPM is FPM1, you can
+write your new modified bitfield to FPM2, and vice versa. Only when you commit
+the file to disk do you need to swap the value in the SuperBlock to point to
+the new ``FreeBlockMapBlock``.
+
+The Free Block Maps are stored as a series of single blocks thoughout the file
+at intervals of BlockSize. Because each FPM block is of size ``BlockSize``
+bytes, it contains 8 times as many bits as an interval has blocks. This means
+that the first block of each FPM refers to the first 8 intervals of the file
+(the first 32768 blocks), the second block of each FPM refers to the next 8
+blocks, and so on. This results in far more FPM blocks being present than are
+required, but in order to maintain backwards compatibility the format must stay
+this way.
+
+The Stream Directory
+====================
+The Stream Directory is the root of all access to the other streams in an MSF
+file. Beginning at byte 0 of the stream directory is the following structure:
+
+.. code-block:: c++
+
+ struct StreamDirectory {
+ ulittle32_t NumStreams;
+ ulittle32_t StreamSizes[NumStreams];
+ ulittle32_t StreamBlocks[NumStreams][];
+ };
+
+And this structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
+Note that each of the last two arrays is of variable length, and in particular
+that the second array is jagged.
+
+**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4
+streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.
+
+Stream 0: ceil(1000 / 4096) = 1 block
+
+Stream 1: ceil(8000 / 4096) = 2 blocks
+
+Stream 2: ceil(16000 / 4096) = 4 blocks
+
+Stream 3: ceil(9000 / 4096) = 3 blocks
+
+In total, 10 blocks are used. Let's see what the stream directory might look
+like:
+
+.. code-block:: c++
+
+ struct StreamDirectory {
+ ulittle32_t NumStreams = 4;
+ ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
+ ulittle32_t StreamBlocks[][] = {
+ {4},
+ {5, 6},
+ {11, 9, 7, 8},
+ {10, 15, 12}
+ };
+ };
+
+In total, this occupies ``15 * 4 = 60`` bytes, so ``SuperBlock->NumDirectoryBytes``
+would equal ``60``, and ``SuperBlock->BlockMapAddr`` would be an array of one
+``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.
+
+Note also that the streams are discontiguous, and that part of stream 3 is in the
+middle of part of stream 2. You cannot assume anything about the layout of the
+blocks!
+
+Alignment and Block Boundaries
+==============================
+As may be clear by now, it is possible for a single field (whether it be a high
+level record, a long string field, or even a single ``uint16``) to begin and
+end in separate blocks. For example, if the block size is 4096 bytes, and a
+``uint16`` field begins at the last byte of the current block, then it would
+need to end on the first byte of the next block. Since blocks are not
+necessarily contiguously laid out in the file, this means that both the consumer
+and the producer of an MSF file must be prepared to split data apart
+accordingly. In the aforementioned example, the high byte of the ``uint16``
+would be written to the last byte of block N, and the low byte would be written
+to the first byte of block N+1, which could be tens of thousands of bytes later
+(or even earlier!) in the file, depending on what the stream directory says.
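
Reader's note on MsfFile.rst (not part of the commit): the stream directory example and the block-boundary discussion above can be checked with a few lines of C++. This is a sketch under stated assumptions, not LLVM's MSF reader. The function names blocksNeeded, directoryBytes and streamToFileOffset are hypothetical, and the code assumes every stream size is a real byte count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ceil(NumBytes / BlockSize) using integer arithmetic.
inline uint64_t blocksNeeded(uint64_t NumBytes, uint64_t BlockSize) {
  return (NumBytes + BlockSize - 1) / BlockSize;
}

// Size in bytes of a StreamDirectory: one word for NumStreams, one word per
// entry of StreamSizes, and one word per block index in StreamBlocks.
inline uint64_t directoryBytes(const std::vector<uint32_t> &StreamSizes,
                               uint64_t BlockSize) {
  uint64_t Words = 1 + StreamSizes.size();
  for (uint32_t Size : StreamSizes)
    Words += blocksNeeded(Size, BlockSize);
  return Words * 4;
}

// Map a byte offset inside a stream to a byte offset inside the MSF file,
// given that stream's row of StreamBlocks.
inline uint64_t streamToFileOffset(const std::vector<uint32_t> &StreamBlocks,
                                   uint64_t BlockSize, uint64_t StreamOffset) {
  std::size_t Index = static_cast<std::size_t>(StreamOffset / BlockSize);
  assert(Index < StreamBlocks.size() && "offset past the end of the stream");
  return static_cast<uint64_t>(StreamBlocks[Index]) * BlockSize +
         StreamOffset % BlockSize;
}
```

With the example directory, directoryBytes({1000, 8000, 16000, 9000}, 4096) gives 60, matching the 15 * 4 figure in the text. For stream 2 with blocks {11, 9, 7, 8}, stream offset 4095 maps to file offset 49151 (last byte of block 11) while stream offset 4096 maps to file offset 36864 (first byte of block 9), so a two-byte field starting at stream offset 4095 has its second byte earlier in the file than its first.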
Modified: llvm/trunk/docs/PDB/PublicStream.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/PublicStream.rst?rev=359712&r1=359711&r2=359712&view=diff
==============================================================================
--- llvm/trunk/docs/PDB/PublicStream.rst (original)
+++ llvm/trunk/docs/PDB/PublicStream.rst Wed May 1 12:15:05 2019
@@ -1,3 +1,3 @@
-=====================================
-The PDB Public Symbol Stream
-=====================================
+=====================================
+The PDB Public Symbol Stream
+=====================================
Modified: llvm/trunk/docs/PDB/TpiStream.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/TpiStream.rst?rev=359712&r1=359711&r2=359712&view=diff
==============================================================================
--- llvm/trunk/docs/PDB/TpiStream.rst (original)
+++ llvm/trunk/docs/PDB/TpiStream.rst Wed May 1 12:15:05 2019
@@ -1,312 +1,312 @@
-=====================================
-The PDB TPI and IPI Streams
-=====================================
-
-.. contents::
- :local:
-
-.. _tpi_intro:
-
-Introduction
-============
-
-The PDB TPI Stream (Index 2) and IPI Stream (Index 4) contain information about
-all types used in the program. Each is organized as a :ref:`header <tpi_header>`
-followed by a list of :doc:`CodeView Type Records <CodeViewTypes>`. Types are
-referenced from various streams and records throughout the PDB by their
-:ref:`type index <type_indices>`. In general, the sequence of type records
-following the :ref:`header <tpi_header>` forms a topologically sorted DAG
-(directed acyclic graph), which means that a type record B can only refer to
-the type A if ``A.TypeIndex < B.TypeIndex``. While there are rare cases where
-this property will not hold (particularly when dealing with object files
-compiled with MASM), an implementation should try very hard to make this
-property hold, as it means the entire type graph can be constructed in a single
-pass.
-
-.. important::
- Type records form a topologically sorted DAG (directed acyclic graph).
-
-.. _tpi_ipi:
-
-TPI vs IPI Stream
-=================
-
-Recent versions of the PDB format (aka all versions covered by this document)
-have 2 streams with identical layout, henceforth referred to as the TPI stream
-and IPI stream. Subsequent contents of this document describing the on-disk
-format apply equally whether it is for the TPI Stream or the IPI Stream. The
-only difference between the two is in *which* CodeView records are allowed to
-appear in each one, summarized by the following table:
-
-+----------------------+---------------------+
-| TPI Stream | IPI Stream |
-+======================+=====================+
-| LF_POINTER | LF_FUNC_ID |
-+----------------------+---------------------+
-| LF_MODIFIER | LF_MFUNC_ID |
-+----------------------+---------------------+
-| LF_PROCEDURE | LF_BUILDINFO |
-+----------------------+---------------------+
-| LF_MFUNCTION | LF_SUBSTR_LIST |
-+----------------------+---------------------+
-| LF_LABEL | LF_STRING_ID |
-+----------------------+---------------------+
-| LF_ARGLIST | LF_UDT_SRC_LINE |
-+----------------------+---------------------+
-| LF_FIELDLIST | LF_UDT_MOD_SRC_LINE |
-+----------------------+---------------------+
-| LF_ARRAY | |
-+----------------------+---------------------+
-| LF_CLASS | |
-+----------------------+---------------------+
-| LF_STRUCTURE | |
-+----------------------+---------------------+
-| LF_INTERFACE | |
-+----------------------+---------------------+
-| LF_UNION | |
-+----------------------+---------------------+
-| LF_ENUM | |
-+----------------------+---------------------+
-| LF_TYPESERVER2 | |
-+----------------------+---------------------+
-| LF_VFTABLE | |
-+----------------------+---------------------+
-| LF_VTSHAPE | |
-+----------------------+---------------------+
-| LF_BITFIELD | |
-+----------------------+---------------------+
-| LF_METHODLIST | |
-+----------------------+---------------------+
-| LF_PRECOMP | |
-+----------------------+---------------------+
-| LF_ENDPRECOMP | |
-+----------------------+---------------------+
-
-The usage of these records is described in more detail in
-:doc:`CodeView Type Records <CodeViewTypes>`.
-
-.. _type_indices:
-
-Type Indices
-============
-
-A type index is a 32-bit integer that uniquely identifies a type inside of an
-object file's ``.debug$T`` section or a PDB file's TPI or IPI stream. The
-value of the type index for the first type record from the TPI stream is given
-by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>`
-although in practice this value is always equal to 0x1000 (4096).
-
-Any type index with a high bit set is considered to come from the IPI stream,
-although this appears to be more of a hack, and LLVM does not generate type
-indices of this nature. They can, however, be observed in Microsoft PDBs
-occasionally, so one should be prepared to handle them. Note that having the
-high bit set is sufficient, but not necessary, to conclude that a type index
-comes from the IPI stream.
-
-Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed
-to come from the appropriate stream, and any type index less than this is a
-bitmask which can be decomposed as follows:
-
-.. code-block:: none
-
- .---------------------------.------.----------.
- | Unused | Mode | Kind |
- '---------------------------'------'----------'
- |+32 |+12 |+8 |+0
-
-
-- **Kind** - A value from the following enum:
-
-.. code-block:: c++
-
- enum class SimpleTypeKind : uint32_t {
- None = 0x0000, // uncharacterized type (no type)
- Void = 0x0003, // void
- NotTranslated = 0x0007, // type not translated by cvpack
- HResult = 0x0008, // OLE/COM HRESULT
-
- SignedCharacter = 0x0010, // 8 bit signed
- UnsignedCharacter = 0x0020, // 8 bit unsigned
- NarrowCharacter = 0x0070, // really a char
- WideCharacter = 0x0071, // wide char
- Character16 = 0x007a, // char16_t
- Character32 = 0x007b, // char32_t
-
- SByte = 0x0068, // 8 bit signed int
- Byte = 0x0069, // 8 bit unsigned int
- Int16Short = 0x0011, // 16 bit signed
- UInt16Short = 0x0021, // 16 bit unsigned
- Int16 = 0x0072, // 16 bit signed int
- UInt16 = 0x0073, // 16 bit unsigned int
- Int32Long = 0x0012, // 32 bit signed
- UInt32Long = 0x0022, // 32 bit unsigned
- Int32 = 0x0074, // 32 bit signed int
- UInt32 = 0x0075, // 32 bit unsigned int
- Int64Quad = 0x0013, // 64 bit signed
- UInt64Quad = 0x0023, // 64 bit unsigned
- Int64 = 0x0076, // 64 bit signed int
- UInt64 = 0x0077, // 64 bit unsigned int
- Int128Oct = 0x0014, // 128 bit signed int
- UInt128Oct = 0x0024, // 128 bit unsigned int
- Int128 = 0x0078, // 128 bit signed int
- UInt128 = 0x0079, // 128 bit unsigned int
-
- Float16 = 0x0046, // 16 bit real
- Float32 = 0x0040, // 32 bit real
- Float32PartialPrecision = 0x0045, // 32 bit PP real
- Float48 = 0x0044, // 48 bit real
- Float64 = 0x0041, // 64 bit real
- Float80 = 0x0042, // 80 bit real
- Float128 = 0x0043, // 128 bit real
-
- Complex16 = 0x0056, // 16 bit complex
- Complex32 = 0x0050, // 32 bit complex
- Complex32PartialPrecision = 0x0055, // 32 bit PP complex
- Complex48 = 0x0054, // 48 bit complex
- Complex64 = 0x0051, // 64 bit complex
- Complex80 = 0x0052, // 80 bit complex
- Complex128 = 0x0053, // 128 bit complex
-
- Boolean8 = 0x0030, // 8 bit boolean
- Boolean16 = 0x0031, // 16 bit boolean
- Boolean32 = 0x0032, // 32 bit boolean
- Boolean64 = 0x0033, // 64 bit boolean
- Boolean128 = 0x0034, // 128 bit boolean
- };
-
-- **Mode** - A value from the following enum:
-
-.. code-block:: c++
-
- enum class SimpleTypeMode : uint32_t {
- Direct = 0, // Not a pointer
- NearPointer = 1, // Near pointer
- FarPointer = 2, // Far pointer
- HugePointer = 3, // Huge pointer
- NearPointer32 = 4, // 32 bit near pointer
- FarPointer32 = 5, // 32 bit far pointer
- NearPointer64 = 6, // 64 bit near pointer
- NearPointer128 = 7 // 128 bit near pointer
- };
-
-Note that for pointers, the bitness is represented in the mode. So a ``void*``
-would have a type index with ``Mode=NearPointer32, Kind=Void`` if built for 32-bits
-but a type index with ``Mode=NearPointer64, Kind=Void`` if built for 64-bits.
-
-By convention, the type index for ``std::nullptr_t`` is constructed the same way
-as the type index for ``void*``, but using the bitless enumeration value
-``NearPointer``.
-
-
-
-.. _tpi_header:
-
-Stream Header
-=============
-At offset 0 of the TPI Stream is a header with the following layout:
-
-
-.. code-block:: c++
-
- struct TpiStreamHeader {
- uint32_t Version;
- uint32_t HeaderSize;
- uint32_t TypeIndexBegin;
- uint32_t TypeIndexEnd;
- uint32_t TypeRecordBytes;
-
- uint16_t HashStreamIndex;
- uint16_t HashAuxStreamIndex;
- uint32_t HashKeySize;
- uint32_t NumHashBuckets;
-
- int32_t HashValueBufferOffset;
- uint32_t HashValueBufferLength;
-
- int32_t IndexOffsetBufferOffset;
- uint32_t IndexOffsetBufferLength;
-
- int32_t HashAdjBufferOffset;
- uint32_t HashAdjBufferLength;
- };
-
-- **Version** - A value from the following enum.
-
-.. code-block:: c++
-
- enum class TpiStreamVersion : uint32_t {
- V40 = 19950410,
- V41 = 19951122,
- V50 = 19961031,
- V70 = 19990903,
- V80 = 20040203,
- };
-
-Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
-``V80``, and no other values have been observed. It is assumed that should
-another value be observed, the layout described by this document may not be
-accurate.
-
-- **HeaderSize** - ``sizeof(TpiStreamHeader)``
-
-- **TypeIndexBegin** - The numeric value of the type index representing the
- first type record in the TPI stream. This is usually the value 0x1000 as type
- indices lower than this are reserved (see :ref:`Type Indices <type_indices>` for
- a discussion of reserved type indices).
-
-- **TypeIndexEnd** - One greater than the numeric value of the type index
- representing the last type record in the TPI stream. The total number of type
- records in the TPI stream can be computed as ``TypeIndexEnd - TypeIndexBegin``.
-
-- **TypeRecordBytes** - The number of bytes of type record data following the header.
-
-- **HashStreamIndex** - The index of a stream which contains a list of hashes for
- every type record. This value may be -1, indicating that hash information is not
- present. In practice a valid stream index is always observed, so any producer
- implementation should be prepared to emit this stream to ensure compatibility with
- tools which may expect it to be present.
-
-- **HashAuxStreamIndex** - Presumably the index of a stream which contains a separate
- hash table, although this has not been observed in practice and it's unclear what it
- might be used for.
-
-- **HashKeySize** - The size of a hash value (usually 4 bytes).
-
-- **NumHashBuckets** - The number of buckets used to generate the hash values in the
- aforementioned hash streams.
-
-- **HashValueBufferOffset / HashValueBufferLength** - The offset and size within
- the TPI Hash Stream of the list of hash values. It should be assumed that there
- are either 0 hash values, or a number equal to the number of type records in the
- TPI stream (``TypeIndexEnd - TypeIndexBegin``). Thus, if ``HashValueBufferLength``
- is neither 0 nor equal to ``(TypeIndexEnd - TypeIndexBegin) * HashKeySize``, we can
- consider the PDB malformed.
-
-- **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size
- within the TPI Hash Stream of the Type Index Offsets Buffer. This is a list of
- pairs of uint32_t's where the first value is a :ref:`Type Index <type_indices>`
- and the second value is the offset in the type record data of the type with this
- index. This can be used to do a binary search followed by a linear search to
- get amortized O(log n) lookup by type index.
-
-- **HashAdjBufferOffset / HashAdjBufferLength** - The offset and size within
- the TPI hash stream of a serialized hash table whose keys are the hash values
- in the hash value buffer and whose values are type indices. This appears to
- be useful in incremental linking scenarios, so that if a type is modified an
- entry can be created mapping the old hash value to the new type index so that
- a PDB file consumer can always have the most up to date version of the type
- without forcing the incremental linker to garbage collect and update
- references that point to the old version to now point to the new version.
- The layout of this hash table is described in :doc:`HashTable`.
-
-.. _tpi_records:
-
-CodeView Type Record List
-=========================
-Following the header, there are ``TypeRecordBytes`` bytes of data that represent a
-variable length array of :doc:`CodeView type records <CodeViewTypes>`. The number
-of such records (i.e. the length of the array) can be determined by computing the
-value ``Header.TypeIndexEnd - Header.TypeIndexBegin``.
-
-O(log n) random access is provided by way of the Type Index Offsets array (if present)
-described previously.
\ No newline at end of file
+=====================================
+The PDB TPI and IPI Streams
+=====================================
+
+.. contents::
+ :local:
+
+.. _tpi_intro:
+
+Introduction
+============
+
+The PDB TPI Stream (Index 2) and IPI Stream (Index 4) contain information about
+all types used in the program. Each is organized as a :ref:`header <tpi_header>`
+followed by a list of :doc:`CodeView Type Records <CodeViewTypes>`. Types are
+referenced from various streams and records throughout the PDB by their
+:ref:`type index <type_indices>`. In general, the sequence of type records
+following the :ref:`header <tpi_header>` forms a topologically sorted DAG
+(directed acyclic graph), which means that a type record B can only refer to
+the type A if ``A.TypeIndex < B.TypeIndex``. While there are rare cases where
+this property will not hold (particularly when dealing with object files
+compiled with MASM), an implementation should try very hard to make this
+property hold, as it means the entire type graph can be constructed in a single
+pass.
+
+.. important::
+ Type records form a topologically sorted DAG (directed acyclic graph).
+
+.. _tpi_ipi:
+
+TPI vs IPI Stream
+=================
+
+Recent versions of the PDB format (aka all versions covered by this document)
+have 2 streams with identical layout, henceforth referred to as the TPI stream
+and IPI stream. Subsequent contents of this document describing the on-disk
+format apply equally whether it is for the TPI Stream or the IPI Stream. The
+only difference between the two is in *which* CodeView records are allowed to
+appear in each one, summarized by the following table:
+
++----------------------+---------------------+
+| TPI Stream           | IPI Stream          |
++======================+=====================+
+| LF_POINTER           | LF_FUNC_ID          |
++----------------------+---------------------+
+| LF_MODIFIER          | LF_MFUNC_ID         |
++----------------------+---------------------+
+| LF_PROCEDURE         | LF_BUILDINFO        |
++----------------------+---------------------+
+| LF_MFUNCTION         | LF_SUBSTR_LIST      |
++----------------------+---------------------+
+| LF_LABEL             | LF_STRING_ID        |
++----------------------+---------------------+
+| LF_ARGLIST           | LF_UDT_SRC_LINE     |
++----------------------+---------------------+
+| LF_FIELDLIST         | LF_UDT_MOD_SRC_LINE |
++----------------------+---------------------+
+| LF_ARRAY             |                     |
++----------------------+---------------------+
+| LF_CLASS             |                     |
++----------------------+---------------------+
+| LF_STRUCTURE         |                     |
++----------------------+---------------------+
+| LF_INTERFACE         |                     |
++----------------------+---------------------+
+| LF_UNION             |                     |
++----------------------+---------------------+
+| LF_ENUM              |                     |
++----------------------+---------------------+
+| LF_TYPESERVER2       |                     |
++----------------------+---------------------+
+| LF_VFTABLE           |                     |
++----------------------+---------------------+
+| LF_VTSHAPE           |                     |
++----------------------+---------------------+
+| LF_BITFIELD          |                     |
++----------------------+---------------------+
+| LF_METHODLIST        |                     |
++----------------------+---------------------+
+| LF_PRECOMP           |                     |
++----------------------+---------------------+
+| LF_ENDPRECOMP        |                     |
++----------------------+---------------------+
+
+The usage of these records is described in more detail in
+:doc:`CodeView Type Records <CodeViewTypes>`.
+
+.. _type_indices:
+
+Type Indices
+============
+
+A type index is a 32-bit integer that uniquely identifies a type inside of an
+object file's ``.debug$T`` section or a PDB file's TPI or IPI stream. The
+value of the type index for the first type record from the TPI stream is given
+by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>`
+although in practice this value is always equal to 0x1000 (4096).
+
+Any type index with a high bit set is considered to come from the IPI stream,
+although this appears to be more of a hack, and LLVM does not generate type
+indices of this nature. They can, however, be observed in Microsoft PDBs
+occasionally, so one should be prepared to handle them. Note that having the
+high bit set is sufficient, but not necessary, to conclude that a type index
+comes from the IPI stream.
+
+Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed
+to come from the appropriate stream, and any type index less than this is a
+bitmask which can be decomposed as follows:
+
+.. code-block:: none
+
+ .---------------------------.------.----------.
+ |           Unused          | Mode |   Kind   |
+ '---------------------------'------'----------'
+ |+32                        |+12   |+8        |+0
+
+
+- **Kind** - A value from the following enum:
+
+.. code-block:: c++
+
+ enum class SimpleTypeKind : uint32_t {
+ None = 0x0000, // uncharacterized type (no type)
+ Void = 0x0003, // void
+ NotTranslated = 0x0007, // type not translated by cvpack
+ HResult = 0x0008, // OLE/COM HRESULT
+
+ SignedCharacter = 0x0010, // 8 bit signed
+ UnsignedCharacter = 0x0020, // 8 bit unsigned
+ NarrowCharacter = 0x0070, // really a char
+ WideCharacter = 0x0071, // wide char
+ Character16 = 0x007a, // char16_t
+ Character32 = 0x007b, // char32_t
+
+ SByte = 0x0068, // 8 bit signed int
+ Byte = 0x0069, // 8 bit unsigned int
+ Int16Short = 0x0011, // 16 bit signed
+ UInt16Short = 0x0021, // 16 bit unsigned
+ Int16 = 0x0072, // 16 bit signed int
+ UInt16 = 0x0073, // 16 bit unsigned int
+ Int32Long = 0x0012, // 32 bit signed
+ UInt32Long = 0x0022, // 32 bit unsigned
+ Int32 = 0x0074, // 32 bit signed int
+ UInt32 = 0x0075, // 32 bit unsigned int
+ Int64Quad = 0x0013, // 64 bit signed
+ UInt64Quad = 0x0023, // 64 bit unsigned
+ Int64 = 0x0076, // 64 bit signed int
+ UInt64 = 0x0077, // 64 bit unsigned int
+ Int128Oct = 0x0014, // 128 bit signed int
+ UInt128Oct = 0x0024, // 128 bit unsigned int
+ Int128 = 0x0078, // 128 bit signed int
+ UInt128 = 0x0079, // 128 bit unsigned int
+
+ Float16 = 0x0046, // 16 bit real
+ Float32 = 0x0040, // 32 bit real
+ Float32PartialPrecision = 0x0045, // 32 bit PP real
+ Float48 = 0x0044, // 48 bit real
+ Float64 = 0x0041, // 64 bit real
+ Float80 = 0x0042, // 80 bit real
+ Float128 = 0x0043, // 128 bit real
+
+ Complex16 = 0x0056, // 16 bit complex
+ Complex32 = 0x0050, // 32 bit complex
+ Complex32PartialPrecision = 0x0055, // 32 bit PP complex
+ Complex48 = 0x0054, // 48 bit complex
+ Complex64 = 0x0051, // 64 bit complex
+ Complex80 = 0x0052, // 80 bit complex
+ Complex128 = 0x0053, // 128 bit complex
+
+ Boolean8 = 0x0030, // 8 bit boolean
+ Boolean16 = 0x0031, // 16 bit boolean
+ Boolean32 = 0x0032, // 32 bit boolean
+ Boolean64 = 0x0033, // 64 bit boolean
+ Boolean128 = 0x0034, // 128 bit boolean
+ };
+
+- **Mode** - A value from the following enum:
+
+.. code-block:: c++
+
+ enum class SimpleTypeMode : uint32_t {
+ Direct = 0, // Not a pointer
+ NearPointer = 1, // Near pointer
+ FarPointer = 2, // Far pointer
+ HugePointer = 3, // Huge pointer
+ NearPointer32 = 4, // 32 bit near pointer
+ FarPointer32 = 5, // 32 bit far pointer
+ NearPointer64 = 6, // 64 bit near pointer
+ NearPointer128 = 7 // 128 bit near pointer
+ };
+
+Note that for pointers, the bitness is represented in the mode. So a ``void*``
+would have a type index with ``Mode=NearPointer32, Kind=Void`` if built for 32-bits
+but a type index with ``Mode=NearPointer64, Kind=Void`` if built for 64-bits.
+
+By convention, the type index for ``std::nullptr_t`` is constructed the same way
+as the type index for ``void*``, but using the bitless enumeration value
+``NearPointer``.
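
The decomposition above can be sketched as follows. This is only an illustration: ``SimpleTypeParts`` and ``decomposeSimpleType`` are hypothetical names, the bit positions are read straight off the layout diagram (Kind in bits 0-7, Mode in bits 8-11), and ``typeIndexBegin`` is assumed to come from the stream header.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for this sketch; the numeric values correspond to
// the SimpleTypeMode and SimpleTypeKind enums shown above.
struct SimpleTypeParts {
  uint32_t mode;
  uint32_t kind;
};

// The high bit may mark an IPI type index and is cleared before comparing.
constexpr uint32_t kHighBit = 0x80000000u;

// Returns true and fills `out` if `typeIndex` is a simple (reserved) type
// index; returns false if it refers to a record in the TPI or IPI stream.
bool decomposeSimpleType(uint32_t typeIndex, uint32_t typeIndexBegin,
                         SimpleTypeParts &out) {
  uint32_t ti = typeIndex & ~kHighBit;
  if (ti >= typeIndexBegin)
    return false;
  out.kind = ti & 0xFFu;        // bits 0-7
  out.mode = (ti >> 8) & 0xFu;  // bits 8-11
  return true;
}
```

For example, with ``typeIndexBegin`` equal to 0x1000, the value 0x0603 decomposes to ``Mode=NearPointer64, Kind=Void``, i.e. a 64-bit ``void*``.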
+
+
+
+.. _tpi_header:
+
+Stream Header
+=============
+At offset 0 of the TPI Stream is a header with the following layout:
+
+
+.. code-block:: c++
+
+ struct TpiStreamHeader {
+ uint32_t Version;
+ uint32_t HeaderSize;
+ uint32_t TypeIndexBegin;
+ uint32_t TypeIndexEnd;
+ uint32_t TypeRecordBytes;
+
+ uint16_t HashStreamIndex;
+ uint16_t HashAuxStreamIndex;
+ uint32_t HashKeySize;
+ uint32_t NumHashBuckets;
+
+ int32_t HashValueBufferOffset;
+ uint32_t HashValueBufferLength;
+
+ int32_t IndexOffsetBufferOffset;
+ uint32_t IndexOffsetBufferLength;
+
+ int32_t HashAdjBufferOffset;
+ uint32_t HashAdjBufferLength;
+ };
+
+- **Version** - A value from the following enum.
+
+.. code-block:: c++
+
+ enum class TpiStreamVersion : uint32_t {
+ V40 = 19950410,
+ V41 = 19951122,
+ V50 = 19961031,
+ V70 = 19990903,
+ V80 = 20040203,
+ };
+
+Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
+``V80``, and no other values have been observed. It is assumed that should
+another value be observed, the layout described by this document may not be
+accurate.
+
+- **HeaderSize** - ``sizeof(TpiStreamHeader)``
+
+- **TypeIndexBegin** - The numeric value of the type index representing the
+ first type record in the TPI stream. This is usually the value 0x1000 as type
+ indices lower than this are reserved (see :ref:`Type Indices <type_indices>` for
+ a discussion of reserved type indices).
+
+- **TypeIndexEnd** - One greater than the numeric value of the type index
+ representing the last type record in the TPI stream. The total number of type
+ records in the TPI stream can be computed as ``TypeIndexEnd - TypeIndexBegin``.
+
+- **TypeRecordBytes** - The number of bytes of type record data following the header.
+
+- **HashStreamIndex** - The index of a stream which contains a list of hashes for
+ every type record. This value may be -1, indicating that hash information is not
+ present. In practice a valid stream index is always observed, so any producer
+ implementation should be prepared to emit this stream to ensure compatibility with
+ tools which may expect it to be present.
+
+- **HashAuxStreamIndex** - Presumably the index of a stream which contains a separate
+ hash table, although this has not been observed in practice and it's unclear what it
+ might be used for.
+
+- **HashKeySize** - The size of a hash value (usually 4 bytes).
+
+- **NumHashBuckets** - The number of buckets used to generate the hash values in the
+ aforementioned hash streams.
+
+- **HashValueBufferOffset / HashValueBufferLength** - The offset and size within
+ the TPI Hash Stream of the list of hash values. It should be assumed that there
+ are either 0 hash values, or a number equal to the number of type records in the
+ TPI stream (``TypeIndexEnd - TypeIndexBegin``). Thus, if ``HashValueBufferLength``
+ is neither 0 nor equal to ``(TypeIndexEnd - TypeIndexBegin) * HashKeySize``, we can
+ consider the PDB malformed.
+
+- **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size
+ within the TPI Hash Stream of the Type Index Offsets Buffer. This is a list of
+ pairs of uint32_t's where the first value is a :ref:`Type Index <type_indices>`
+ and the second value is the offset in the type record data of the type with this
+ index. This can be used to do a binary search followed by a linear search to
+ get amortized O(log n) lookup by type index.
+
+- **HashAdjBufferOffset / HashAdjBufferLength** - The offset and size within
+ the TPI hash stream of a serialized hash table whose keys are the hash values
+ in the hash value buffer and whose values are type indices. This appears to
+ be useful in incremental linking scenarios, so that if a type is modified an
+ entry can be created mapping the old hash value to the new type index so that
+ a PDB file consumer can always have the most up to date version of the type
+ without forcing the incremental linker to garbage collect and update
+ references that point to the old version to now point to the new version.
+ The layout of this hash table is described in :doc:`HashTable`.
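
A consumer might sanity-check the header along the lines below. This is a sketch under the assumptions stated in the field descriptions above (in particular that the hash value buffer is either empty or holds exactly one value per type record). ``hashValueBufferLooksValid`` is a hypothetical name, and the struct is repeated only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>

// Same layout as the header described above.
struct TpiStreamHeader {
  uint32_t Version;
  uint32_t HeaderSize;
  uint32_t TypeIndexBegin;
  uint32_t TypeIndexEnd;
  uint32_t TypeRecordBytes;

  uint16_t HashStreamIndex;
  uint16_t HashAuxStreamIndex;
  uint32_t HashKeySize;
  uint32_t NumHashBuckets;

  int32_t HashValueBufferOffset;
  uint32_t HashValueBufferLength;

  int32_t IndexOffsetBufferOffset;
  uint32_t IndexOffsetBufferLength;

  int32_t HashAdjBufferOffset;
  uint32_t HashAdjBufferLength;
};

// Every field is naturally aligned, so no padding is expected.
static_assert(sizeof(TpiStreamHeader) == 56, "unexpected header padding");

// Hypothetical check: an empty hash value buffer is accepted; otherwise there
// must be exactly one HashKeySize-byte value per type record.
bool hashValueBufferLooksValid(const TpiStreamHeader &h) {
  if (h.TypeIndexEnd < h.TypeIndexBegin)
    return false;
  if (h.HashValueBufferLength == 0)
    return true;
  uint64_t expected =
      static_cast<uint64_t>(h.TypeIndexEnd - h.TypeIndexBegin) * h.HashKeySize;
  return h.HashValueBufferLength == expected;
}
```

The multiplication is done in 64 bits so that a corrupt header with a huge record count cannot wrap around and pass the check by accident.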
+
+.. _tpi_records:
+
+CodeView Type Record List
+=========================
+Following the header, there are ``TypeRecordBytes`` bytes of data that represent a
+variable length array of :doc:`CodeView type records <CodeViewTypes>`. The number
+of such records (i.e. the length of the array) can be determined by computing the
+value ``Header.TypeIndexEnd - Header.TypeIndexBegin``.
+
+O(log n) random access is provided by way of the Type Index Offsets array (if present)
+described previously.
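
The binary-search half of that lookup might look like the sketch below. ``TypeIndexOffset`` and ``findScanStart`` are hypothetical names, and the entries are assumed to be sorted by ascending type index, which the binary search requires. The subsequent linear walk over type records depends on the record format and is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One (type index, offset) pair from the Type Index Offsets buffer.
struct TypeIndexOffset {
  uint32_t typeIndex;
  uint32_t offset;  // offset into the type record data
};

// Hypothetical helper: returns the last entry whose type index is <= target,
// or nullptr if every entry is greater. A caller would then walk records
// linearly from that entry's offset until it reaches `target`.
const TypeIndexOffset *findScanStart(const std::vector<TypeIndexOffset> &entries,
                                     uint32_t target) {
  auto it = std::upper_bound(
      entries.begin(), entries.end(), target,
      [](uint32_t value, const TypeIndexOffset &e) {
        return value < e.typeIndex;
      });
  if (it == entries.begin())
    return nullptr;
  return &*(it - 1);
}
```

With entries for type indices 0x1000, 0x1010 and 0x1020, a lookup of 0x1015 starts the linear walk at the entry for 0x1010 and advances five records.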
Modified: llvm/trunk/docs/PDB/index.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/index.rst?rev=359712&r1=359711&r2=359712&view=diff
==============================================================================
--- llvm/trunk/docs/PDB/index.rst (original)
+++ llvm/trunk/docs/PDB/index.rst Wed May 1 12:15:05 2019
@@ -1,168 +1,168 @@
-=====================================
-The PDB File Format
-=====================================
-
-.. contents::
- :local:
-
-.. _pdb_intro:
-
-Introduction
-============
-
-PDB (Program Database) is a file format invented by Microsoft and which contains
-debug information that can be consumed by debuggers and other tools. Since
-officially supported APIs exist on Windows for querying debug information from
-PDBs even without the user understanding the internals of the file format, a
-large ecosystem of tools has been built for Windows to consume this format. In
-order for Clang to be able to generate programs that can interoperate with these
-tools, it is necessary for us to generate PDB files ourselves.
-
-At the same time, LLVM has a long history of being able to cross-compile from
-any platform to any platform, and we wish for the same to be true here. So it
-is necessary for us to understand the PDB file format at the byte-level so that
-we can generate PDB files entirely on our own.
-
-This manual describes what we know about the PDB file format today: the layout
-of the file, the various streams contained within, the format of individual
-records within, and more.
-
-We would like to extend our heartfelt gratitude to Microsoft, without whom we
-would not be where we are today. Much of the knowledge contained within this
-manual was learned through reading code published by Microsoft on their `GitHub
-repo <https://github.com/Microsoft/microsoft-pdb>`__.
-
-.. _pdb_layout:
-
-File Layout
-===========
-
-.. important::
- Unless otherwise specified, all numeric values are encoded in little endian.
- If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
- assume it is little endian!
-
-.. toctree::
- :hidden:
-
- MsfFile
- PdbStream
- TpiStream
- DbiStream
- ModiStream
- PublicStream
- GlobalStream
- HashTable
- CodeViewSymbols
- CodeViewTypes
-
-.. _msf:
-
-The MSF Container
------------------
-A PDB file is really just a special case of an MSF (Multi-Stream Format) file.
-An MSF file is actually a miniature "file system within a file". It contains
-multiple streams (aka files) which can represent arbitrary data, and these
-streams are divided into blocks which may not necessarily be contiguously
-laid out within the file (aka fragmented). Additionally, the MSF contains a
-stream directory (aka MFT) which describes how the streams (files) are laid
-out within the MSF.
-
-For more information about the MSF container format, stream directory, and
-block layout, see :doc:`MsfFile`.
-
-.. _streams:
-
-Streams
--------
-The PDB format contains a number of streams which describe various information
-such as the types, symbols, source files, and compilands (e.g. object files)
-of a program, as well as some additional streams containing hash tables that are
-used by debuggers and other tools to provide fast lookup of records and types
-by name, and various other information about how the program was compiled such
-as the specific toolchain used, and more. A summary of streams contained in a
-PDB file is as follows:
-
-+--------------------+------------------------------+-------------------------------------------+
-| Name | Stream Index | Contents |
-+====================+==============================+===========================================+
-| Old Directory | - Fixed Stream Index 0 | - Previous MSF Stream Directory |
-+--------------------+------------------------------+-------------------------------------------+
-| PDB Stream | - Fixed Stream Index 1 | - Basic File Information |
-| | | - Fields to match EXE to this PDB |
-| | | - Map of named streams to stream indices |
-+--------------------+------------------------------+-------------------------------------------+
-| TPI Stream | - Fixed Stream Index 2 | - CodeView Type Records |
-| | | - Index of TPI Hash Stream |
-+--------------------+------------------------------+-------------------------------------------+
-| DBI Stream | - Fixed Stream Index 3 | - Module/Compiland Information |
-| | | - Indices of individual module streams |
-| | | - Indices of public / global streams |
-| | | - Section Contribution Information |
-| | | - Source File Information |
-| | | - References to streams containing |
-| | | FPO / PGO Data |
-+--------------------+------------------------------+-------------------------------------------+
-| IPI Stream | - Fixed Stream Index 4 | - CodeView Type Records |
-| | | - Index of IPI Hash Stream |
-+--------------------+------------------------------+-------------------------------------------+
-| /LinkInfo | - Contained in PDB Stream | - Unknown |
-| | Named Stream map | |
-+--------------------+------------------------------+-------------------------------------------+
-| /src/headerblock | - Contained in PDB Stream | - Summary of embedded source file content |
-| | Named Stream map | (e.g. natvis files) |
-+--------------------+------------------------------+-------------------------------------------+
-| /names | - Contained in PDB Stream | - PDB-wide global string table used for |
-| | Named Stream map | string de-duplication |
-+--------------------+------------------------------+-------------------------------------------+
-| Module Info Stream | - Contained in DBI Stream | - CodeView Symbol Records for this module |
-| | - One for each compiland | - Line Number Information |
-+--------------------+------------------------------+-------------------------------------------+
-| Public Stream | - Contained in DBI Stream | - Public (Exported) Symbol Records |
-| | | - Index of Public Hash Stream |
-+--------------------+------------------------------+-------------------------------------------+
-| Global Stream | - Contained in DBI Stream | - Single combined master symbol-table |
-| | | - Index of Global Hash Stream |
-+--------------------+------------------------------+-------------------------------------------+
-| TPI Hash Stream | - Contained in TPI Stream | - Hash table for looking up TPI records |
-| | | by name |
-+--------------------+------------------------------+-------------------------------------------+
-| IPI Hash Stream | - Contained in IPI Stream | - Hash table for looking up IPI records |
-| | | by name |
-+--------------------+------------------------------+-------------------------------------------+
-
-More information about the structure of each of these can be found on the
-following pages:
-
-:doc:`PdbStream`
- Information about the PDB Info Stream and how it is used to match PDBs to EXEs.
-
-:doc:`TpiStream`
- Information about the TPI stream and the CodeView records contained within.
-
-:doc:`DbiStream`
- Information about the DBI stream and relevant substreams including the Module Substreams,
- source file information, and CodeView symbol records contained within.
-
-:doc:`ModiStream`
- Information about the Module Information Stream, of which there is one for each compilation
- unit and the format of symbols contained within.
-
-:doc:`PublicStream`
- Information about the Public Symbol Stream.
-
-:doc:`GlobalStream`
- Information about the Global Symbol Stream.
-
-:doc:`HashTable`
- Information about the serialized hash table format used internally to represent things such
- as the Named Stream Map and the Hash Adjusters in the :doc:`TPI/IPI Stream <TpiStream>`.
-
-CodeView
-========
-CodeView is another format which comes into the picture. While MSF defines
-the structure of the overall file, and PDB defines the set of streams that
-appear within the MSF file and the format of those streams, CodeView defines
-the format of **symbol and type records** that appear within specific streams.
-Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
-more information about the CodeView format.
+=====================================
+The PDB File Format
+=====================================
+
+.. contents::
+ :local:
+
+.. _pdb_intro:
+
+Introduction
+============
+
+PDB (Program Database) is a file format invented by Microsoft and which contains
+debug information that can be consumed by debuggers and other tools. Since
+officially supported APIs exist on Windows for querying debug information from
+PDBs even without the user understanding the internals of the file format, a
+large ecosystem of tools has been built for Windows to consume this format. In
+order for Clang to be able to generate programs that can interoperate with these
+tools, it is necessary for us to generate PDB files ourselves.
+
+At the same time, LLVM has a long history of being able to cross-compile from
+any platform to any platform, and we wish for the same to be true here. So it
+is necessary for us to understand the PDB file format at the byte-level so that
+we can generate PDB files entirely on our own.
+
+This manual describes what we know about the PDB file format today. The layout
+of the file, the various streams contained within, the format of individual
+records within, and more.
+
+We would like to extend our heartfelt gratitude to Microsoft, without whom we
+would not be where we are today. Much of the knowledge contained within this
+manual was learned through reading code published by Microsoft on their `GitHub
+repo <https://github.com/Microsoft/microsoft-pdb>`__.
+
+.. _pdb_layout:
+
+File Layout
+===========
+
+.. important::
+ Unless otherwise specified, all numeric values are encoded in little endian.
+ If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
+ assume it is little endian!
+
+.. toctree::
+ :hidden:
+
+ MsfFile
+ PdbStream
+ TpiStream
+ DbiStream
+ ModiStream
+ PublicStream
+ GlobalStream
+ HashTable
+ CodeViewSymbols
+ CodeViewTypes
+
+.. _msf:
+
+The MSF Container
+-----------------
+A PDB file is really just a special case of an MSF (Multi-Stream Format) file.
+An MSF file is actually a miniature "file system within a file". It contains
+multiple streams (aka files) which can represent arbitrary data, and these
+streams are divided into blocks which may not necessarily be contiguously
+laid out within the file (aka fragmented). Additionally, the MSF contains a
+stream directory (aka MFT) which describes how the streams (files) are laid
+out within the MSF.
+
+For more information about the MSF container format, stream directory, and
+block layout, see :doc:`MsfFile`.
+
+.. _streams:
+
+Streams
+-------
+The PDB format contains a number of streams which describe various information
+such as the types, symbols, source files, and compilands (e.g. object files)
+of a program, as well as some additional streams containing hash tables that are
+used by debuggers and other tools to provide fast lookup of records and types
+by name, and various other information about how the program was compiled such
+as the specific toolchain used, and more. A summary of streams contained in a
+PDB file is as follows:
+
++--------------------+------------------------------+-------------------------------------------+
+| Name               | Stream Index                 | Contents                                  |
++====================+==============================+===========================================+
+| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
++--------------------+------------------------------+-------------------------------------------+
+| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
+|                    |                              | - Fields to match EXE to this PDB         |
+|                    |                              | - Map of named streams to stream indices  |
++--------------------+------------------------------+-------------------------------------------+
+| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
+|                    |                              | - Index of TPI Hash Stream                |
++--------------------+------------------------------+-------------------------------------------+
+| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
+|                    |                              | - Indices of individual module streams    |
+|                    |                              | - Indices of public / global streams      |
+|                    |                              | - Section Contribution Information        |
+|                    |                              | - Source File Information                 |
+|                    |                              | - References to streams containing        |
+|                    |                              |   FPO / PGO Data                          |
++--------------------+------------------------------+-------------------------------------------+
+| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
+|                    |                              | - Index of IPI Hash Stream                |
++--------------------+------------------------------+-------------------------------------------+
+| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
+|                    |   Named Stream map           |                                           |
++--------------------+------------------------------+-------------------------------------------+
+| /src/headerblock   | - Contained in PDB Stream    | - Summary of embedded source file content |
+|                    |   Named Stream map           |   (e.g. natvis files)                     |
++--------------------+------------------------------+-------------------------------------------+
+| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
+|                    |   Named Stream map           |   string de-duplication                   |
++--------------------+------------------------------+-------------------------------------------+
+| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
+|                    | - One for each compiland     | - Line Number Information                 |
++--------------------+------------------------------+-------------------------------------------+
+| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
+|                    |                              | - Index of Public Hash Stream             |
++--------------------+------------------------------+-------------------------------------------+
+| Global Stream      | - Contained in DBI Stream    | - Single combined master symbol-table     |
+|                    |                              | - Index of Global Hash Stream             |
++--------------------+------------------------------+-------------------------------------------+
+| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
+|                    |                              |   by name                                 |
++--------------------+------------------------------+-------------------------------------------+
+| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
+|                    |                              |   by name                                 |
++--------------------+------------------------------+-------------------------------------------+
+
+More information about the structure of each of these can be found on the
+following pages:
+
+:doc:`PdbStream`
+ Information about the PDB Info Stream and how it is used to match PDBs to EXEs.
+
+:doc:`TpiStream`
+ Information about the TPI stream and the CodeView records contained within.
+
+:doc:`DbiStream`
+ Information about the DBI stream and relevant substreams including the Module Substreams,
+ source file information, and CodeView symbol records contained within.
+
+:doc:`ModiStream`
+ Information about the Module Information Stream, of which there is one for each compilation
+ unit and the format of symbols contained within.
+
+:doc:`PublicStream`
+ Information about the Public Symbol Stream.
+
+:doc:`GlobalStream`
+ Information about the Global Symbol Stream.
+
+:doc:`HashTable`
+ Information about the serialized hash table format used internally to represent things such
+ as the Named Stream Map and the Hash Adjusters in the :doc:`TPI/IPI Stream <TpiStream>`.
+
+CodeView
+========
+CodeView is another format which comes into the picture. While MSF defines
+the structure of the overall file, and PDB defines the set of streams that
+appear within the MSF file and the format of those streams, CodeView defines
+the format of **symbol and type records** that appear within specific streams.
+Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
+more information about the CodeView format.
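
Reader's note (not part of the commit): the index.rst text above states two
facts that are easy to pin down in code, namely that numeric fields are little
endian and that five streams live at fixed indices 0 through 4. The following
is a minimal Python sketch of both. The names FixedStream, read_u16, read_u32
and read_u64 are hypothetical helpers invented for this illustration; they are
not LLVM APIs, and the sketch does not parse a real PDB.

```python
import struct
from enum import IntEnum


class FixedStream(IntEnum):
    """Fixed stream indices as listed in the stream summary table.

    The member names are illustrative; only the numeric values come
    from the documentation.
    """
    OLD_DIRECTORY = 0
    PDB = 1
    TPI = 2
    DBI = 3
    IPI = 4


def read_u16(buf: bytes, offset: int = 0) -> int:
    """Decode a little-endian uint16_t at the given byte offset."""
    return struct.unpack_from("<H", buf, offset)[0]


def read_u32(buf: bytes, offset: int = 0) -> int:
    """Decode a little-endian uint32_t at the given byte offset."""
    return struct.unpack_from("<I", buf, offset)[0]


def read_u64(buf: bytes, offset: int = 0) -> int:
    """Decode a little-endian uint64_t at the given byte offset."""
    return struct.unpack_from("<Q", buf, offset)[0]


if __name__ == "__main__":
    # The least significant byte comes first in little-endian encoding.
    print(hex(read_u16(b"\x34\x12")))
    print(hex(read_u32(b"\x78\x56\x34\x12")))
    print(FixedStream(3).name)
```

Streams other than these five (the named streams, module streams, and the
hash streams) have no fixed index; per the table, their indices are found by
reading the PDB, DBI, TPI, or IPI stream that contains them.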