[llvm] r357777 - Add documentation for PDB TPI/IPI Stream.
Zachary Turner via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 5 09:43:43 PDT 2019
Author: zturner
Date: Fri Apr 5 09:43:42 2019
New Revision: 357777
URL: http://llvm.org/viewvc/llvm-project?rev=357777&view=rev
Log:
Add documentation for PDB TPI/IPI Stream.
Modified:
llvm/trunk/docs/PDB/TpiStream.rst
llvm/trunk/docs/PDB/index.rst
Modified: llvm/trunk/docs/PDB/TpiStream.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/TpiStream.rst?rev=357777&r1=357776&r2=357777&view=diff
==============================================================================
--- llvm/trunk/docs/PDB/TpiStream.rst (original)
+++ llvm/trunk/docs/PDB/TpiStream.rst Fri Apr 5 09:43:42 2019
@@ -1,3 +1,304 @@
=====================================
-The PDB TPI Stream
+The PDB TPI and IPI Streams
=====================================
+
+.. contents::
+ :local:
+
+.. _tpi_intro:
+
+Introduction
+============
+
+The PDB TPI Stream (Index 2) and IPI Stream (Index 4) contain information about
+all types used in the program. Each is organized as a :ref:`header <tpi_header>`
+followed by a list of :doc:`CodeView Type Records <CodeViewTypes>`. Types are
+referenced from various streams and records throughout the PDB by their
+:ref:`type index <type_indices>`. In general, the sequence of type records
+following the :ref:`header <tpi_header>` forms a topologically sorted DAG
+(directed acyclic graph), which means that a type record B can only refer to
+the type A if ``A.TypeIndex < B.TypeIndex``. While there are rare cases where
+this property will not hold (particularly when dealing with object files
+compiled with MASM), an implementation should try very hard to make this
+property hold, as it means the entire type graph can be constructed in a single
+pass.
+
+.. important::
+ Type records form a topologically sorted DAG (directed acyclic graph).
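+
+As a minimal sketch of what this ordering property means in practice, the check
+below walks a list of records once and rejects any forward or self reference.
+``TypeRefs``, ``isTopologicallySorted`` and the ``FirstIndex`` parameter are
+hypothetical names used only for illustration; they are not part of LLVM.
+

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a parsed type record: only the type
// indices that the record refers to are kept.
struct TypeRefs {
  std::vector<uint32_t> Refs;
};

// Returns true if every record refers only to type indices strictly smaller
// than its own, i.e. the list can be resolved in one forward pass.
// FirstIndex is the type index assigned to Records[0] (assumed parameter).
bool isTopologicallySorted(const std::vector<TypeRefs> &Records,
                           uint32_t FirstIndex) {
  for (std::size_t I = 0; I < Records.size(); ++I) {
    const uint32_t Self = FirstIndex + static_cast<uint32_t>(I);
    for (uint32_t Ref : Records[I].Refs)
      if (Ref >= Self)
        return false; // forward or self reference breaks the ordering
  }
  return true;
}
```
+
+A consumer that sees this check fail (for example on MASM-produced input) would
+need a fallback that resolves references lazily instead of in one pass.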
+
+.. _tpi_ipi:
+
+TPI vs IPI Stream
+=================
+
+Recent versions of the PDB format (i.e. all versions covered by this document)
+have two streams with identical layout, henceforth referred to as the TPI stream
+and IPI stream. Subsequent contents of this document describing the on-disk
+format apply equally whether it is for the TPI Stream or the IPI Stream. The
+only difference between the two is in *which* CodeView records are allowed to
+appear in each one, summarized by the following table:
+
++----------------------+---------------------+
+| TPI Stream | IPI Stream |
++======================+=====================+
+| LF_POINTER | LF_FUNC_ID |
++----------------------+---------------------+
+| LF_MODIFIER | LF_MFUNC_ID |
++----------------------+---------------------+
+| LF_PROCEDURE | LF_BUILDINFO |
++----------------------+---------------------+
+| LF_MFUNCTION | LF_SUBSTR_LIST |
++----------------------+---------------------+
+| LF_LABEL | LF_STRING_ID |
++----------------------+---------------------+
+| LF_ARGLIST | LF_UDT_SRC_LINE |
++----------------------+---------------------+
+| LF_FIELDLIST | LF_UDT_MOD_SRC_LINE |
++----------------------+---------------------+
+| LF_ARRAY | |
++----------------------+---------------------+
+| LF_CLASS | |
++----------------------+---------------------+
+| LF_STRUCTURE | |
++----------------------+---------------------+
+| LF_INTERFACE | |
++----------------------+---------------------+
+| LF_UNION | |
++----------------------+---------------------+
+| LF_ENUM | |
++----------------------+---------------------+
+| LF_TYPESERVER2 | |
++----------------------+---------------------+
+| LF_VFTABLE | |
++----------------------+---------------------+
+| LF_VTSHAPE | |
++----------------------+---------------------+
+| LF_BITFIELD | |
++----------------------+---------------------+
+| LF_METHODLIST | |
++----------------------+---------------------+
+| LF_PRECOMP | |
++----------------------+---------------------+
+| LF_ENDPRECOMP | |
++----------------------+---------------------+
+
+The usage of these records is described in more detail in
+:doc:`CodeView Type Records <CodeViewTypes>`.
+
+.. _type_indices:
+
+Type Indices
+============
+
+A type index is a 32-bit integer that uniquely identifies a type inside of an
+object file's ``.debug$T`` section or a PDB file's TPI or IPI stream. The
+value of the type index for the first type record from the TPI stream is given
+by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>`
+although in practice this value is always equal to 0x1000 (4096).
+
+Any type index with a high bit set is considered to come from the IPI stream,
+although this appears to be more of a hack, and LLVM does not generate type
+indices of this nature. They can, however, be observed in Microsoft PDBs
+occasionally, so one should be prepared to handle them. Note that having the
+high bit set is sufficient, but not necessary, for a type index to come from
+the IPI stream.
+
+Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed
+to come from the appropriate stream, and any type index less than this is a
+bitmask which can be decomposed as follows:
+
+.. code-block:: none
+
+ .---------------------------.------.----------.
+ |           Unused          | Mode |   Kind   |
+ '---------------------------'------'----------'
+ |+32                        |+12   |+8        |+0
+
+
+- **Kind** - A value from the following enum:
+
+.. code-block:: c++
+
+ enum class SimpleTypeKind : uint32_t {
+ None = 0x0000, // uncharacterized type (no type)
+ Void = 0x0003, // void
+ NotTranslated = 0x0007, // type not translated by cvpack
+ HResult = 0x0008, // OLE/COM HRESULT
+
+ SignedCharacter = 0x0010, // 8 bit signed
+ UnsignedCharacter = 0x0020, // 8 bit unsigned
+ NarrowCharacter = 0x0070, // really a char
+ WideCharacter = 0x0071, // wide char
+ Character16 = 0x007a, // char16_t
+ Character32 = 0x007b, // char32_t
+
+ SByte = 0x0068, // 8 bit signed int
+ Byte = 0x0069, // 8 bit unsigned int
+ Int16Short = 0x0011, // 16 bit signed
+ UInt16Short = 0x0021, // 16 bit unsigned
+ Int16 = 0x0072, // 16 bit signed int
+ UInt16 = 0x0073, // 16 bit unsigned int
+ Int32Long = 0x0012, // 32 bit signed
+ UInt32Long = 0x0022, // 32 bit unsigned
+ Int32 = 0x0074, // 32 bit signed int
+ UInt32 = 0x0075, // 32 bit unsigned int
+ Int64Quad = 0x0013, // 64 bit signed
+ UInt64Quad = 0x0023, // 64 bit unsigned
+ Int64 = 0x0076, // 64 bit signed int
+ UInt64 = 0x0077, // 64 bit unsigned int
+ Int128Oct = 0x0014, // 128 bit signed int
+ UInt128Oct = 0x0024, // 128 bit unsigned int
+ Int128 = 0x0078, // 128 bit signed int
+ UInt128 = 0x0079, // 128 bit unsigned int
+
+ Float16 = 0x0046, // 16 bit real
+ Float32 = 0x0040, // 32 bit real
+ Float32PartialPrecision = 0x0045, // 32 bit PP real
+ Float48 = 0x0044, // 48 bit real
+ Float64 = 0x0041, // 64 bit real
+ Float80 = 0x0042, // 80 bit real
+ Float128 = 0x0043, // 128 bit real
+
+ Complex16 = 0x0056, // 16 bit complex
+ Complex32 = 0x0050, // 32 bit complex
+ Complex32PartialPrecision = 0x0055, // 32 bit PP complex
+ Complex48 = 0x0054, // 48 bit complex
+ Complex64 = 0x0051, // 64 bit complex
+ Complex80 = 0x0052, // 80 bit complex
+ Complex128 = 0x0053, // 128 bit complex
+
+ Boolean8 = 0x0030, // 8 bit boolean
+ Boolean16 = 0x0031, // 16 bit boolean
+ Boolean32 = 0x0032, // 32 bit boolean
+ Boolean64 = 0x0033, // 64 bit boolean
+ Boolean128 = 0x0034, // 128 bit boolean
+ };
+
+- **Mode** - A value from the following enum:
+
+.. code-block:: c++
+
+ enum class SimpleTypeMode : uint32_t {
+ Direct = 0, // Not a pointer
+ NearPointer = 1, // Near pointer
+ FarPointer = 2, // Far pointer
+ HugePointer = 3, // Huge pointer
+ NearPointer32 = 4, // 32 bit near pointer
+ FarPointer32 = 5, // 32 bit far pointer
+ NearPointer64 = 6, // 64 bit near pointer
+ NearPointer128 = 7 // 128 bit near pointer
+ };
+
+Note that for pointers, the bitness is represented in the mode. So a ``void*``
+would have a type index with ``Mode=NearPointer32, Kind=Void`` if built for 32 bits
+but a type index with ``Mode=NearPointer64, Kind=Void`` if built for 64 bits.
+
+By convention, the type index for ``std::nullptr_t`` is constructed the same way
+as the type index for ``void*``, but using the bitless enumeration value
+``NearPointer``.
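+
+The decomposition described in this section can be sketched as follows. The
+masks are derived from the bit layout diagram (Kind in bits 0-7, Mode in bits
+8-11); the constant and function names are illustrative assumptions and do not
+correspond to a specific LLVM API.
+

```cpp
#include <cassert>
#include <cstdint>

// Masks derived from the bit layout diagram (Kind: bits 0-7, Mode: bits 8-11).
// All names here are illustrative, not LLVM's API.
constexpr uint32_t HighBit = 0x80000000u;
constexpr uint32_t KindMask = 0x000000ffu;
constexpr uint32_t ModeMask = 0x00000f00u;
constexpr uint32_t ModeShift = 8;

struct SimpleTypeParts {
  uint32_t Kind; // a SimpleTypeKind value
  uint32_t Mode; // a SimpleTypeMode value
};

// True if TI, with its high bit cleared, is below TypeIndexBegin and therefore
// encodes a simple type rather than naming a record in a stream.
constexpr bool isSimpleTypeIndex(uint32_t TI, uint32_t TypeIndexBegin) {
  return (TI & ~HighBit) < TypeIndexBegin;
}

// Splits a simple type index into its Kind and Mode fields.
inline SimpleTypeParts decomposeSimple(uint32_t TI) {
  return SimpleTypeParts{TI & KindMask, (TI & ModeMask) >> ModeShift};
}

// Builds a simple type index from a Kind and a Mode.
constexpr uint32_t makeSimple(uint32_t Kind, uint32_t Mode) {
  return (Mode << ModeShift) | Kind;
}
```
+
+Under these assumptions, ``makeSimple(0x0003, 6)`` (``Kind=Void``,
+``Mode=NearPointer64``) gives ``0x0603`` for a 64-bit ``void*``, and
+``makeSimple(0x0003, 1)`` gives ``0x0103`` for ``std::nullptr_t``.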
+
+
+
+.. _tpi_header:
+
+Stream Header
+=============
+At offset 0 of the TPI Stream is a header with the following layout:
+
+
+.. code-block:: c++
+
+ struct TpiStreamHeader {
+ uint32_t Version;
+ uint32_t HeaderSize;
+ uint32_t TypeIndexBegin;
+ uint32_t TypeIndexEnd;
+ uint32_t TypeRecordBytes;
+
+ uint16_t HashStreamIndex;
+ uint16_t HashAuxStreamIndex;
+ uint32_t HashKeySize;
+ uint32_t NumHashBuckets;
+
+ int32_t HashValueBufferOffset;
+ uint32_t HashValueBufferLength;
+
+ int32_t IndexOffsetBufferOffset;
+ uint32_t IndexOffsetBufferLength;
+
+ int32_t HashAdjBufferOffset;
+ uint32_t HashAdjBufferLength;
+ };
+
+- **Version** - A value from the following enum.
+
+.. code-block:: c++
+
+ enum class TpiStreamVersion : uint32_t {
+ V40 = 19950410,
+ V41 = 19951122,
+ V50 = 19961031,
+ V70 = 19990903,
+ V80 = 20040203,
+ };
+
+Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
+``V80``, and no other values have been observed. It is assumed that should
+another value be observed, the layout described by this document may not be
+accurate.
+
+- **HeaderSize** - ``sizeof(TpiStreamHeader)``
+
+- **TypeIndexBegin** - The numeric value of the type index representing the
+ first type record in the TPI stream. This is usually the value 0x1000 as type
+ indices lower than this are reserved (see :ref:`Type Indices <type_indices>` for
+ a discussion of reserved type indices).
+
+- **TypeIndexEnd** - One greater than the numeric value of the type index
+ representing the last type record in the TPI stream. The total number of type
+ records in the TPI stream can be computed as ``TypeIndexEnd - TypeIndexBegin``.
+
+- **TypeRecordBytes** - The number of bytes of type record data following the header.
+
+- **HashStreamIndex** - The index of a stream which contains a list of hashes for
+ every type record. This value may be -1 (``0xFFFF``, as the field is an unsigned
+ 16-bit integer), indicating that hash information is not present. In practice a
+ valid stream index is always observed, so any producer implementation should be
+ prepared to emit this stream to ensure compatibility with tools which may expect
+ it to be present.
+
+- **HashAuxStreamIndex** - Presumably the index of a stream which contains a separate
+ hash table, although this has not been observed in practice and it's unclear what it
+ might be used for.
+
+- **HashKeySize** - The size of a hash value (usually 4 bytes).
+
+- **NumHashBuckets** - The number of buckets used to generate the hash values in the
+ aforementioned hash streams.
+
+- **HashValueBufferOffset / HashValueBufferLength** - The offset and size within
+ the TPI Hash Stream of the list of hash values. It should be assumed that there
+ are either 0 hash values, or a number equal to the number of type records in the
+ TPI stream (``TypeIndexEnd - TypeIndexBegin``). Thus, if ``HashValueBufferLength`` is
+ neither 0 nor equal to ``(TypeIndexEnd - TypeIndexBegin) * HashKeySize``, we can
+ consider the PDB malformed.
+
+- **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size
+ within the TPI Hash Stream of the Type Index Offsets Buffer. This is a list of
+ pairs of uint32_t's where the first value is a :ref:`Type Index <type_indices>`
+ and the second value is the offset in the type record data of the type with this
+ index. This can be used to do a binary search followed by a linear search to
+ get amortized O(log n) lookup by type index.
+
+- **HashAdjBufferOffset / HashAdjBufferLength** -
+
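+The consistency rules stated in the field descriptions above can be sketched as
+a small validation routine. This is only an illustration: ``readTpiHeader`` and
+``headerLooksWellFormed`` are hypothetical helpers, the code assumes a
+little-endian host so the on-disk bytes can be copied directly, and rejecting
+any version other than ``V80`` is a conservative choice rather than a
+requirement of the format.
+

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct TpiStreamHeader {
  uint32_t Version;
  uint32_t HeaderSize;
  uint32_t TypeIndexBegin;
  uint32_t TypeIndexEnd;
  uint32_t TypeRecordBytes;
  uint16_t HashStreamIndex;
  uint16_t HashAuxStreamIndex;
  uint32_t HashKeySize;
  uint32_t NumHashBuckets;
  int32_t HashValueBufferOffset;
  uint32_t HashValueBufferLength;
  int32_t IndexOffsetBufferOffset;
  uint32_t IndexOffsetBufferLength;
  int32_t HashAdjBufferOffset;
  uint32_t HashAdjBufferLength;
};
static_assert(sizeof(TpiStreamHeader) == 56, "expected 56 bytes, no padding");

constexpr uint32_t TpiVersionV80 = 20040203;

// Copies the header from offset 0 of the stream bytes (little-endian host
// assumed). Returns false if the stream is too short to hold a header.
bool readTpiHeader(const uint8_t *Stream, std::size_t Size,
                   TpiStreamHeader &Out) {
  if (Size < sizeof(TpiStreamHeader))
    return false;
  std::memcpy(&Out, Stream, sizeof(TpiStreamHeader));
  return true;
}

// Applies the sanity checks described in the field list.
bool headerLooksWellFormed(const TpiStreamHeader &H) {
  if (H.Version != TpiVersionV80)
    return false; // layout may differ for other versions
  if (H.HeaderSize != sizeof(TpiStreamHeader))
    return false;
  if (H.TypeIndexEnd < H.TypeIndexBegin)
    return false;
  const uint64_t NumRecords = H.TypeIndexEnd - H.TypeIndexBegin;
  // Either no hash values at all, or exactly one per type record.
  if (H.HashValueBufferLength != 0 &&
      H.HashValueBufferLength != NumRecords * H.HashKeySize)
    return false;
  return true;
}
```
+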
+.. _tpi_records:
+
+CodeView Type Record List
+=========================
+Following the header, there are ``TypeRecordBytes`` bytes of data that represent a
+variable length array of :doc:`CodeView type records <CodeViewTypes>`. The number
+of such records (i.e. the length of the array) can be determined by computing the
+value ``Header.TypeIndexEnd - Header.TypeIndexBegin``.
+
+O(log n) random access is provided by way of the Type Index Offsets array (if present)
+described previously.
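+
+A lookup through the Type Index Offsets array might look like the sketch below:
+a binary search finds the last entry at or before the requested type index, and
+a linear walk over the records covers the remaining distance. The walk relies
+on an assumption not stated in this document, namely that each CodeView record
+begins with a 2-byte little-endian length that counts the bytes following the
+length field. ``TypeIndexOffset`` and ``findRecordOffset`` are hypothetical
+names, and a little-endian host is assumed.
+

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One entry of the Type Index Offsets buffer: a type index and the offset of
// its record within the type record data.
struct TypeIndexOffset {
  uint32_t Type;
  uint32_t Offset;
};

// Returns the byte offset of the record for type index TI within RecordData,
// or -1 if it cannot be found. Offsets must be sorted by Type.
int64_t findRecordOffset(const std::vector<TypeIndexOffset> &Offsets,
                         const std::vector<uint8_t> &RecordData, uint32_t TI) {
  // Binary search: first entry whose Type is greater than TI, then step back.
  auto It = std::upper_bound(
      Offsets.begin(), Offsets.end(), TI,
      [](uint32_t V, const TypeIndexOffset &E) { return V < E.Type; });
  if (It == Offsets.begin())
    return -1; // TI precedes every indexed record
  --It;

  // Linear search: skip records one at a time until TI is reached.
  uint32_t Cur = It->Type;
  std::size_t Pos = It->Offset;
  while (Pos + 2 <= RecordData.size()) {
    if (Cur == TI)
      return static_cast<int64_t>(Pos);
    uint16_t Len; // assumed prefix: bytes following the length field
    std::memcpy(&Len, &RecordData[Pos], sizeof(Len));
    Pos += sizeof(Len) + Len;
    ++Cur;
  }
  return -1; // ran off the end of the record data
}
```
+
+The sparser the offsets array, the longer the linear walk, so the cost of a
+lookup depends on how densely the producer chose to populate the array.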
\ No newline at end of file
Modified: llvm/trunk/docs/PDB/index.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/PDB/index.rst?rev=357777&r1=357776&r2=357777&view=diff
==============================================================================
--- llvm/trunk/docs/PDB/index.rst (original)
+++ llvm/trunk/docs/PDB/index.rst Fri Apr 5 09:43:42 2019
@@ -100,7 +100,8 @@ PDB file is as follows:
| | | - Indices of public / global streams |
| | | - Section Contribution Information |
| | | - Source File Information |
-| | | - FPO / PGO Data |
+| | | - References to streams containing |
+| | | FPO / PGO Data |
+--------------------+------------------------------+-------------------------------------------+
| IPI Stream | - Fixed Stream Index 4 | - CodeView Type Records |
| | | - Index of IPI Hash Stream |
@@ -108,8 +109,8 @@ PDB file is as follows:
| /LinkInfo | - Contained in PDB Stream | - Unknown |
| | Named Stream map | |
+--------------------+------------------------------+-------------------------------------------+
-| /src/headerblock | - Contained in PDB Stream | - Unknown |
-| | Named Stream map | |
+| /src/headerblock | - Contained in PDB Stream | - Summary of embedded source file content |
+| | Named Stream map | (e.g. natvis files) |
+--------------------+------------------------------+-------------------------------------------+
| /names | - Contained in PDB Stream | - PDB-wide global string table used for |
| | Named Stream map | string de-duplication |
@@ -120,7 +121,7 @@ PDB file is as follows:
| Public Stream | - Contained in DBI Stream | - Public (Exported) Symbol Records |
| | | - Index of Public Hash Stream |
+--------------------+------------------------------+-------------------------------------------+
-| Global Stream | - Contained in DBI Stream | - Global Symbol Records |
+| Global Stream | - Contained in DBI Stream | - Single combined master symbol-table |
| | | - Index of Global Hash Stream |
+--------------------+------------------------------+-------------------------------------------+
| TPI Hash Stream | - Contained in TPI Stream | - Hash table for looking up TPI records |
@@ -129,6 +130,10 @@ PDB file is as follows:
| IPI Hash Stream | - Contained in IPI Stream | - Hash table for looking up IPI records |
| | | by name |
+--------------------+------------------------------+-------------------------------------------+
+| * LINKER* Stream | - Last Stream in PDB File | - Executable section information |
+| | | - Incremental linking thunks |
+| | | - Linker version information |
++--------------------+------------------------------+-------------------------------------------+
More information about the structure of each of these can be found on the
following pages: