[cfe-commits] r163989 - /cfe/trunk/docs/PCHInternals.html

Douglas Gregor dgregor at apple.com
Sat Sep 15 18:44:02 PDT 2012


Author: dgregor
Date: Sat Sep 15 20:44:02 2012
New Revision: 163989

URL: http://llvm.org/viewvc/llvm-project?rev=163989&view=rev
Log:
Update the PCH internals documentation to cover chained precompiled
headers and modules in more detail. I'd still like to expand on some
of the modules-related issues further, but this is a decent start.

Modified:
    cfe/trunk/docs/PCHInternals.html

Modified: cfe/trunk/docs/PCHInternals.html
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/docs/PCHInternals.html?rev=163989&r1=163988&r2=163989&view=diff
==============================================================================
--- cfe/trunk/docs/PCHInternals.html (original)
+++ cfe/trunk/docs/PCHInternals.html Sat Sep 15 20:44:02 2012
@@ -2,7 +2,7 @@
           "http://www.w3.org/TR/html4/strict.dtd">
 <html>
 <head>
-  <title>Precompiled Headers (PCH)</title>
+  <title>Precompiled Header and Modules Internals</title>
   <link type="text/css" rel="stylesheet" href="../menu.css">
   <link type="text/css" rel="stylesheet" href="../content.css">
   <style type="text/css">
@@ -18,10 +18,10 @@
 
 <div id="content">
 
-<h1>Precompiled Headers</h1>
+<h1>Precompiled Header and Modules Internals</h1>
 
   <p>This document describes the design and implementation of Clang's
-  precompiled headers (PCH). If you are interested in the end-user
+  precompiled headers (PCH) and modules. If you are interested in the end-user
   view, please see the <a
    href="UsersManual.html#precompiledheaders">User's Manual</a>.</p>
 
@@ -30,7 +30,7 @@
     <li><a href="#usage">Using Precompiled Headers with
     <tt>clang</tt></a></li>
     <li><a href="#philosophy">Design Philosophy</a></li>
-    <li><a href="#contents">Precompiled Header Contents</a>
+    <li><a href="#contents">Serialized AST File Contents</a>
       <ul>
         <li><a href="#metadata">Metadata Block</a></li>
         <li><a href="#sourcemgr">Source Manager Block</a></li>
@@ -42,8 +42,9 @@
         <li><a href="#method-pool">Method Pool Block</a></li>
       </ul>
     </li>
-    <li><a href="#tendrils">Precompiled Header Integration
-    Points</a></li>
+    <li><a href="#tendrils">AST Reader Integration Points</a></li>
+    <li><a href="#chained">Chained precompiled headers</a></li>
+    <li><a href="#modules">Modules</a></li>
 </ul>
     
 <h2 id="usage">Using Precompiled Headers with <tt>clang</tt></h2>
@@ -94,30 +95,39 @@
   require the PCH file to be up-to-date.</li>
 </ul>
 
-<p>Clang's precompiled headers are designed with a compact on-disk
-representation, which minimizes both PCH creation time and the time
-required to initially load the PCH file. The PCH file itself contains
+<p>Modules, as implemented in Clang, use the same mechanisms as
+precompiled headers to save a serialized AST file (one per module) and
+use those AST modules. From an implementation standpoint, modules are
+a generalization of precompiled headers, lifting a number of
+restrictions placed on precompiled headers. In particular, there can
+only be one precompiled header and it must be included at the
+beginning of the translation unit. The extensions to the AST file
+format required for modules are discussed in the section on <a href="#modules">modules</a>.</p>
+
+<p>Clang's AST files are designed with a compact on-disk
+representation, which minimizes both creation time and the time
+required to initially load the AST file. The AST file itself contains
 a serialized representation of Clang's abstract syntax trees and
 supporting data structures, stored using the same compressed bitstream
 as <a href="http://llvm.org/docs/BitCodeFormat.html">LLVM's bitcode
 file format</a>.</p>
 
-<p>Clang's precompiled headers are loaded "lazily" from disk. When a
-PCH file is initially loaded, Clang reads only a small amount of data
-from the PCH file to establish where certain important data structures
+<p>Clang's AST files are loaded "lazily" from disk. When an
+AST file is initially loaded, Clang reads only a small amount of data
+from the AST file to establish where certain important data structures
 are stored. The amount of data read in this initial load is
-independent of the size of the PCH file, such that a larger PCH file
-does not lead to longer PCH load times. The actual header data in the
-PCH file--macros, functions, variables, types, etc.--is loaded only
+independent of the size of the AST file, such that a larger AST file
+does not lead to longer AST load times. The actual header data in the
+AST file--macros, functions, variables, types, etc.--is loaded only
 when it is referenced from the user's code, at which point only that
 entity (and those entities it depends on) are deserialized from the
-PCH file. With this approach, the cost of using a precompiled header
+AST file. With this approach, the cost of using an AST file
 for a translation unit is proportional to the amount of code actually
-used from the header, rather than being proportional to the size of
-the header itself.</p> 
+used from the AST file, rather than being proportional to the size of
+the AST file itself.</p> 
 
 <p>When given the <code>-print-stats</code> option, Clang produces
-statistics describing how much of the precompiled header was actually
+statistics describing how much of the AST file was actually
 loaded from disk. For a simple "Hello, World!" program that includes
 the Apple <code>Cocoa.h</code> header (which is built as a precompiled
 header), this option illustrates how little of the actual precompiled
@@ -143,7 +153,7 @@
 <p>For this small program, only a tiny fraction of the source
 locations, types, declarations, identifiers, and macros were actually
 deserialized from the precompiled header. These statistics can be
-useful to determine whether the precompiled header implementation can
+useful to determine whether the AST file implementation can
 be improved by making more of the implementation lazy.</p>
 
 <p>Precompiled headers can be chained. When you create a PCH while
@@ -153,13 +163,15 @@
 commonly used throughout your project, and then create a PCH for every
 single source file in the project that includes the code that is
 specific to that file, so that recompiling the file itself is very fast,
-without duplicating the data from the common headers for every file.</p>
+without duplicating the data from the common headers for every
+file. The mechanisms behind chained precompiled headers are discussed
+in a <a href="#chained">later section</a>.</p>
 
-<h2 id="contents">Precompiled Header Contents</h2>
+<h2 id="contents">AST File Contents</h2>
 
 <img src="PCHLayout.png" style="float:right" alt="Precompiled header layout">
 
-<p>Clang's precompiled headers are organized into several different
+<p>Clang's AST files are organized into several different
 blocks, each of which contains the serialized representation of a part
 of Clang's internal representation. Each of the blocks corresponds to
 either a block or a record within <a
@@ -167,19 +179,19 @@
 format</a>. The contents of each of these logical blocks are described
 below.</p>
 
-<p>For a given precompiled header, the <a
+<p>For a given AST file, the <a
 href="http://llvm.org/cmds/llvm-bcanalyzer.html"><code>llvm-bcanalyzer</code></a>
 utility can be used to examine the actual structure of the bitstream
-for the precompiled header. This information can be used both to help
-understand the structure of the precompiled header and to isolate
-areas where precompiled headers can still be optimized, e.g., through
+for the AST file. This information can be used both to help
+understand the structure of the AST file and to isolate
+areas where AST files can still be optimized, e.g., through
 the introduction of abbreviations.</p>
 
 <h3 id="metadata">Metadata Block</h3>
 
 <p>The metadata block contains several records that provide
-information about how the precompiled header was built. This metadata
-is primarily used to validate the use of a precompiled header. For
+information about how the AST file was built. This metadata
+is primarily used to validate the use of an AST file. For
 example, a precompiled header built for a 32-bit x86 target cannot be used
 when compiling for a 64-bit x86 target. The metadata block contains
 information about:</p>
@@ -187,17 +199,17 @@
 <dl>
   <dt>Language options</dt>
   <dd>Describes the particular language dialect used to compile the
-PCH file, including major options (e.g., Objective-C support) and more
+AST file, including major options (e.g., Objective-C support) and more
 minor options (e.g., support for "//" comments). The contents of this
 record correspond to the <code>LangOptions</code> class.</dd>
   
   <dt>Target architecture</dt>
   <dd>The target triple that describes the architecture, platform, and
-ABI for which the PCH file was generated, e.g.,
+ABI for which the AST file was generated, e.g.,
 <code>i386-apple-darwin9</code>.</dd>
   
-  <dt>PCH version</dt>
-  <dd>The major and minor version numbers of the precompiled header
+  <dt>AST version</dt>
+  <dd>The major and minor version numbers of the AST file
 format. Changes in the minor version number should not affect backward
 compatibility, while changes in the major version number imply that a
 newer compiler cannot read an older precompiled header (and
@@ -205,11 +217,11 @@
 
   <dt>Original file name</dt>
   <dd>The full path of the header that was used to generate the
-precompiled header.</dd>
+AST file.</dd>
 
   <dt>Predefines buffer</dt>
   <dd>Although not explicitly stored as part of the metadata, the
-predefines buffer is used in the validation of the precompiled header.
+predefines buffer is used in the validation of the AST file.
 The predefines buffer itself contains code generated by the compiler
 to initialize the preprocessor state according to the current target,
 platform, and command-line options. For example, the predefines buffer
@@ -220,26 +232,14 @@
 
 </dl>
 
-<p>A chained PCH file (that is, one that references another PCH) has
-a slightly different metadata block, which contains the following
-information:</p>
-
-<dl>
-  <dt>Referenced file</dt>
-  <dd>The name of the referenced PCH file. It is looked up like a file
-specified using -include-pch.</dd>
-
-  <dt>PCH version</dt>
-  <dd>This is the same as in normal PCH files.</dd>
-
-  <dt>Original file name</dt>
-  <dd>The full path of the header that was used to generate this
-precompiled header.</dd>
-
-</dl>
-
-<p>The language options, target architecture and predefines buffer data
-is taken from the end of the chain, since they have to match anyway.</p>
+<p>A chained PCH file (that is, one that references another PCH) and a
+module (which may import other modules) have additional metadata
+containing the list of all AST files that this AST file depends
+on. Each of those files will be loaded along with this AST file.</p>
+
+<p>For chained precompiled headers, the language options, target
+architecture and predefines buffer data is taken from the end of the
+chain, since they have to match anyway.</p>
 
 <h3 id="sourcemgr">Source Manager Block</h3>
 
@@ -248,10 +248,10 @@
  href="InternalsManual.html#SourceLocation">SourceManager</a> class,
 which handles the mapping from source locations (as represented in
 Clang's abstract syntax tree) into actual column/line positions within
-a source file or macro instantiation. The precompiled header's
+a source file or macro instantiation. The AST file's
 representation of the source manager also includes information about
 all of the headers that were (transitively) included when building the
-precompiled header.</p>
+AST file.</p>
 
 <p>The bulk of the source manager block is dedicated to information
 about the various files, buffers, and macro instantiations into which
@@ -259,18 +259,18 @@
 "file ID", which is a unique number (allocated starting at 1) stored
 in the source location. Clang serializes the information for each kind
 of file ID, along with an index that maps file IDs to the position
-within the PCH file where the information about that file ID is
+within the AST file where the information about that file ID is
 stored. The data associated with a file ID is loaded only when
 required by the front end, e.g., to emit a diagnostic that includes a
 macro instantiation history inside the header itself.</p>
 
 <p>The source manager block also contains information about all of the
-headers that were included when building the precompiled header. This
+headers that were included when building the AST file. This
 includes information about the controlling macro for the header (e.g.,
 when the preprocessor identified that the contents of the header
 dependent on a macro like <code>LLVM_CLANG_SOURCEMANAGER_H</code>)
 along with a cached version of the results of the <code>stat()</code>
-system calls performed when building the precompiled header. The
+system calls performed when building the AST file. The
 latter is particularly useful in reducing system time when searching
 for include files.</p>
 
@@ -279,8 +279,8 @@
 <p>The preprocessor block contains the serialized representation of
 the preprocessor. Specifically, it contains all of the macros that
 have been defined by the end of the header used to build the
-precompiled header, along with the token sequences that comprise each
-macro. The macro definitions are only read from the PCH file when the
+AST file, along with the token sequences that comprise each
+macro. The macro definitions are only read from the AST file when the
 name of the macro first occurs in the program. This lazy loading of
 macro definitions is triggered by lookups into the <a
  href="#idtable">identifier table</a>.</p>
@@ -290,8 +290,8 @@
 <p>The types block contains the serialized representation of all of
 the types referenced in the translation unit. Each Clang type node
 (<code>PointerType</code>, <code>FunctionProtoType</code>, etc.) has a
-corresponding record type in the PCH file. When types are deserialized
-from the precompiled header, the data within the record is used to
+corresponding record type in the AST file. When types are deserialized
+from the AST file, the data within the record is used to
 reconstruct the appropriate type node using the AST context.</p>
 
 <p>Each type has a unique type ID, which is an integer that uniquely
@@ -300,10 +300,10 @@
 (<code>void</code>, <code>float</code>, etc.), while other
 "user-defined" type IDs are assigned consecutively from
 <code>NUM_PREDEF_TYPE_IDS</code> upward as the types are encountered.
-The PCH file has an associated mapping from the user-defined types
+The AST file has an associated mapping from the user-defined types
 block to the location within the types block where the serialized
 representation of that type resides, enabling lazy deserialization of
-types. When a type is referenced from within the PCH file, that
+types. When a type is referenced from within the AST file, that
 reference is encoded using the type ID shifted left by 3 bits. The
 lower three bits are used to represent the <code>const</code>,
 <code>volatile</code>, and <code>restrict</code> qualifiers, as in
@@ -316,19 +316,20 @@
 <p>The declarations block contains the serialized representation of
 all of the declarations referenced in the translation unit. Each Clang
 declaration node (<code>VarDecl</code>, <code>FunctionDecl</code>,
-etc.) has a corresponding record type in the PCH file. When
-declarations are deserialized from the precompiled header, the data
+etc.) has a corresponding record type in the AST file. When
+declarations are deserialized from the AST file, the data
 within the record is used to build and populate a new instance of the
 corresponding <code>Decl</code> node. As with types, each declaration
 node has a numeric ID that is used to refer to that declaration within
-the PCH file. In addition, a lookup table provides a mapping from that
+the AST file. In addition, a lookup table provides a mapping from that
 numeric ID to the offset within the precompiled header where that
 declaration is described.</p>
 
 <p>Declarations in Clang's abstract syntax trees are stored
 hierarchically. At the top of the hierarchy is the translation unit
 (<code>TranslationUnitDecl</code>), which contains all of the
-declarations in the translation unit. These declarations (such as
+declarations in the translation unit but is not actually written as a
+specific declaration node. Its child declarations (such as
 functions or struct types) may also contain other declarations inside
 them, and so on. Within Clang, each declaration is stored within a <a
 href="http://clang.llvm.org/docs/InternalsManual.html#DeclContext">declaration
@@ -339,7 +340,7 @@
 context (e.g., iterate over all of the fields of a structure for
 structure layout).</p>
 
-<p>In Clang's precompiled header format, deserializing a declaration
+<p>In Clang's AST file format, deserializing a declaration
 that is a <code>DeclContext</code> is a separate operation from
 deserializing all of the declarations stored within that declaration
 context. Therefore, Clang will deserialize the translation unit
@@ -354,14 +355,11 @@
   <code>x</code> within a given declaration context (for example,
   during semantic analysis of the expression <code>p->x</code>,
   where <code>p</code>'s type is defined in the precompiled header),
-  Clang deserializes a hash table mapping from the names within that
-  declaration context to the declaration IDs that represent each
-  visible declaration with that name. The entire hash table is
-  deserialized at this point (into the <code>llvm::DenseMap</code>
-  stored within each <code>DeclContext</code> object), but the actual
-  declarations are not yet deserialized. In a second step, those
-  declarations with the name <code>x</code> will be deserialized and
-  will be used as the result of name lookup.</li>
+  Clang refers to an on-disk hash table that maps from the names
+  within that declaration context to the declaration IDs that
+  represent each visible declaration with that name. The actual
+  declarations will then be deserialized to provide the results of
+  name lookup.</li>
 
   <li>When the front end performs iteration over all of the
   declarations within a declaration context, all of those declarations
@@ -376,7 +374,7 @@
 
 <h3 id="stmt">Statements and Expressions</h3>
 
-<p>Statements and expressions are stored in the precompiled header in
+<p>Statements and expressions are stored in the AST file in
 both the <a href="#types">types</a> and the <a
  href="#decls">declarations</a> blocks, because every statement or
 expression will be associated with either a type or declaration. The
@@ -389,10 +387,10 @@
 <p>As with types and declarations, each statement and expression kind
 in Clang's abstract syntax tree (<code>ForStmt</code>,
 <code>CallExpr</code>, etc.) has a corresponding record type in the
-precompiled header, which contains the serialized representation of
+AST file, which contains the serialized representation of
 that statement or expression. Each substatement or subexpression
 within an expression is stored as a separate record (which keeps most
-records to a fixed size). Within the precompiled header, the
+records to a fixed size). Within the AST file, the
 subexpressions of an expression are stored, in reverse order, prior to the expression
 that owns those expression, using a form of <a
 href="http://en.wikipedia.org/wiki/Reverse_Polish_notation">Reverse
@@ -420,7 +418,7 @@
 <h3 id="idtable">Identifier Table Block</h3>
 
 <p>The identifier table block contains an on-disk hash table that maps
-each identifier mentioned within the precompiled header to the
+each identifier mentioned within the AST file to the
 serialized representation of the identifier's information (e.g, the
 <code>IdentifierInfo</code> structure). The serialized representation
 contains:</p>
@@ -438,17 +436,20 @@
   declarations.</li>
 </ul>
 
-<p>When a precompiled header is loaded, the precompiled header
+<p>When an AST file is loaded, the AST file reader
 mechanism introduces itself into the identifier table as an external
 lookup source. Thus, when the user program refers to an identifier
 that has not yet been seen, Clang will perform a lookup into the
 identifier table. If an identifier is found, its contents (macro 
-definitions, flags, top-level declarations, etc.) will be deserialized, at which point the corresponding <code>IdentifierInfo</code> structure will have the same contents it would have after parsing the headers in the precompiled header.</p>
+definitions, flags, top-level declarations, etc.) will be
+deserialized, at which point the corresponding
+<code>IdentifierInfo</code> structure will have the same contents it
+would have after parsing the headers in the AST file.</p>
 
-<p>Within the PCH file, the identifiers used to name declarations are represented with an integral value. A separate table provides a mapping from this integral value (the identifier ID) to the location within the on-disk
+<p>Within the AST file, the identifiers used to name declarations are represented with an integral value. A separate table provides a mapping from this integral value (the identifier ID) to the location within the on-disk
 hash table where that identifier is stored. This mapping is used when
 deserializing the name of a declaration, the identifier of a token, or
-any other construct in the PCH file that refers to a name.</p>
+any other construct in the AST file that refers to a name.</p>
 
 <h3 id="method-pool">Method Pool Block</h3>
 
@@ -457,7 +458,7 @@
 Objective-C selectors to the set of Objective-C instance and class
 methods that have that particular selector (which is required for
 semantic analysis in Objective-C) and also stores all of the selectors
-used by entities within the precompiled header. The design of the
+used by entities within the AST file. The design of the
 method pool is similar to that of the <a href="#idtable">identifier
 table</a>: the first time a particular selector is formed during the
 compilation of the program, Clang will search in the on-disk hash
@@ -468,25 +469,25 @@
 respectively).</p>
 
 <p>As with identifiers, selectors are represented by numeric values
-within the PCH file. A separate index maps these numeric selector
+within the AST file. A separate index maps these numeric selector
 values to the offset of the selector within the on-disk hash table,
 and will be used when de-serializing an Objective-C method declaration
 (or other Objective-C construct) that refers to the selector.</p>
 
-<h2 id="tendrils">Precompiled Header Integration Points</h2>
+<h2 id="tendrils">AST Reader Integration Points</h2>
 
-<p>The "lazy" deserialization behavior of precompiled headers requires
+<p>The "lazy" deserialization behavior of AST files requires
 their integration into several completely different submodules of
 Clang. For example, lazily deserializing the declarations during name
 lookup requires that the name-lookup routines be able to query the
-precompiled header to find entities within the PCH file.</p>
+AST file to find entities stored there.</p>
 
 <p>For each Clang data structure that requires direct interaction with
-the precompiled header logic, there is an abstract class that provides
-the interface between the two modules. The <code>PCHReader</code>
-class, which handles the loading of a precompiled header, inherits
+the AST reader logic, there is an abstract class that provides
+the interface between the two modules. The <code>ASTReader</code>
+class, which handles the loading of an AST file, inherits
 from all of these abstract classes to provide lazy deserialization of
-Clang's data structures. <code>PCHReader</code> implements the
+Clang's data structures. <code>ASTReader</code> implements the
 following abstract classes:</p>
 
 <dl>
@@ -505,7 +506,7 @@
   <dd>This abstract interface is associated with the
     <code>IdentifierTable</code> class, and is used whenever the
     program source refers to an identifier that has not yet been seen.
-    In this case, the precompiled header implementation searches for
+    In this case, the AST reader searches for
     this identifier within its <a href="#idtable">identifier table</a>
     to load any top-level declarations or macros associated with that
     identifier.</dd>
@@ -513,7 +514,7 @@
   <dt><code>ExternalASTSource</code></dt>
   <dd>This abstract interface is associated with the
     <code>ASTContext</code> class, and is used whenever the abstract
-    syntax tree nodes need to loaded from the precompiled header. It
+    syntax tree nodes need to loaded from the AST file. It
     provides the ability to de-serialize declarations and types
     identified by their numeric values, read the bodies of functions
     when required, and read the declarations stored within a
@@ -526,6 +527,131 @@
     pool</a>.</dd>
 </dl>
 
+<h2 id="chained">Chained precompiled headers</h2>
+
+<p>Chained precompiled headers were initially intended to improve the
+performance of IDE-centric operations such as syntax highlighting and
+code completion while a particular source file is being edited by the
+user. To minimize the amount of reparsing required after a change to
+the file, a form of precompiled header--called a precompiled
+<i>preamble</i>--is automatically generated by parsing all of the
+headers in the source file, up to and including the last
+#include. When only the source file changes (and none of the headers
+it depends on), reparsing of that source file can use the precompiled
+preamble and start parsing after the #includes, so parsing time is
+proportional to the size of the source file (rather than all of its
+includes). However, the compilation of that translation unit
+may already use a precompiled header: in this case, Clang will create
+the precompiled preamble as a chained precompiled header that refers
+to the original precompiled header. This drastically reduces the time
+needed to serialize the precompiled preamble for use in reparsing.</p>
+
+<p>Chained precompiled headers get their name because each precompiled header
+can depend on one other precompiled header, forming a chain of
+dependencies. A translation unit will then include the precompiled
+header that starts the chain (i.e., nothing depends on it). This
+linearity of dependencies is important for the semantic model of
+chained precompiled headers, because the most-recent precompiled
+header can provide information that overrides the information provided
+by the precompiled headers it depends on, just like a header file
+<code>B.h</code> that includes another header <code>A.h</code> can
+modify the state produced by parsing <code>A.h</code>, e.g., by
+<code>#undef</code>'ing a macro defined in <code>A.h</code>.</p>
+
+<p>There are several ways in which chained precompiled headers
+generalize the AST file model:</p>
+
+<dl>
+  <dt>Numbering of IDs</dt>
+  <dd>Many different kinds of entities--identifiers, declarations,
+  types, etc.--have ID numbers that start at 1 or some other
+  predefined constant and grow upward. Each precompiled header records
+  the maximum ID number it has assigned in each category. Then, when a
+  new precompiled header is generated that depends on (chains to)
+  another precompiled header, it will start counting at the next
+  available ID number. This way, one can determine, given an ID
+  number, which AST file actually contains the entity.</dd>
+
+  <dt>Name lookup</dt>
+  <dd>When writing a chained precompiled header, Clang attempts to
+  write only information that has changed from the precompiled header
+  on which it is based. This changes the lookup algorithm for the
+  various tables, such as the <a href="#idtable">identifier table</a>:
+  the search starts at the most-recent precompiled header. If no entry
+  is found, lookup then proceeds to the identifier table in the
+  precompiled header it depends on, and so on. Once a lookup
+  succeeds, that result is considered definitive, overriding any
+  results from earlier precompiled headers.</dd>
+
+  <dt>Update records</dt>
+  <dd>There are various ways in which a later precompiled header can
+  modify the entities described in an earlier precompiled header. For
+  example, later precompiled headers can add entries into the various
+  name-lookup tables for the translation unit or namespaces, or add
+  new categories to an Objective-C class. Each of these updates is
+  captured in an "update record" that is stored in the chained
+  precompiled header file and will be loaded along with the original
+  entity.</dd>
+</dl>
+
+<h2 id="modules">Modules</h2>
+
+<p>Modules generalize the chained precompiled header model yet
+further, from a linear chain of precompiled headers to an arbitrary
+directed acyclic graph (DAG) of AST files. All of the same techniques
+used to make chained precompiled headers work---ID numbers, name
+lookup, update records---are shared with modules. However, the DAG
+nature of modules introduces a number of additional complications to
+the model:</p>
+
+<dl>
+  <dt>Numbering of IDs</dt>
+  <dd>The simple, linear numbering scheme used in chained precompiled
+  headers falls apart with the module DAG, because different modules
+  may end up with different numbering schemes for entities they
+  imported from common shared modules. To account for this, each
+  module file provides information about which modules it depends on
+  and which ID numbers it assigned to the entities in those modules,
+  as well as which ID numbers it took for its own new entities. The
+  AST reader then maps these "local" ID numbers into a "global" ID
+  number space for the current translation unit, providing a 1-1
+  mapping between entities (in whatever AST file they inhabit) and
+  global ID numbers. If that translation unit is then serialized into
+  an AST file, this mapping will be stored for use when the AST file
+  is imported.</dd>
+
+  <dt>Declaration merging</dt>
+  <dd>It is possible for a given entity (from the language's
+  perspective) to be declared multiple times in different places. For
+  example, two different headers can have the declaration of
+  <tt>printf</tt> or could forward-declare <tt>struct stat</tt>. If
+  each of those headers is included in a module, and some third party
+  imports both of those modules, there is a potentially serious
+  problem: name lookup for <tt>printf</tt> or <tt>struct stat</tt> will
+  find both declarations, but the AST nodes are unrelated. This would
+  result in a compilation error, due to an ambiguity in name
+  lookup. Therefore, the AST reader performs declaration merging
+  according to the appropriate language semantics, ensuring that the
+  two disjoint declarations are merged into a single redeclaration
+  chain (with a common canonical declaration), so that it is as if one
+  of the headers had been included before the other.</dd>
+
+  <dt>Name visibility</dt>
+  <dd>Modules allow certain names that occur during module creation to
+  be "hidden", so that they are not part of the public interface of
+  the module and are not visible to its clients. The AST reader
+  maintains a "visible" bit on various AST nodes (declarations, macros,
+  etc.) to indicate whether that particular AST node is currently
+  visible; the various name lookup mechanisms in Clang inspect the
+  visible bit to determine whether that entity, which is still in the
+  AST (because other, visible AST nodes may depend on it), can
+  actually be found by name lookup. When a new (sub)module is
+  imported, it may make existing, non-visible, already-deserialized
+  AST nodes visible; it is the responsibility of the AST reader to
+  find and update these AST nodes when it is notified of the import.</dd>
+    
+</dl>
+  
 </div>
 
 </body>
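
Illustration (not part of the patch): the "Numbering of IDs" and "Name lookup" rules for chained precompiled headers added above can be sketched in a few lines of C++. This is only a toy model under stated assumptions. The names ChainedFile, appendFile, ownerOf, lookup and buildExampleChain are hypothetical and are not Clang's ASTReader API, and the real tables are on-disk hash tables rather than in-memory maps.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Identifier -> recorded value. An empty optional models an entry that
// overrides an earlier definition, e.g. a macro that was #undef'd later.
using IdentifierTable =
    std::unordered_map<std::string, std::optional<std::string>>;

// Hypothetical stand-in for one AST file in a chain; not a Clang type.
struct ChainedFile {
  std::string name;
  std::uint32_t firstID;  // first ID number this file assigned
  std::uint32_t numIDs;   // how many ID numbers it assigned
  IdentifierTable identifiers;
};

// The chain is stored oldest file first. A new file starts counting at
// the next ID number after the file it chains to.
void appendFile(std::vector<ChainedFile> &chain, std::string name,
                std::uint32_t numIDs, IdentifierTable identifiers) {
  assert(numIDs > 0 && "each file is assumed to assign at least one ID");
  std::uint32_t first =
      chain.empty() ? 1 : chain.back().firstID + chain.back().numIDs;
  chain.push_back(
      ChainedFile{std::move(name), first, numIDs, std::move(identifiers)});
}

// Given an ID number, find the file whose range contains it.
const ChainedFile *ownerOf(const std::vector<ChainedFile> &chain,
                           std::uint32_t id) {
  for (const ChainedFile &file : chain)
    if (id >= file.firstID && id < file.firstID + file.numIDs)
      return &file;
  return nullptr;
}

// Search the most recent file first; the first entry found is definitive.
// Returns nullptr only if no file in the chain mentions the name.
const std::optional<std::string> *
lookup(const std::vector<ChainedFile> &chain, const std::string &name) {
  for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
    auto found = it->identifiers.find(name);
    if (found != it->identifiers.end())
      return &found->second;
  }
  return nullptr;
}

// Example chain: a prefix header, then a preamble that undefines DEBUG.
std::vector<ChainedFile> buildExampleChain() {
  std::vector<ChainedFile> chain;
  appendFile(chain, "prefix.pch", 100,
             {{"DEBUG", std::string("1")}, {"VERSION", std::string("2")}});
  appendFile(chain, "preamble.pch", 20, {{"DEBUG", std::nullopt}});
  return chain;
}
```

In this sketch the later file's entry for DEBUG hides the earlier definition, while VERSION is still found by falling through to the older file, which mirrors the override behaviour the new section describes.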

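Illustration (not part of the patch): the local-to-global ID mapping described under "Numbering of IDs" in the new Modules section can be sketched as follows. The sketch assumes a simplification that the patch does not state: every importer lists a shared module's entities in the same order, so an offset within a range identifies the same entity everywhere. LocalRange, ModuleFile, Reader and demoSharedEntity are hypothetical names, not Clang classes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// One contiguous run of local ID numbers and the module that owns the
// entities in that run (hypothetical layout, for illustration only).
struct LocalRange {
  std::uint32_t localBase;
  std::uint32_t count;
  std::string owner;
};

// A module file records how it numbered its own entities and the
// entities of every module it depends on.
struct ModuleFile {
  std::string name;
  std::vector<LocalRange> ranges;
};

class Reader {
public:
  // Reserve a block of global IDs for the entities the module defines.
  void load(const ModuleFile &module) {
    assert(globalBase_.count(module.name) == 0 && "module loaded twice");
    for (const LocalRange &range : module.ranges) {
      if (range.owner == module.name) {
        globalBase_[module.name] = nextGlobal_;
        nextGlobal_ += range.count;
      }
    }
  }

  // Map a module-local ID to the translation unit's global ID space.
  // Returns 0 if the ID or its owning module is unknown.
  std::uint32_t toGlobal(const ModuleFile &module,
                         std::uint32_t localID) const {
    for (const LocalRange &range : module.ranges) {
      if (localID >= range.localBase &&
          localID < range.localBase + range.count) {
        auto base = globalBase_.find(range.owner);
        if (base == globalBase_.end())
          return 0;
        return base->second + (localID - range.localBase);
      }
    }
    return 0;
  }

private:
  std::unordered_map<std::string, std::uint32_t> globalBase_;
  std::uint32_t nextGlobal_ = 1;
};

// A and B both import Common but number its entities differently:
// A places Common at local IDs 1..10, B places it at local IDs 4..13.
std::pair<std::uint32_t, std::uint32_t> demoSharedEntity() {
  ModuleFile common{"Common", {LocalRange{1, 10, "Common"}}};
  ModuleFile a{"A", {LocalRange{1, 10, "Common"}, LocalRange{11, 5, "A"}}};
  ModuleFile b{"B", {LocalRange{1, 3, "B"}, LocalRange{4, 10, "Common"}}};
  Reader reader;
  reader.load(common);
  reader.load(a);
  reader.load(b);
  // The fifth entity of Common is local ID 5 in A and local ID 8 in B.
  return {reader.toGlobal(a, 5), reader.toGlobal(b, 8)};
}
```

Both local IDs resolve to the same global ID here, which is the one-to-one mapping between entities and global ID numbers that the new text describes.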