<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi all,<div class=""><br class=""></div><div class="">Modules provide a useful place to cache the results of parsing a library’s headers for later, efficient consumption. Clang does this for all of the information it gathers during parsing, including the ASTs, preprocessor state, lookup tables, comments, and so on, with the intent to minimize the amount of deserialization that occurs when a particular module is being used.</div><div class=""><br class=""></div><div class="">I’d like to extend this capability so that tools built on top of Clang can store their own information in the module file on disk. This mechanism is meant for information that can be computed at module-build time but would be expensive to recompute in every user of the module. Examples include function summaries for the static analyzer and application-specific indexes that would otherwise require deserializing the entire module file to recompute. One could compute this information and put it into a separate file or database, but the information is naturally tied to modules and generally needs to be invalidated at the same time the module file itself needs to be rebuilt. It therefore makes more architectural sense to put that information directly in the module file when it is built, while the full AST is still efficiently accessible in memory.</div><div class=""><br class=""></div><div class="">A <i class="">module file extension</i> is a bit of custom logic that can piggy-back data into a module file. Each module file extension is described by a unique block name identifying the extension, as well as other metadata (major/minor version, user information string) about the extension itself. 
Each module file extension gets to write into its own separate extension block in the resulting module file, separate from the rest of the module file contents and from other extensions. To do that, it provides both a writer (that writes bitstream records into the output file) and a reader (that can read back those bitstream records).</div><div class=""><br class=""></div><div class="">I’ve attached an implementation of module file extensions. It sketches out the interface to a module file extension (see below, or check out ModuleFileExtension.h for the full interface) and implements a module file extension for testing purposes so we can illustrate the round-tripping of data through the module file format, matching of extension blocks written to extension blocks read, and so on. </div><div class=""><br class=""></div><blockquote class="" style="margin: 0px 0px 0px 40px; border: none; padding: 0px;"><div class=""><div class=""><font face="Courier New" class="">/// Metadata for a module file extension.</font></div><div class=""><font face="Courier New" class="">struct ModuleFileExtensionMetadata {</font></div><div class=""><font face="Courier New" class=""> /// The name used to identify this particular extension block within</font></div><div class=""><font face="Courier New" class=""> /// the resulting module file. 
It should be unique to the particular</font></div><div class=""><font face="Courier New" class=""> /// extension, because this name will be used to match the name of</font></div><div class=""><font face="Courier New" class=""> /// an extension block to the appropriate reader.</font></div><div class=""><font face="Courier New" class=""> std::string BlockName;</font></div><div class=""><font face="Courier New" class=""><br class=""></font></div><div class=""><font face="Courier New" class=""> /// The major version of the extension data.</font></div><div class=""><font face="Courier New" class=""> unsigned MajorVersion;</font></div><div class=""><font face="Courier New" class=""><br class=""></font></div><div class=""><font face="Courier New" class=""> /// The minor version of the extension data.</font></div><div class=""><font face="Courier New" class=""> unsigned MinorVersion;</font></div><div class=""><font face="Courier New" class=""><br class=""></font></div><div class=""><font face="Courier New" class=""> /// A string containing additional user information that will be</font></div><div class=""><font face="Courier New" class=""> /// stored with the metadata.</font></div><div class=""><font face="Courier New" class=""> std::string UserInfo;</font></div><div class=""><font face="Courier New" class="">};</font></div><div class=""><font face="Courier New" class=""><br class=""></font></div><div class=""><font face="Courier New" class="">/// An abstract superclass that describes a custom extension to the</font></div></div><div class=""><div class=""><font face="Courier New" class="">/// module/precompiled header file format.</font></div></div><div class=""><div class=""><font face="Courier New" class="">///</font></div></div><div class=""><div class=""><font face="Courier New" class="">/// A module file extension can introduce additional information into</font></div></div><div class=""><div class=""><font face="Courier New" class="">/// compiled module files (.pcm) 
and precompiled headers (.pch) via a</font></div></div><div class=""><div class=""><font face="Courier New" class="">/// custom writer that can then be accessed via a custom reader when</font></div></div><div class=""><div class=""><font face="Courier New" class="">/// the module file or precompiled header is loaded.</font></div></div><div class=""><div class=""><font face="Courier New" class="">class ModuleFileExtension : public llvm::RefCountedBase&lt;ModuleFileExtension&gt; {</font></div></div><div class=""><div class=""><font face="Courier New" class="">public:</font></div></div><div class=""><div class=""><font face="Courier New" class=""> virtual ~ModuleFileExtension();</font></div></div><div class=""><div class=""><font face="Courier New" class=""><br class=""></font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// Retrieves the metadata for this module file extension.</font></div></div><div class=""><div class=""><font face="Courier New" class=""> virtual ModuleFileExtensionMetadata getExtensionMetadata() const = 0;</font></div></div><div class=""><div class=""><font face="Courier New" class=""><br class=""></font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// Hash information about the presence of this extension into the</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// module hash code.</font></div></div><div class=""><div class=""><font face="Courier New" class=""> ///</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// The module hash code is used to distinguish different variants</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// of a module that are incompatible. 
If the presence, absence, or</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// version of the module file extension should force the creation</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// of a separate set of module files, override this method to</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// combine that distinguishing information into the module hash</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// code.</font></div></div><div class=""><div class=""><font face="Courier New" class=""> ///</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// The default implementation of this function simply returns the</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// hash code as given, so the presence/absence of this extension</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// does not distinguish module files.</font></div></div><div class=""><div class=""><font face="Courier New" class=""> virtual llvm::hash_code hashExtension(llvm::hash_code Code) const;</font></div></div><div class=""><div class=""><font face="Courier New" class=""><br class=""></font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// Create a new module file extension writer, which will be</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// responsible for writing the extension contents into a particular</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// module file.</font></div></div><div class=""><div class=""><font face="Courier New" class=""> virtual std::unique_ptr&lt;ModuleFileExtensionWriter&gt;</font></div></div><div class=""><div class=""><font face="Courier New" class=""> createExtensionWriter() = 0;</font></div></div><div class=""><div class=""><font face="Courier 
New" class=""><br class=""></font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// Create a new module file extension reader, given the</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// metadata read from the block and the cursor into the extension</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// block.</font></div></div><div class=""><div class=""><font face="Courier New" class=""> ///</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// May return null to indicate that an extension block with the</font></div></div><div class=""><div class=""><font face="Courier New" class=""> /// given metadata cannot be read.</font></div></div><div class=""><div class=""><font face="Courier New" class=""> virtual std::unique_ptr&lt;ModuleFileExtensionReader&gt;</font></div></div><div class=""><div class=""><font face="Courier New" class=""> createExtensionReader(const ModuleFileExtensionMetadata &Metadata,</font></div></div><div class=""><div class=""><font face="Courier New" class=""> ASTReader &Reader, serialization::ModuleFile &Mod,</font></div></div><div class=""><div class=""><font face="Courier New" class=""> const llvm::BitstreamCursor &Stream) = 0;</font></div></div><div class=""><div class=""><font face="Courier New" class="">};</font></div></div><div class=""><font face="Courier New" class=""><br class=""></font></div><div class=""><font face="Courier New" class=""><div class="">/// Abstract base class that writes a module file extension block into</div><div class="">/// a module file.</div><div class="">class ModuleFileExtensionWriter {</div><div class=""> ModuleFileExtension *Extension;</div><div class=""><br class=""></div><div class="">protected:</div><div class=""> ModuleFileExtensionWriter(ModuleFileExtension *Extension)</div><div class=""> : Extension(Extension) { }</div><div class=""><br class=""></div><div class="">public:</div><div 
class=""> virtual ~ModuleFileExtensionWriter();</div><div class=""><br class=""></div><div class=""> /// Retrieve the module file extension with which this writer is</div><div class=""> /// associated.</div><div class=""> ModuleFileExtension *getExtension() const { return Extension; }</div><div class=""><br class=""></div><div class=""> /// Write the contents of the extension block into the given bitstream.</div><div class=""> ///</div><div class=""> /// Responsible for writing the contents of the extension into the</div><div class=""> /// given stream. All of the contents should be written into custom</div><div class=""> /// records with IDs >= FIRST_EXTENSION_RECORD_ID.</div><div class=""> virtual void writeExtensionContents(llvm::BitstreamWriter &Stream) = 0;</div><div class="">};</div><div class=""><br class=""></div><div class="">/// Abstract base class that reads a module file extension block from</div><div class="">/// a module file.</div><div class="">///</div><div class="">/// Subclasses </div><div class="">class ModuleFileExtensionReader {</div><div class=""> ModuleFileExtension *Extension;</div><div class=""><br class=""></div><div class="">protected:</div><div class=""> ModuleFileExtensionReader(ModuleFileExtension *Extension)</div><div class=""> : Extension(Extension) { }</div><div class=""><br class=""></div><div class="">public:</div><div class=""> /// Retrieve the module file extension with which this reader is</div><div class=""> /// associated.</div><div class=""> ModuleFileExtension *getExtension() const { return Extension; }</div><div class=""><br class=""></div><div class=""> virtual ~ModuleFileExtensionReader();</div><div class="">};</div></font></div></blockquote><div class=""><br class=""></div><div class="">I suspect that the Reader and Writer interfaces will grow somewhat as we get more clients, but this is a start.</div><div class=""><br class=""></div><div class="">Thoughts?</div><div class=""><br class=""></div><div class=""><span 
class="Apple-tab-span" style="white-space: pre;"> </span>- Doug</div><div class=""><br class=""></div><div class=""></div></body></html>