[cfe-dev] RFC: Module file extensions

Douglas Gregor via cfe-dev cfe-dev at lists.llvm.org
Mon Oct 26 21:17:44 PDT 2015


Hi all,

Modules provide a useful place to cache the results of parsing a library’s headers for later, efficient consumption. Clang does this for all of the information it gathers during parsing, including the ASTs, preprocessor state, lookup tables, comments, and so on, with the intent to minimize the amount of deserialization that occurs when a particular module is being used.

I’d like to extend this capability so that tools built on top of Clang can store their own information in the module file on disk. This mechanism would be used when there is information that can be computed at module-build time that would be expensive to recompute in every user of the module. Examples include function summaries for the static analyzer and application-specific indexes that would require deserializing the entire module file to recompute. One could compute this information and put it into a separate file or database. However, the information is naturally tied to modules and generally needs to be invalidated at the same time the module file itself needs to be rebuilt, so it makes more architectural sense to put that information directly in the module file when it is built, while the full AST is still efficiently accessible in memory.

A module file extension is a bit of custom logic that can piggy-back data into a module file. Each module file extension is described by a unique block name identifying the extension, as well as other metadata (major/minor version, user information string) about the extension itself. Each module file extension writes into its own extension block in the resulting module file, separate from the rest of the module file contents and from other extensions. To do that, it provides both a writer (that writes bitstream records into the output file) and a reader (that can read back those bitstream records).
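
To make the round trip concrete, here is a minimal, self-contained sketch of the idea. It does not use Clang's classes or the bitstream format: `Metadata`, `ExtensionBlock`, `Extension`, `CounterExtension`, `writeModule`, and `readModule` are all hypothetical stand-ins, and a "module file" is just a vector of blocks. It only illustrates that each extension writes its own block, and that blocks are matched back to a reader by block name, with the reader free to reject a block whose metadata it does not understand.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Stand-in for the metadata stored with each extension block.
struct Metadata {
  std::string BlockName;
  unsigned MajorVersion;
  unsigned MinorVersion;
  std::string UserInfo;
};

// Stand-in for one extension block inside a module file.
struct ExtensionBlock {
  Metadata Meta;
  std::vector<std::uint64_t> Records;
};

// Hypothetical, simplified extension: writes records and reads them back.
struct Extension {
  virtual ~Extension() = default;
  virtual Metadata getMetadata() const = 0;
  virtual std::vector<std::uint64_t> write() const = 0;
  // Returns false if the block cannot be read (e.g. version mismatch).
  virtual bool read(const ExtensionBlock &Block) = 0;
};

// Toy extension that stores a single counter value.
struct CounterExtension : Extension {
  std::uint64_t Value = 0;

  Metadata getMetadata() const override {
    return {"example.counter", 1, 0, "demo"};
  }
  std::vector<std::uint64_t> write() const override { return {Value}; }
  bool read(const ExtensionBlock &Block) override {
    if (Block.Meta.MajorVersion != 1 || Block.Records.size() != 1)
      return false;
    Value = Block.Records[0];
    return true;
  }
};

// Toy "module file": one separate block per extension.
using ModuleFile = std::vector<ExtensionBlock>;

ModuleFile writeModule(const std::vector<const Extension *> &Exts) {
  ModuleFile File;
  for (const Extension *E : Exts)
    File.push_back({E->getMetadata(), E->write()});
  return File;
}

// Match each block to a known extension by block name.
// Returns the number of blocks that were successfully read.
unsigned readModule(const ModuleFile &File,
                    const std::map<std::string, Extension *> &Known) {
  unsigned NumRead = 0;
  for (const ExtensionBlock &Block : File) {
    auto It = Known.find(Block.Meta.BlockName);
    if (It == Known.end())
      continue; // No reader registered for this block name: skip it.
    if (It->second->read(Block))
      ++NumRead;
  }
  return NumRead;
}
```

In this sketch a block with an unknown name is simply skipped, and a block with an unexpected major version is rejected by the extension itself; both choices are assumptions made for illustration.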

I’ve attached an implementation of module file extensions. It sketches out the interface to a module file extension (see below, or check out ModuleFileExtension.h for the full interface) and implements a module file extension for testing purposes so we can illustrate the round-tripping of data through the module file format, matching of extension blocks written to extension blocks read, and so on. 

/// Metadata for a module file extension.
struct ModuleFileExtensionMetadata {
  /// The name used to identify this particular extension block within
  /// the resulting module file. It should be unique to the particular
  /// extension, because this name will be used to match the name of
  /// an extension block to the appropriate reader.
  std::string BlockName;

  /// The major version of the extension data.
  unsigned MajorVersion;

  /// The minor version of the extension data.
  unsigned MinorVersion;

  /// A string containing additional user information that will be
  /// stored with the metadata.
  std::string UserInfo;
};

/// An abstract superclass that describes a custom extension to the
/// module/precompiled header file format.
///
/// A module file extension can introduce additional information into
/// compiled module files (.pcm) and precompiled headers (.pch) via a
/// custom writer that can then be accessed via a custom reader when
/// the module file or precompiled header is loaded.
class ModuleFileExtension : public llvm::RefCountedBase<ModuleFileExtension> {
public:
  virtual ~ModuleFileExtension();

  /// Retrieves the metadata for this module file extension.
  virtual ModuleFileExtensionMetadata getExtensionMetadata() const = 0;

  /// Hash information about the presence of this extension into the
  /// module hash code.
  ///
  /// The module hash code is used to distinguish different variants
  /// of a module that are incompatible. If the presence, absence, or
  /// version of the module file extension should force the creation
  /// of a separate set of module files, override this method to
  /// combine that distinguishing information into the module hash
  /// code.
  ///
  /// The default implementation of this function simply returns the
  /// hash code as given, so the presence/absence of this extension
  /// does not distinguish module files.
  virtual llvm::hash_code hashExtension(llvm::hash_code Code) const;

  /// Create a new module file extension writer, which will be
  /// responsible for writing the extension contents into a particular
  /// module file.
  virtual std::unique_ptr<ModuleFileExtensionWriter>
  createExtensionWriter() = 0;

  /// Create a new module file extension reader, given the
  /// metadata read from the block and the cursor into the extension
  /// block.
  ///
  /// May return null to indicate that an extension block with the
  /// given metadata cannot be read.
  virtual std::unique_ptr<ModuleFileExtensionReader>
  createExtensionReader(const ModuleFileExtensionMetadata &Metadata,
                        ASTReader &Reader, serialization::ModuleFile &Mod,
                        const llvm::BitstreamCursor &Stream) = 0;
};

/// Abstract base class that writes a module file extension block into
/// a module file.
class ModuleFileExtensionWriter {
  ModuleFileExtension *Extension;

protected:
  ModuleFileExtensionWriter(ModuleFileExtension *Extension)
    : Extension(Extension) { }

public:
  virtual ~ModuleFileExtensionWriter();

  /// Retrieve the module file extension with which this writer is
  /// associated.
  ModuleFileExtension *getExtension() const { return Extension; }

  /// Write the contents of the extension block into the given bitstream.
  ///
  /// Responsible for writing the contents of the extension into the
  /// given stream. All of the contents should be written into custom
  /// records with IDs >= FIRST_EXTENSION_RECORD_ID.
  virtual void writeExtensionContents(llvm::BitstreamWriter &Stream) = 0;
};

/// Abstract base class that reads a module file extension block from
/// a module file.
///
/// Subclasses 
class ModuleFileExtensionReader {
  ModuleFileExtension *Extension;

protected:
  ModuleFileExtensionReader(ModuleFileExtension *Extension)
    : Extension(Extension) { }

public:
  /// Retrieve the module file extension with which this reader is
  /// associated.
  ModuleFileExtension *getExtension() const { return Extension; }

  virtual ~ModuleFileExtensionReader();
};
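
As a rough illustration of the hashExtension() hook described above, the following self-contained sketch shows the default "pass the hash through unchanged" behavior next to an override that folds the block name and major version into the hash. It deliberately avoids LLVM: `HashCode` and `combine` are stand-ins for llvm::hash_code and its combining step, and `ExtensionBase` and `VersionedExtension` are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Stand-in for llvm::hash_code; this is not the LLVM API.
using HashCode = std::size_t;

// Simple Boost-style hash mixing step, used here only for illustration.
inline HashCode combine(HashCode Seed, HashCode Value) {
  return Seed ^ (Value + 0x9e3779b9u + (Seed << 6) + (Seed >> 2));
}

struct ExtensionBase {
  virtual ~ExtensionBase() = default;

  // Default: return the hash as given, so the presence or absence of
  // the extension does not distinguish module files.
  virtual HashCode hashExtension(HashCode Code) const { return Code; }
};

// Hypothetical extension whose block name and major version should
// force a separate set of module files.
struct VersionedExtension : ExtensionBase {
  std::string BlockName;
  unsigned MajorVersion;

  VersionedExtension(std::string Name, unsigned Major)
      : BlockName(std::move(Name)), MajorVersion(Major) {}

  HashCode hashExtension(HashCode Code) const override {
    Code = combine(Code, std::hash<std::string>{}(BlockName));
    return combine(Code, std::hash<unsigned>{}(MajorVersion));
  }
};
```

With this shape, two builds that differ only in the major version of such an extension would end up with different module hashes, while an extension that keeps the default leaves the hash untouched.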

I suspect that the Reader and Writer interfaces will grow somewhat as we get more clients, but this is a start.

Thoughts?

	- Doug

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Introduce-module-file-extensions-to-piggy-back-data-.patch
Type: application/octet-stream
Size: 64055 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20151026/28097ed8/attachment.obj>