[cfe-dev] RFC: Module file extensions

Manuel Klimek via cfe-dev cfe-dev at lists.llvm.org
Mon Oct 26 23:58:24 PDT 2015


I'd be curious to learn more about why you need to have it in one file.
Invalidation is something build systems usually take care of, and for those
it seems mostly the same whether they invalidate one or two files (they
already need to invalidate a multitude of files, like .o files, module
files, linked libraries and binaries, and generated code).
I could see an argument for deployment, but for deployment of libraries
you'll need the module and its headers anyway.
Given that we already have a way to embed sources into modules, I can
see that you might be able to build fully deployable bundles, where a module
file is the only thing you need: it would include the headers, potentially
.o files to link in, and other information (like the kinds you cite).
So, all that said, I'm mainly curious whether that's what you're aiming for
:)

On Tue, Oct 27, 2015 at 5:17 AM Douglas Gregor via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hi all,
>
> Modules provide a useful place to cache the results of parsing a library’s
> headers for later, efficient consumption. Clang does this for all of the
> information it gathers during parsing, including the ASTs, preprocessor
> state, lookup tables, comments, and so on, with the intent to minimize the
> amount of deserialization that occurs when a particular module is being
> used.
>
> I’d like to extend this capability so that tools built on top of Clang can
> store their own information in the module file on disk. This mechanism
> would be used when there is information that can be computed at
> module-build time that would be expensive to recompute in every user of the
> module. Examples include function summaries for the static analyzer and
> application-specific indexes that would otherwise require deserializing
> the entire module file to recompute. One could compute this information
> and put it into a separate file or database. However, the information is
> naturally tied to modules and generally needs to be invalidated whenever
> the module file itself is rebuilt, so it makes more architectural sense to
> put that information directly in the module file when it is built, while
> the full AST is still efficiently accessible in memory.
>
> A *module file extension* is a bit of custom logic that can piggy-back
> data into a module file. Each module file extension is described by a
> unique block name identifying the extension, as well as other metadata
> (major/minor version, user information string) about the extension itself.
> Each module file extension gets to write into its own separate extension
> block in the resulting module file, separate from the rest of the module
> file contents and from other extensions. To do that, it provides both a
> writer (that writes bitstream records into the output file) and a reader
> (that can read back those bitstream records).
>
> I’ve attached an implementation of module file extensions. It sketches out
> the interface to a module file extension (see below, or check out
> ModuleFileExtension.h for the full interface) and implements a module file
> extension for testing purposes so we can illustrate the round-tripping of
> data through the module file format, matching of extension blocks written
> to extension blocks read, and so on.
>
> /// Metadata for a module file extension.
> struct ModuleFileExtensionMetadata {
>   /// The name used to identify this particular extension block within
>   /// the resulting module file. It should be unique to the particular
>   /// extension, because this name will be used to match the name of
>   /// an extension block to the appropriate reader.
>   std::string BlockName;
>
>   /// The major version of the extension data.
>   unsigned MajorVersion;
>
>   /// The minor version of the extension data.
>   unsigned MinorVersion;
>
>   /// A string containing additional user information that will be
>   /// stored with the metadata.
>   std::string UserInfo;
> };
>
> /// An abstract superclass that describes a custom extension to the
> /// module/precompiled header file format.
> ///
> /// A module file extension can introduce additional information into
> /// compiled module files (.pcm) and precompiled headers (.pch) via a
> /// custom writer that can then be accessed via a custom reader when
> /// the module file or precompiled header is loaded.
> class ModuleFileExtension
>     : public llvm::RefCountedBase<ModuleFileExtension> {
> public:
>   virtual ~ModuleFileExtension();
>
>   /// Retrieves the metadata for this module file extension.
>   virtual ModuleFileExtensionMetadata getExtensionMetadata() const = 0;
>
>   /// Hash information about the presence of this extension into the
>   /// module hash code.
>   ///
>   /// The module hash code is used to distinguish different variants
>   /// of a module that are incompatible. If the presence, absence, or
>   /// version of the module file extension should force the creation
>   /// of a separate set of module files, override this method to
>   /// combine that distinguishing information into the module hash
>   /// code.
>   ///
>   /// The default implementation of this function simply returns the
>   /// hash code as given, so the presence/absence of this extension
>   /// does not distinguish module files.
>   virtual llvm::hash_code hashExtension(llvm::hash_code Code) const;
>
>   /// Create a new module file extension writer, which will be
>   /// responsible for writing the extension contents into a particular
>   /// module file.
>   virtual std::unique_ptr<ModuleFileExtensionWriter>
>   createExtensionWriter() = 0;
>
>   /// Create a new module file extension reader, given the
>   /// metadata read from the block and the cursor into the extension
>   /// block.
>   ///
>   /// May return null to indicate that an extension block with the
>   /// given metadata cannot be read.
>   virtual std::unique_ptr<ModuleFileExtensionReader>
>   createExtensionReader(const ModuleFileExtensionMetadata &Metadata,
>                         ASTReader &Reader, serialization::ModuleFile &Mod,
>                         const llvm::BitstreamCursor &Stream) = 0;
> };
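
To make the hashExtension contract above concrete, here is a minimal sketch that compiles without LLVM. Everything in it is a stand-in and an assumption on my part: HashCode replaces llvm::hash_code, hashString and combine are hypothetical helpers, and the two extension classes and their block names are invented for illustration. It only shows the difference between keeping the default (the extension does not split the module cache) and overriding the hook (block name and major version are folded into the hash).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for llvm::hash_code so the sketch compiles without LLVM.
using HashCode = std::uint64_t;

// Hypothetical helper: FNV-1a over the bytes of a string.
inline HashCode hashString(const std::string &S) {
  HashCode H = 0xcbf29ce484222325ULL;
  for (unsigned char C : S) {
    H ^= C;
    H *= 0x100000001b3ULL;
  }
  return H;
}

// Hypothetical helper: fold one value into a running hash code.
inline HashCode combine(HashCode Seed, HashCode Value) {
  return Seed ^ (Value + 0x9e3779b97f4a7c15ULL + (Seed << 6) + (Seed >> 2));
}

// Simplified copy of the metadata struct from the RFC.
struct Metadata {
  std::string BlockName;
  unsigned MajorVersion;
  unsigned MinorVersion;
  std::string UserInfo;
};

// Simplified stand-in for ModuleFileExtension (reader/writer hooks omitted).
class Extension {
public:
  virtual ~Extension() = default;
  virtual Metadata getExtensionMetadata() const = 0;
  // Default: the extension's presence does not affect the module hash.
  virtual HashCode hashExtension(HashCode Code) const { return Code; }
};

// Keeps the default, so module files are shared with and without it.
class IndexExtension : public Extension {
public:
  Metadata getExtensionMetadata() const override {
    return {"example.index", 1, 0, "demo"};
  }
};

// Overrides the hook, so each major version gets its own set of module files.
class SummaryExtension : public Extension {
  unsigned Major;

public:
  explicit SummaryExtension(unsigned Major) : Major(Major) {}

  Metadata getExtensionMetadata() const override {
    return {"example.summaries", Major, 0, "demo"};
  }

  HashCode hashExtension(HashCode Code) const override {
    Metadata M = getExtensionMetadata();
    assert(!M.BlockName.empty() && "block name must identify the extension");
    Code = combine(Code, hashString(M.BlockName));
    return combine(Code, M.MajorVersion);
  }
};
```

Whether the minor version should also feed the hash is a policy choice the RFC leaves to each extension; this sketch assumes minor versions stay compatible.
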
>
> /// Abstract base class that writes a module file extension block into
> /// a module file.
> class ModuleFileExtensionWriter {
>   ModuleFileExtension *Extension;
>
> protected:
>   ModuleFileExtensionWriter(ModuleFileExtension *Extension)
>     : Extension(Extension) { }
>
> public:
>   virtual ~ModuleFileExtensionWriter();
>
>   /// Retrieve the module file extension with which this writer is
>   /// associated.
>   ModuleFileExtension *getExtension() const { return Extension; }
>
>   /// Write the contents of the extension block into the given bitstream.
>   ///
>   /// Responsible for writing the contents of the extension into the
>   /// given stream. All of the contents should be written into custom
>   /// records with IDs >= FIRST_EXTENSION_RECORD_ID.
>   virtual void writeExtensionContents(llvm::BitstreamWriter &Stream) = 0;
> };
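
As a rough illustration of the writer side, the sketch below uses a toy RecordStream in place of llvm::BitstreamWriter, so it is not the real bitstream API. The value given to FIRST_EXTENSION_RECORD_ID here is a placeholder I made up, and SummaryWriter, its record layout, and usesOnlyExtensionRecordIDs are hypothetical. The point is only the convention from the comment above: every record the extension emits uses an ID at or above FIRST_EXTENSION_RECORD_ID.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Placeholder value (an assumption); the real constant is defined by Clang.
constexpr unsigned FIRST_EXTENSION_RECORD_ID = 4;

// Toy stand-in for a bitstream record: an ID plus integer fields.
struct Record {
  unsigned ID;
  std::vector<std::uint64_t> Fields;
};

// Toy stand-in for llvm::BitstreamWriter: it only remembers the records.
class RecordStream {
  std::vector<Record> Records;

public:
  void emitRecord(unsigned ID, std::vector<std::uint64_t> Fields) {
    Records.push_back({ID, std::move(Fields)});
  }
  const std::vector<Record> &records() const { return Records; }
};

// Simplified stand-in for ModuleFileExtensionWriter.
class ExtensionWriter {
public:
  virtual ~ExtensionWriter() = default;
  virtual void writeExtensionContents(RecordStream &Stream) = 0;
};

// Hypothetical writer: one (function ID, summary bits) pair per record.
class SummaryWriter : public ExtensionWriter {
  std::vector<std::pair<std::uint64_t, std::uint64_t>> Summaries;

public:
  // Extension-defined record kinds start at FIRST_EXTENSION_RECORD_ID.
  enum RecordKind : unsigned { FUNCTION_SUMMARY = FIRST_EXTENSION_RECORD_ID };

  void addSummary(std::uint64_t FunctionID, std::uint64_t Bits) {
    Summaries.emplace_back(FunctionID, Bits);
  }

  void writeExtensionContents(RecordStream &Stream) override {
    for (const auto &S : Summaries)
      Stream.emitRecord(FUNCTION_SUMMARY, {S.first, S.second});
  }
};

// Checks the convention that extension records never use reserved IDs.
inline bool usesOnlyExtensionRecordIDs(const RecordStream &Stream) {
  for (const Record &R : Stream.records())
    if (R.ID < FIRST_EXTENSION_RECORD_ID)
      return false;
  return true;
}
```
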
>
> /// Abstract base class that reads a module file extension block from
> /// a module file.
> class ModuleFileExtensionReader {
>   ModuleFileExtension *Extension;
>
> protected:
>   ModuleFileExtensionReader(ModuleFileExtension *Extension)
>     : Extension(Extension) { }
>
> public:
>   /// Retrieve the module file extension with which this reader is
>   /// associated.
>   ModuleFileExtension *getExtension() const { return Extension; }
>
>   virtual ~ModuleFileExtensionReader();
> };
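
The matching of extension blocks to readers could look roughly like the sketch below. It is self-contained and deliberately simplified: the ASTReader, ModuleFile, and BitstreamCursor parameters are dropped, findReader is a hypothetical dispatch helper (not something in the patch), and the rule "the major version must match" is an assumed compatibility policy. It shows the two behaviors the comments describe: blocks are matched to an extension by block name, and createExtensionReader may return null when the block cannot be read.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Simplified copy of the metadata struct from the RFC.
struct Metadata {
  std::string BlockName;
  unsigned MajorVersion;
  unsigned MinorVersion;
  std::string UserInfo;
};

class Extension;

// Simplified stand-in for ModuleFileExtensionReader.
class ExtensionReader {
  Extension *Ext;

protected:
  explicit ExtensionReader(Extension *Ext) : Ext(Ext) {}

public:
  virtual ~ExtensionReader() = default;
  Extension *getExtension() const { return Ext; }
};

// Simplified stand-in for ModuleFileExtension (writer and hash hooks omitted).
class Extension {
public:
  virtual ~Extension() = default;
  virtual Metadata getExtensionMetadata() const = 0;
  // May return null when a block with this metadata cannot be read.
  virtual std::unique_ptr<ExtensionReader>
  createExtensionReader(const Metadata &BlockMetadata) = 0;
};

class SummaryReader : public ExtensionReader {
public:
  explicit SummaryReader(Extension *Ext) : ExtensionReader(Ext) {}
};

class SummaryExtension : public Extension {
public:
  Metadata getExtensionMetadata() const override {
    return {"example.summaries", 2, 1, "demo"};
  }

  std::unique_ptr<ExtensionReader>
  createExtensionReader(const Metadata &BlockMetadata) override {
    // Assumed policy: only blocks with the same major version are readable.
    if (BlockMetadata.MajorVersion != getExtensionMetadata().MajorVersion)
      return nullptr;
    return std::unique_ptr<ExtensionReader>(new SummaryReader(this));
  }
};

// Hypothetical dispatch step: pick the extension whose block name matches
// the block found in the module file, then ask it for a reader.
inline std::unique_ptr<ExtensionReader>
findReader(const std::vector<Extension *> &Extensions,
           const Metadata &BlockMetadata) {
  for (Extension *Ext : Extensions)
    if (Ext->getExtensionMetadata().BlockName == BlockMetadata.BlockName)
      return Ext->createExtensionReader(BlockMetadata);
  return nullptr; // No registered extension knows this block.
}
```

A block written by an unknown extension, or by an incompatible version of a known one, is simply skipped in this model; how the real loader reports that case is not something the RFC text specifies.
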
>
>
> I suspect that the Reader and Writer interfaces will grow somewhat as we
> get more clients, but this is a start.
>
> Thoughts?
>
> - Doug