<div dir="ltr">I'd be curious to learn more about why you need to have it in one file. Invalidation is something build systems usually take care of, and for those it seems mostly the same whether they invalidate one or two files (they already need to invalidate a multitude of files, like .o files, module files, linked libraries and binaries, and generated code).<div>I could see an argument for deployment, but for deployment of libraries you'll need the module and its headers anyway.</div><div>Given that we already have a way to embed sources into the modules, I can see that you might be able to build full deployable bundles, where a module file is the only thing you need, and it includes the headers, potentially .o files to link in, and other information (like the one you cite).</div><div>So, all that said, I'm mainly curious whether that's what you're aiming for :)</div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Oct 27, 2015 at 5:17 AM Douglas Gregor via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Hi all,<div><br></div><div>Modules provide a useful place to cache the results of parsing a library’s headers for later, efficient consumption. Clang does this for all of the information it gathers during parsing, including the ASTs, preprocessor state, lookup tables, comments, and so on, with the intent to minimize the amount of deserialization that occurs when a particular module is being used.</div><div><br></div><div>I’d like to extend this capability so that tools built on top of Clang can store their own information in the module file on disk. This mechanism would be used when there is information that can be computed at module-build time that would be expensive to recompute in every user of the module. 
Examples include function summaries for the static analyzer and application-specific indexes that would otherwise require deserializing the entire module file to recompute. One could compute this information and put it into a separate file or database, but given that the information is naturally tied to modules—and generally needs to be invalidated at the same time the module file itself needs to be rebuilt—it makes more architectural sense to put that information directly in the module file when it is built, while the full AST is still efficiently accessible in memory.</div><div><br></div><div>A <i>module file extension</i> is a bit of custom logic that can piggy-back data into a module file. Each module file extension is described by a unique block name identifying the extension, as well as other metadata (major/minor version, user information string) about the extension itself. Each module file extension gets to write into its own separate extension block in the resulting module file, separate from the rest of the module file contents and from other extensions. To do that, it provides both a writer (that writes bitstream records into the output file) and a reader (that can read back those bitstream records).</div><div><br></div><div>I’ve attached an implementation of module file extensions. It sketches out the interface to a module file extension (see below, or check out ModuleFileExtension.h for the full interface) and implements a module file extension for testing purposes so we can illustrate the round-tripping of data through the module file format, matching of extension blocks written to extension blocks read, and so on. 
</div><div><br></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div><div><font face="Courier New">/// Metadata for a module file extension.</font></div><div><font face="Courier New">struct ModuleFileExtensionMetadata {</font></div><div><font face="Courier New"> /// The name used to identify this particular extension block within</font></div><div><font face="Courier New"> /// the resulting module file. It should be unique to the particular</font></div><div><font face="Courier New"> /// extension, because this name will be used to match the name of</font></div><div><font face="Courier New"> /// an extension block to the appropriate reader.</font></div><div><font face="Courier New"> std::string BlockName;</font></div><div><font face="Courier New"><br></font></div><div><font face="Courier New"> /// The major version of the extension data.</font></div><div><font face="Courier New"> unsigned MajorVersion;</font></div><div><font face="Courier New"><br></font></div><div><font face="Courier New"> /// The minor version of the extension data.</font></div><div><font face="Courier New"> unsigned MinorVersion;</font></div><div><font face="Courier New"><br></font></div><div><font face="Courier New"> /// A string containing additional user information that will be</font></div><div><font face="Courier New"> /// stored with the metadata.</font></div><div><font face="Courier New"> std::string UserInfo;</font></div><div><font face="Courier New">};</font></div><div><font face="Courier New"><br></font></div><div><font face="Courier New">/// An abstract superclass that describes a custom extension to the</font></div></div><div><div><font face="Courier New">/// module/precompiled header file format.</font></div></div><div><div><font face="Courier New">///</font></div></div><div><div><font face="Courier New">/// A module file extension can introduce additional information into</font></div></div><div><div><font face="Courier New">/// compiled module files (.pcm) 
and precompiled headers (.pch) via a</font></div></div><div><div><font face="Courier New">/// custom writer that can then be accessed via a custom reader when</font></div></div><div><div><font face="Courier New">/// the module file or precompiled header is loaded.</font></div></div><div><div><font face="Courier New">class ModuleFileExtension : public llvm::RefCountedBase<ModuleFileExtension> {</font></div></div><div><div><font face="Courier New">public:</font></div></div><div><div><font face="Courier New"> virtual ~ModuleFileExtension();</font></div></div><div><div><font face="Courier New"><br></font></div></div><div><div><font face="Courier New"> /// Retrieves the metadata for this module file extension.</font></div></div><div><div><font face="Courier New"> virtual ModuleFileExtensionMetadata getExtensionMetadata() const = 0;</font></div></div><div><div><font face="Courier New"><br></font></div></div><div><div><font face="Courier New"> /// Hash information about the presence of this extension into the</font></div></div><div><div><font face="Courier New"> /// module hash code.</font></div></div><div><div><font face="Courier New"> ///</font></div></div><div><div><font face="Courier New"> /// The module hash code is used to distinguish different variants</font></div></div><div><div><font face="Courier New"> /// of a module that are incompatible. 
If the presence, absence, or</font></div></div><div><div><font face="Courier New"> /// version of the module file extension should force the creation</font></div></div><div><div><font face="Courier New"> /// of a separate set of module files, override this method to</font></div></div><div><div><font face="Courier New"> /// combine that distinguishing information into the module hash</font></div></div><div><div><font face="Courier New"> /// code.</font></div></div><div><div><font face="Courier New"> ///</font></div></div><div><div><font face="Courier New"> /// The default implementation of this function simply returns the</font></div></div><div><div><font face="Courier New"> /// hash code as given, so the presence/absence of this extension</font></div></div><div><div><font face="Courier New"> /// does not distinguish module files.</font></div></div><div><div><font face="Courier New"> virtual llvm::hash_code hashExtension(llvm::hash_code Code) const;</font></div></div><div><div><font face="Courier New"><br></font></div></div><div><div><font face="Courier New"> /// Create a new module file extension writer, which will be</font></div></div><div><div><font face="Courier New"> /// responsible for writing the extension contents into a particular</font></div></div><div><div><font face="Courier New"> /// module file.</font></div></div><div><div><font face="Courier New"> virtual std::unique_ptr<ModuleFileExtensionWriter></font></div></div><div><div><font face="Courier New"> createExtensionWriter() = 0;</font></div></div><div><div><font face="Courier New"><br></font></div></div><div><div><font face="Courier New"> /// Create a new module file extension reader, given the</font></div></div><div><div><font face="Courier New"> /// metadata read from the block and the cursor into the extension</font></div></div><div><div><font face="Courier New"> /// block.</font></div></div><div><div><font face="Courier New"> ///</font></div></div><div><div><font face="Courier New"> /// May return 
null to indicate that an extension block with the</font></div></div><div><div><font face="Courier New"> /// given metadata cannot be read.</font></div></div><div><div><font face="Courier New"> virtual std::unique_ptr<ModuleFileExtensionReader></font></div></div><div><div><font face="Courier New"> createExtensionReader(const ModuleFileExtensionMetadata &Metadata,</font></div></div><div><div><font face="Courier New"> ASTReader &Reader, serialization::ModuleFile &Mod,</font></div></div><div><div><font face="Courier New"> const llvm::BitstreamCursor &Stream) = 0;</font></div></div><div><div><font face="Courier New">};</font></div></div><div><font face="Courier New"><br></font></div><div><font face="Courier New"><div>/// Abstract base class that writes a module file extension block into</div><div>/// a module file.</div><div>class ModuleFileExtensionWriter {</div><div> ModuleFileExtension *Extension;</div><div><br></div><div>protected:</div><div> ModuleFileExtensionWriter(ModuleFileExtension *Extension)</div><div> : Extension(Extension) { }</div><div><br></div><div>public:</div><div> virtual ~ModuleFileExtensionWriter();</div><div><br></div><div> /// Retrieve the module file extension with which this writer is</div><div> /// associated.</div><div> ModuleFileExtension *getExtension() const { return Extension; }</div><div><br></div><div> /// Write the contents of the extension block into the given bitstream.</div><div> ///</div><div> /// Responsible for writing the contents of the extension into the</div><div> /// given stream. 
All of the contents should be written into custom</div><div> /// records with IDs >= FIRST_EXTENSION_RECORD_ID.</div><div> virtual void writeExtensionContents(llvm::BitstreamWriter &Stream) = 0;</div><div>};</div><div><br></div><div>/// Abstract base class that reads a module file extension block from</div><div>/// a module file.</div><div>///</div><div>/// Subclasses </div><div>class ModuleFileExtensionReader {</div><div> ModuleFileExtension *Extension;</div><div><br></div><div>protected:</div><div> ModuleFileExtensionReader(ModuleFileExtension *Extension)</div><div> : Extension(Extension) { }</div><div><br></div><div>public:</div><div> /// Retrieve the module file extension with which this reader is</div><div> /// associated.</div><div> ModuleFileExtension *getExtension() const { return Extension; }</div><div><br></div><div> virtual ~ModuleFileExtensionReader();</div><div>};</div></font></div></blockquote><div><br></div><div>I suspect that the Reader and Writer interfaces will grow somewhat as we get more clients, but this is a start.</div><div><br></div><div>Thoughts?</div><div><br></div><div><span style="white-space:pre-wrap"> </span>- Doug</div><div><br></div><div></div></div><div style="word-wrap:break-word"></div>_______________________________________________<br>
</blockquote></div>
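<div><br></div><div>To make the block-name and version matching described in the quoted proposal a bit more concrete, here is a minimal, self-contained sketch. It does not use Clang's headers or the attached patch: <code>ExtensionMetadata</code> mirrors the fields of the quoted <code>ModuleFileExtensionMetadata</code>, while <code>ExtensionBlock</code>, <code>ToyReader</code> and <code>ToyExtension</code> are hypothetical stand-ins invented for illustration. The policy of refusing a block whose major version differs is also an assumption; the proposal only says a reader may be null when the block cannot be read.</div>

```cpp
#include <cassert>  // only needed by the usage checks, not by the types below
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Mirrors the fields of the quoted ModuleFileExtensionMetadata struct.
struct ExtensionMetadata {
  std::string BlockName;
  unsigned MajorVersion;
  unsigned MinorVersion;
  std::string UserInfo;
};

// Hypothetical stand-in for one extension block inside a module file:
// the metadata that was written plus the raw record values.
struct ExtensionBlock {
  ExtensionMetadata Metadata;
  std::vector<unsigned> Records;
};

// Hypothetical stand-in for a reader; it just holds the records it was given.
struct ToyReader {
  std::vector<unsigned> Records;
};

// Hypothetical stand-in for an extension; not the ModuleFileExtension class.
class ToyExtension {
  ExtensionMetadata Metadata;

public:
  explicit ToyExtension(ExtensionMetadata M) : Metadata(std::move(M)) {}

  const ExtensionMetadata &getExtensionMetadata() const { return Metadata; }

  // "Writer" side: tag the records with this extension's metadata.
  ExtensionBlock writeBlock(std::vector<unsigned> Records) const {
    return ExtensionBlock{Metadata, std::move(Records)};
  }

  // "Reader" side: return null when the block cannot be read. The rule used
  // here (same block name, same major version) is an assumed policy.
  std::unique_ptr<ToyReader> createReader(const ExtensionBlock &Block) const {
    if (Block.Metadata.BlockName != Metadata.BlockName)
      return nullptr;
    if (Block.Metadata.MajorVersion != Metadata.MajorVersion)
      return nullptr;
    return std::unique_ptr<ToyReader>(new ToyReader{Block.Records});
  }
};
```

<div>In this sketch, a block written by an extension named, say, <code>example.summaries</code> at major version 1 is handed back to a reader created by the same extension, while an extension with a different block name or a different major version gets a null reader and simply skips the block.</div>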