[cfe-dev] C++20 Modules, PCM files encoding implementation

Richard Smith via cfe-dev cfe-dev at lists.llvm.org
Tue Nov 17 14:50:12 PST 2020


+Iain Sandoe, who has been looking at C++ Modules implementation issues in
Clang.

I think it would make sense to be able to emit a "cut-down" pcm file that
omits information that an importer of the module never needs (such as
definitions of non-inline functions), in order to keep the file sizes
smaller. However, that alone won't be enough to avoid rebuilds when the
.cppm files change -- we also encode source location information in the pcm
file that would be invalidated whenever implementation details change -- or
at least whenever they change size. In principle, there are techniques we
could use here to avoid rebuilds when only those locations change, such as
splitting the location information out into a separate file that is not
listed as a dependency of downstream compilations (e.g., according to -M),
but that would need investigation.

Another promising idea that has not been investigated is the possibility of
generating two different pcm files for each module: one containing only
cut-down 'forward declaration'-level information (no class definitions, no
inline function bodies, and so on), and one containing full information.
The idea would be that we initially load only the cut-down version, and
pull in the full information (and include the additional file as a
dependency according to -M) only if the dependent compilation needs that
information. Then we can avoid rebuilds if (for example) a class definition
in a module interface changes but the consumers of that module interface
didn't actually use the class definition.

But I don't think anyone has done any work to implement these approaches.

On Tue, 17 Nov 2020 at 13:46, David Blaikie <dblaikie at gmail.com> wrote:

> I don't think anyone's actively looking at this right now - perhaps
> partly because there's still significant benefit to separating the
> interface and implementation, even when using modules (no extraneous
> rebuilds when you change the implementation - even if that rebuild
> only rebuilds the interface and then you have a hash (rather than
> timestamp) based build system that finds the interface to be identical
> and so nothing else downstream is touched). Also at least with Clang's
> model, I think the idea is to build the object file from the pcm
> rather than from the cppm file. Though the possibility of having two
> output files has some potential benefits, to be sure - I /think/ maybe
> MSVC is doing something more like that two file model, but I don't
> know for sure.
>
> On Tue, Nov 17, 2020 at 4:51 AM Büke Beyond via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
> >
> > Dear LLVM team,
> >
> > I am new here, so please forgive me if I violate any established
> > protocol.
> >
> > I was referred to this mailing list to inquire about the direction of
> > C++20 Modules, as we are considering evolving our codebase to use the
> > feature.
> >
> > First, let me reiterate some appreciation for what your team is doing for
> > the world of computer science with the outstanding and pioneering work on
> > clang.  We have successfully switched our codebase from MSVC to clang for
> > the superior code generation and the most modern features of C++.  We
> > have, on average, doubled our compilation speed, tripled our execution
> > speed, and halved the binary size.  We have converted from old SFINAE
> > hacks to C++20 Concepts, use direct builtins for modern instructions, and
> > continue to marvel at the excellence of SIMD vectorization.
> >
> > Our work is focused on exploring algorithmic frontiers in film and pro
> > audio production.  Artistic quality, precision and execution speed are
> > paramount.  This type of algorithm research and development typically
> > demands fast iteration on the implementation code, often a few formulaic
> > pages of DSP, which changes rapidly to meet the speed and quality needs
> > of the production team.  The interface for consuming that code changes
> > much less often.
> >
> > Our question is about the current encoding of the pcm files generated
> > from the module cppm files.  We envision accelerating and simplifying
> > development by converging most .h files and .cpp files into single module
> > files.  Currently, clang can compile a cppm file to a pcm file, to be
> > consumed by the code that imports the module and by the code editor
> > enhancement clangd.
> >
> > The implementation code is inside the module, in place (not to be
> > confused with the inline keyword for function inlining).  The cppm file
> > can be independently compiled to an object file and linked normally, so
> > that consumers reach the implementation through function calls.
> >
> > Current naive build systems can use the pcm file as a dependency, so
> > that a change to the interface or layout of the classes inside the module
> > triggers an efficient recompilation of the consuming code.
> >
> > However, we have observed that the pcm file grows in size as the
> > in-place implementation code grows in size.  We expected that the pcm
> > would capture only the class interface and the memory layout, but that
> > does not seem to be the case.  Perhaps it would take more LLVM effort to
> > extract and isolate that from the AST.
> >
> > This of course has the unfortunate side effect of triggering redundant
> > rebuilds of large portions of the codebase, making the iteration times
> > unacceptable compared with older conventions.  The older convention of
> > splitting .h and .cpp files involves repeating yourself, wasting
> > developer focus on editing and managing two files at once, and often
> > resorting to messy pimpl techniques that have to heap-allocate a backend
> > and manage two references throughout the formulas, etc.  Considering
> > these overheads, there are hundreds of small and large plugins (modular
> > effects), and their number keeps growing, that could benefit from
> > converging to modules as single, succinct files focused on clean
> > formulation.
> >
> > Are there any plans at LLVM for the pcm files to encode only the
> > interface?  Or are there any tools or functions you can recommend for
> > extracting the module interface, so that the build system can be signaled
> > more efficiently?
> >
> > Thank you for your time.
> >
> > Sincerely,
> > Büke Beyond