[cfe-dev] C++20 Modules, PCM files encoding implementation

Tue Nov 17 04:50:32 PST 2020

Dear LLVM team,

I am new here, so please forgive me if I breach any established protocol.

I was referred to this mailing list to inquire about the direction of
C++20 Modules, as we are considering moving our codebase to the feature.

First let me reiterate some appreciation for what your team is doing for
the world of computer science with the outstanding and pioneering work on
clang.  We have successfully switched our codebase from MSVC to clang for
the superior code generation and the most modern features of C++.  We have,
on average, doubled our compilation speed, tripled our execution speed, and
halved the binary size. We have converted to C++20 Concepts from old SFINAE
hacks, use direct builtins for modern instructions, and continue to marvel
at the excellence of SIMD vectorization.

Our work is focused on exploring algorithmic frontiers in film and pro
audio production.  Artistic quality, precision and execution speed are
paramount.  This type of algorithm research and development typically
demands fast iteration on the implementation code, often a few formulaic
pages of DSP, rapidly changing to meet the speed and quality needs of the
production team.  The interface through which that code is consumed changes
much less often.

Our question is about the current encoding of the pcm files generated from
the module cppm files.  We envision accelerating and simplifying
development by converging most h files and cpp files into single module
files.  Currently, clang can compile a cppm file to a pcm file, to be
consumed by the module importing code and the code editor enhancement
clangd.

The implementation code is inside the module, in place (not to be confused
with the inline keyword for function inlining).  The cppm file can be
independently compiled to an object file and linked normally, so that
consumers reach the implementation through ordinary function calls.

Current, naive build systems can treat the pcm file as a dependency so
that, when the interface or layout of the classes inside the module
changes, the consuming code is recompiled efficiently.

However, we have observed that the pcm file grows in size as the in-place
implementation code grows.  We had expected the pcm to capture only the
class interface and the memory layout, but that does not seem to be the
case.  Perhaps it would take more effort on the LLVM side to extract and
isolate that information from the AST.

This has the unfortunate side effect of triggering redundant rebuilds of
large portions of the codebase, making iteration times unacceptable
compared with the older conventions.  Those conventions of splitting .h and
.cpp files involve repeating yourself, wasting developer focus on editing
and managing two files at once, and often resorting to messy pimpl
techniques that have to heap-allocate a backend and manage two references
throughout the formulas, etc.  Given these overheads, there are hundreds,
and an ever-growing number, of small and large plugins (modular effects)
that could benefit from converging to modules as single, succinct files
focused on clean formulation.
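
The pimpl overhead described above can be sketched roughly as follows.
This is only an illustration: the names GainStage, Impl and process are
hypothetical and do not come from our codebase, and the two halves that
would normally live in a .h and a .cpp file are shown together in one
listing.

```cpp
// Illustrative sketch only; GainStage, Impl and process are hypothetical names.
#include <cassert>
#include <memory>

// --- the part that would normally live in gain_stage.h ---
class GainStage {
public:
    explicit GainStage(float gain);
    ~GainStage();                       // defined below, where Impl is complete
    float process(float sample) const;  // forwards to the hidden backend
private:
    struct Impl;                        // layout is hidden from consumers
    std::unique_ptr<Impl> impl_;        // heap-allocated backend
};

// --- the part that would normally live in gain_stage.cpp ---
struct GainStage::Impl {
    float gain;
    float process(float sample) const { return sample * gain; }
};

GainStage::GainStage(float gain)
    : impl_(std::make_unique<Impl>(Impl{gain})) {}

GainStage::~GainStage() = default;

float GainStage::process(float sample) const {
    assert(impl_ != nullptr);           // every call goes through the pointer
    return impl_->process(sample);
}
```

Every public function is declared once, forwarded once, and implemented
once, and each instance costs a heap allocation plus an indirection.  That
is the repetition we would like a single module file to remove.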

Are there any plans at LLVM for the pcm files to encode the interface
only?  Or are there any tools or functions you can recommend for extracting
the module interface, so that the build system can be signaled more
efficiently?
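
To make the request concrete, the kind of signal we have in mind could be
sketched as below.  It assumes some tool (hypothetical, not an existing
clang feature that we know of) has already produced an interface-only text
for the module.  The sketch only shows the comparison a build system could
then make; fingerprint and consumers_need_rebuild are illustrative names.

```cpp
// Illustrative sketch only. The interface text is assumed to come from a
// hypothetical extraction step; this code does not extract anything itself.
#include <cassert>
#include <cstdint>
#include <string_view>

// 64-bit FNV-1a hash of the given text.
constexpr std::uint64_t fingerprint(std::string_view text) {
    std::uint64_t hash = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char c : text) {
        hash ^= c;
        hash *= 1099511628211ull;                  // FNV prime
    }
    return hash;
}

// Consumers are rebuilt only when the interface fingerprint changes,
// regardless of how much the implementation inside the module changed.
constexpr bool consumers_need_rebuild(std::string_view old_interface,
                                      std::string_view new_interface) {
    return fingerprint(old_interface) != fingerprint(new_interface);
}
```

With such a fingerprint, editing a function body inside the module would
leave the interface text, and therefore the decision for consumers,
unchanged.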

Thank you for your time.

Sincerely,
Büke Beyond