[cfe-dev] [Modules TS] Have the file formats been decided?

Richard Smith via cfe-dev cfe-dev at lists.llvm.org
Tue Jan 17 11:30:14 PST 2017


On 16 January 2017 at 13:20, Hamza Sood via cfe-dev <cfe-dev at lists.llvm.org>
wrote:

> I’ve been looking into Clang’s implementation of the C++ Modules TS to see
> if there’s anything I can do to help.
>
> From what I understand, a cppm file is essentially treated as one big
> header file (with handling for a few extra keywords) which is preprocessed
> and dumped to disk as a pcm file containing a binary representation of the
> AST. Consumers of the module will end up importing this pcm file as a
> precompiled header. (If that summary is incorrect then you can stop reading
> here...)
>
> There are a few problems I ran into with this, which I think stem from
> this format:
>   - Consumers of the module will import the entire implementation of all
> of the functions in the cppm, which will lead to a lot of duplicated code
> between object files (and greatly increased compile times).
>

Clang's on-disk AST format is read lazily, so the amount of data in the AST
file is not as important as the amount of that data that is actually used
by a particular compilation. You'll only get duplicated code in AST files
if you have duplicated code in module interfaces. The exception is if
multiple modules instantiate the same template with the same arguments, and
they do not depend on each other.
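To illustrate why the size of the file matters less than what is actually read, here is a toy model of lazy loading. It is a minimal sketch under stated assumptions: `LazyModuleFile` and its methods are hypothetical and have nothing to do with Clang's real AST reader. Records sit in a map that stands in for the on-disk file, and each one is decoded at most once, and only when a lookup asks for it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Hypothetical toy model of a lazily read module file (not Clang's reader).
class LazyModuleFile {
public:
  // Stand-in for writing a serialized declaration into the module file.
  void addRecord(const std::string &name, const std::string &payload) {
    assert(!name.empty() && "every record needs a name");
    serialized_[name] = payload;
  }

  // Look up a declaration by name, decoding it only on first use.
  std::optional<std::string> lookup(const std::string &name) {
    if (auto it = decoded_.find(name); it != decoded_.end())
      return it->second;                 // already deserialized earlier
    auto raw = serialized_.find(name);
    if (raw == serialized_.end())
      return std::nullopt;               // not present in this module file
    ++decodeCount_;                      // pay the decoding cost once
    // "Decoding" is just a copy here; a real reader would rebuild AST nodes.
    return decoded_.emplace(name, raw->second).first->second;
  }

  std::size_t recordsOnDisk() const { return serialized_.size(); }
  std::size_t recordsDecoded() const { return decodeCount_; }

private:
  std::map<std::string, std::string> serialized_;  // "on-disk" records
  std::map<std::string, std::string> decoded_;     // "in-memory" records
  std::size_t decodeCount_ = 0;
};
```

Adding a thousand records and looking up one of them leaves `recordsDecoded()` at 1: the work done tracks what the compilation uses, not how much the file contains.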

Right now, we will get duplicated code in object files for functions
defined inside the module interface (at least, for those functions that are
used by the current translation unit). That's simply because the
implementation of the Modules TS is incomplete; we are going to add the
facility to generate code for a module interface at some point, and when we
do we will disable the emission of functions defined inside the interface
when compiling any other translation unit. (As an exception, we may still
emit those function definitions when building with optimizations enabled in
order to support inlining, at least when LTO is disabled.)
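The planned emission policy above can be summarized as a small decision function. This is a hypothetical sketch: `CompileContext` and `shouldEmitInterfaceFunction` are illustrative names, not Clang APIs. It also assumes the optimization exception applies whenever optimizations are on and LTO is off, which is only one reading of "may still emit".

```cpp
#include <cassert>

// Hypothetical description of one compilation (not real Clang types/flags).
struct CompileContext {
  bool isInterfaceUnit;  // compiling the module interface unit itself
  bool optimizing;       // optimizations enabled
  bool lto;              // link-time optimization enabled
};

// Should a non-template function defined in a module interface be emitted
// into the object file of the current translation unit?
bool shouldEmitInterfaceFunction(const CompileContext &ctx, bool usedInThisTU) {
  if (ctx.isInterfaceUnit)
    return true;   // the interface's own object file owns the definition
  if (!usedInThisTU)
    return false;  // unused here: nothing to call or inline
  // Importers normally rely on the interface's object file; an optimizing,
  // non-LTO build may keep a local copy purely so it can be inlined.
  return ctx.optimizing && !ctx.lto;
}
```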

>   - There’s no way to hide declarations that aren’t exported or that are
> declared as part of the global module.
>

Can you be more specific about what kind of hiding you want?

We provide a mechanism to prevent these declarations from being visible to
downstream code (Clang doesn't fully support the Modules TS export
semantics yet, but we have long supported a __module_private__ keyword for
this).

We could avoid emitting some such declarations to the AST file, but note
this isn't as simple as just not emitting definitions into the precompiled
form: an exported template can, for instance, depend on the definition of a
non-exported template or constexpr function, so at least some of the
non-exported definitions within the module must be available. Once we have
separate code generation for module interfaces, we can consider supporting
this as an optimization.
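The dependency problem can be modeled as a reachability question: start from the exported declarations whose definitions importers need (templates, inline and constexpr functions) and follow every reference to another definition that importers also need. The sketch is hypothetical: the `Decl` record and `definitionsToKeep` are illustrative, not part of Clang's serializer. It tracks only which definitions must survive and ignores plain declarations.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of one declaration in a module interface.
struct Decl {
  bool exported = false;
  // True for templates, inline and constexpr functions: importers need the
  // body, not just the declaration.
  bool definitionNeededByImporters = false;
  std::vector<std::string> uses;  // names referenced by the definition
};

using Module = std::map<std::string, Decl>;

// Worklist traversal: which definitions must stay in the precompiled form?
std::set<std::string> definitionsToKeep(const Module &m) {
  std::set<std::string> keep;
  std::vector<std::string> work;
  for (const auto &[name, d] : m)
    if (d.exported && d.definitionNeededByImporters)
      work.push_back(name);
  while (!work.empty()) {
    std::string name = work.back();
    work.pop_back();
    if (!keep.insert(name).second)
      continue;  // already visited
    for (const std::string &used : m.at(name).uses) {
      auto it = m.find(used);
      // Importers instantiating or inlining `name` also need this body,
      // exported or not.
      if (it != m.end() && it->second.definitionNeededByImporters)
        work.push_back(used);
    }
  }
  return keep;
}
```

For an exported template that calls a non-exported constexpr helper and a non-exported ordinary function, only the helper's definition would be retained; the ordinary function needs just a declaration.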


>   - Library developers will have to ship the entire AST for their project
> if they want users to be able to import it using Modules.
>

Clang's module files are explicitly not a distribution format. You are
expected to ship your module interface files, not a precompiled form of
them.


>   - Disk usage for large projects with lots of code will be fairly high
> (not as big of a problem as the others, but still worth a mention).
>
> Is this format decided on? Or is it just an initial test? If it's not yet
> concrete, then I'd like to propose a slightly different implementation that
> could potentially solve these problems. While parsing a cppm file, we could
> construct two ASTs. One containing the entire file as before, and the other
> consisting of just exported declarations (without their implementations if
> they aren't inline or templated). The former AST could be used to generate
> an object file as usual, while the latter could be dumped to disk as a
> separate interface file (with some kind of special extension). The
> interface file would essentially serve as a binary "header", containing
> only what's needed by consumers of the module.
>

Rather than producing two ASTs, it would be preferable to simply export
less of the AST into the pcm file. As noted above, this optimization is not
yet implemented (along with some of the semantics of the Modules TS).


> Has anyone got any thoughts on this?


Even with modules, large codebases will still want to maintain an interface
/ implementation separation discipline, in order to avoid every change to a
low-level library's implementation triggering unnecessary recompilation of
dependent code. (Keep in mind that a change that affects line numbers in a
low-level library could affect the debug information generated for any
transitive dependency, so we can't necessarily bail out of the compilation
if the abstract interface of the module is unchanged.)
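The recompilation argument can be sketched as a reverse-dependency walk over the import graph. Assumptions: `ImportGraph`, `Edit` and `unitsToRebuild` are hypothetical. An edit to a module interface (including one that only shifts line numbers, and therefore debug information) is treated conservatively as invalidating every transitive importer. An edit confined to an implementation unit rebuilds only that unit.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical build graph: importers[M] lists the units that directly
// import module M's interface.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

enum class Edit { InterfaceUnit, ImplementationUnit };

// Units to recompile after editing `module`; the result always contains
// `module` itself, standing for the edited unit.
std::set<std::string> unitsToRebuild(const ImportGraph &importers,
                                     const std::string &module, Edit edit) {
  std::set<std::string> out{module};
  if (edit == Edit::ImplementationUnit)
    return out;  // importers never see the implementation unit
  std::vector<std::string> work{module};
  while (!work.empty()) {
    std::string m = work.back();
    work.pop_back();
    auto it = importers.find(m);
    if (it == importers.end())
      continue;
    for (const std::string &importer : it->second)
      if (out.insert(importer).second)  // first time we reach this unit
        work.push_back(importer);
  }
  return out;
}
```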

This somewhat reduces the impact of your concerns above, but they are still
real and important considerations.