[cfe-dev] C++20 module protocol

Nathan Sidwell via cfe-dev cfe-dev at lists.llvm.org
Mon May 18 08:36:42 PDT 2020

These files are the GCC implementation of the p1184 (wg21.link/p1184) 
protocol.  Although part of GCC, they are entirely authored by me, so I 
hereby relicense[*] them under the Apache-2.0 with LLVM exception 
license, in the hope they may be useful in Clang's implementation.  I 
also append the current documentation.

Iain and I are discussing whether a separate upstream project, from 
which both GCC and Clang can sync, may be the best approach.


[*] Contributions to the FSF give back to the contributor a license to 
that code, allowing them to relicense as desired.

Nathan Sidwell

@node C++ Module Mapper
@subsection Module Mapper
@cindex C++ Module Mapper

A module mapper provides a line-based server or file that the
compiler queries to determine the mapping between module names and CMI
files.  It is also used to build CMIs on demand.  A mapper may be
specified with the @option{-fmodule-mapper=@var{val}} option or
@env{CXX_MODULE_MAPPER} environment variable.  The value may have
one of the following forms:

@table @gcctabopt

@item @r{[}@var{hostname}@r{]}:@var{port}@r{[}?@var{ident}@r{]}
An optional hostname and a numeric port number to connect to.  If the
hostname is omitted, the loopback address is used.  If the hostname
corresponds to multiple IPv6 addresses, these are tried in turn, until
one is successful.  If your host lacks IPv6, this form is
non-functional.  If you must use IPv4 @emph{get with the 21st century},
or failing that use @option{-fmodule-mapper='|ncat @var{ipv4host}

@item =@var{socket}@r{[}?@var{ident}@r{]}
A local domain socket.  If your host lacks local domain sockets, this
form is non-functional.

@item |@var{program}@r{[}?@var{ident}@r{]} @r{[}@var{args...}@r{]}
A program to spawn and communicate with on its stdin/stdout streams.
Your @env{PATH} environment variable is searched for the program.
Arguments are separated by space characters (it is not possible for
one of the arguments delivered to the program to contain a space).

@item <>@r{[}?@var{ident}@r{]}
@item <>@var{fdinout}@r{[}?@var{ident}@r{]}
@item <@var{fdin}>@var{fdout}@r{[}?@var{ident}@r{]}
File descriptors to communicate over.  The first form, @option{<>},
communicates over stdin and stdout.  The second form specifies a
bidirectional file descriptor and the last form allows specifying
two independent descriptors.  Note that other compiler options might
cause the compiler to read stdin or write stdout.

@item @var{file}@r{[}?@var{ident}@r{]}
A mapping file consisting of space-separated module-name, filename
pairs, one per line.  Only the mappings for the direct imports and any
module export name need be provided.  If other mappings are provided,
they override those stored in any imported CMI files.  A repository
root may be specified in the mapping file by using @samp{$root} as the
module name in the first active line.

@end table
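
The forms above can be told apart by their leading punctuation.  The
following is a minimal sketch of such a classifier, not GCC's own code:
@code{MapperForm} and @code{classify_mapper} are hypothetical names, and
the rule used to separate the @var{hostname}:@var{port} form from a
mapping file (a run of digits after the last @samp{:}) is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical names: MapperForm and classify_mapper are not part of GCC.
enum class MapperForm { Network, LocalSocket, Program, FileDescriptors, MappingFile };

// Classify a -fmodule-mapper value by its leading punctuation.
// Assumption: a value is treated as [hostname]:port only when the text after
// the last ':' of its first word (ignoring any ?ident) is all digits.
inline MapperForm classify_mapper(const std::string &value) {
  if (!value.empty()) {
    switch (value[0]) {
    case '=': return MapperForm::LocalSocket;
    case '|': return MapperForm::Program;
    case '<': return MapperForm::FileDescriptors;
    default: break;
    }
  }
  // Only the first word can carry the host, port and ident.
  std::string word = value.substr(0, value.find(' '));
  word = word.substr(0, word.find('?'));  // drop any ?ident suffix
  std::string::size_type colon = word.rfind(':');
  if (colon != std::string::npos && colon + 1 < word.size()) {
    bool digits = true;
    for (std::string::size_type i = colon + 1; i < word.size(); ++i)
      if (!std::isdigit(static_cast<unsigned char>(word[i])))
        digits = false;
    if (digits)
      return MapperForm::Network;
  }
  return MapperForm::MappingFile;
}
```

Anything that is not recognized as one of the punctuated forms falls
through to the mapping-file case, which matches the order of the table.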

As shown, an optional @var{ident} may suffix the first word of the
option, indicated by a @samp{?} prefix.  The value is used in the
initial handshake with the module server, or to specify a prefix on
mapping file lines.  In the server case, the main source file name is
used if no @var{ident} is specified.  In the file case, all non-blank
lines are significant, unless a value is specified, in which case only
lines beginning with @var{ident} are significant.  The @var{ident}
must be separated by whitespace from the module name.  Be aware that
@samp{<}, @samp{>}, @samp{?}  and @samp{|} characters are often
significant to the shell, and therefore may need quoting.
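
The mapping-file rules can be sketched as a small parser.  This is an
illustration only: @code{MappingFile} and @code{parse_mapping} are
hypothetical names, and two details are assumptions rather than
documented behavior: a line is taken to ``begin with'' @var{ident} when
its first word equals it, and everything after the module name is taken
as the filename.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical result type; the names here are illustrative only.
struct MappingFile {
  std::string root;                         // from a leading $root line, if any
  std::map<std::string, std::string> cmis;  // module name -> CMI filename
};

// Parse mapping-file text. When ident is non-empty, only lines whose first
// word equals ident are considered, and that word is skipped.
inline MappingFile parse_mapping(const std::string &text,
                                 const std::string &ident) {
  MappingFile result;
  std::istringstream lines(text);
  std::string line;
  bool first_active = true;
  while (std::getline(lines, line)) {
    std::istringstream words(line);
    std::string module;
    if (!(words >> module))
      continue;  // blank line
    if (!ident.empty()) {
      if (module != ident)
        continue;  // line belongs to another ident
      if (!(words >> module))
        continue;  // ident with nothing after it
    }
    std::string file;
    std::getline(words >> std::ws, file);  // rest of the line is the filename
    if (first_active && module == "$root")
      result.root = file;
    else
      result.cmis[module] = file;
    first_active = false;
  }
  return result;
}
```

With an @var{ident}, several compilations can share one file: each only
sees the lines tagged with its own identity.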

The mapper is connected to or loaded lazily, when the first module
mapping is required.  The networking protocols are only supported on
hosts that provide networking.  If no mapper is specified a default is

Messages consist of whitespace-separated tokens and a possible final
filename.  As filenames are the last item on a line, they may contain
embedded or trailing spaces without difficulty (they cannot begin with
a space).  All non-ASCII characters are expected to be UTF-8 encoded.  Each
line is terminated by a line-feed (@code{0xa}) character.  The server
should accept and respond to the following commands:

@table @gcctabopt

@item DONE @var{module}
The compilation has completed the interface of @var{module}.  There is
no response.  It is now safe to read the generated CMI.  Note that the
compilation may not have completed the object-file generation of the
interface unit.

@item EXPORT @var{module}
The compilation is of a module interface unit, and will generate a CMI
for @var{module}.  The response should be @samp{OK @var{cmipath}}.

@item HELLO @var{ver} @var{kind} @var{ident}
This is the first command.  It informs the server of the name of the
source being compiled.  Response is either @samp{HELLO
@var{ver} @var{agent} @var{repopath}}, or @samp{ERROR @var{msg}}.

@item IMPORT @var{module}
A query for an import (including for a module implementation unit).
The response is @samp{OK @var{cmipath}} to indicate a CMI file.  If
the request is not fulfillable, the response is @samp{ERROR
[@var{msg}]}.  Usually an error response will cause compilation to

@item INCLUDE @var{header}
A @code{#include} directive for @var{header} is about to be processed.
The response informs the compiler how to treat the inclusion.  A
response of @samp{TEXT} causes textual inclusion.  A response of
@samp{IMPORT} causes importation as a header unit and a subsequent
@samp{IMPORT} query will then be forthcoming.

@end table

It is recommended that any unrecognized command causes an @samp{ERROR}
response with a suitable message.
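
As an illustration of the command set, here is a minimal sketch of a
server-side dispatcher backed by a fixed table.  It is not the attached
@file{mapper-server.cc}: @code{ToyMapper}, @code{example_mapper} and the
agent string @samp{toy-mapper} are made up, @var{kind} and @var{ident}
are ignored, and every @code{INCLUDE} is answered with @samp{TEXT}.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical server state: a fixed table from module name to CMI path.
struct ToyMapper {
  std::string repo;                         // repository path sent in HELLO
  std::map<std::string, std::string> cmis;  // module name -> cmipath

  // Handle one request line (without its line feed). An empty return value
  // means the command has no response (DONE).
  std::string handle(const std::string &line) const {
    std::istringstream in(line);
    std::string cmd, arg;
    in >> cmd;
    if (cmd == "HELLO") {
      int ver = -1;
      // The remaining tokens (kind, ident) are ignored in this sketch.
      if (!(in >> ver) || ver != 0)
        return "ERROR unsupported version";
      return "HELLO 0 toy-mapper " + repo;  // made-up agent name
    }
    in >> arg;
    if (cmd == "IMPORT" || cmd == "EXPORT") {
      std::map<std::string, std::string>::const_iterator it = cmis.find(arg);
      if (it == cmis.end())
        return "ERROR no mapping for " + arg;
      return "OK " + it->second;
    }
    if (cmd == "INCLUDE")
      return "TEXT";  // always include textually
    if (cmd == "DONE")
      return "";
    return "ERROR unrecognized command '" + cmd + "'";
  }
};

// Hypothetical fixture used to exercise the dispatcher.
inline ToyMapper example_mapper() {
  ToyMapper m;
  m.repo = "cmi-root";
  m.cmis["foo"] = "foo.gcm";
  return m;
}
```

A real build-system mapper would typically block in @code{IMPORT} until
the requested CMI has been built, rather than consult a static table.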

Requests and responses may be batched.  If a request line begins with
a @samp{+} character, another request should be made before waiting
for a response.  That request too may begin with @samp{+}.  The final
request of the batch should begin with @samp{-}, and may be empty.
Similarly, responses may be batched in response to a set of batched
requests.  Each non-final line of a batched response begins with a
@samp{+}.  The final line should begin with @samp{-}, and may
otherwise be empty.  Responses to a batched request are in request
order.  Servers should not commence responses until all requests of a
batch have been received.  There may be a fixed-capacity pipe between
client and server, and sending responses before the client has started
reading could result in deadlock.
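
The batching rules might be handled on the server side roughly as
follows.  This is a sketch under assumptions: @code{respond_batch} and
@code{echo_ok} are hypothetical names, every non-empty request is
assumed to yield exactly one response line, and the sketch always closes
with a bare @samp{-} line, which the text above permits; a real server
might instead put the last response on the @samp{-} line.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: answer one complete batch of request lines.
// Each input line still carries its '+' or '-' prefix. The handler maps one
// request to one response line; commands with no response are out of scope.
inline std::vector<std::string>
respond_batch(const std::vector<std::string> &batch,
              const std::function<std::string(const std::string &)> &handler) {
  std::vector<std::string> replies;
  // Collect every reply first: nothing is emitted until the whole batch
  // has been consumed.
  for (const std::string &line : batch) {
    std::string request = line.empty() ? line : line.substr(1);  // drop prefix
    if (!request.empty())
      replies.push_back(handler(request));
  }
  std::vector<std::string> out;
  for (std::size_t i = 0; i < replies.size(); ++i)
    out.push_back("+" + replies[i]);  // replies stay in request order
  out.push_back("-");                 // final line: '-' and otherwise empty
  return out;
}

// Hypothetical stand-in for a real command handler.
inline std::string echo_ok(const std::string &request) {
  return "OK " + request;
}
```

Buffering the replies until the batch is complete follows the deadlock
advice above: the server does not write while the client may still be
writing.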

The following metavariables are used:

@table @gcctabopt

@item @var{cmipath}
Pathname of a CMI file.

@item @var{from}
The source path of the file containing the import or include.

@item @var{module}
A module name.  Header unit names are absolute pathnames, or
pathnames prefixed with @samp{./}.  Header units are resolved using
the include path.

@item @var{msg}
A human readable message.  This may contain whitespace.

@item @var{ident}
An identity provided when invoking the compiler.  This may be helpful
to distinguish different connections to a common server.

@item @var{ver}
A numeric version number, currently 0.

@end table

A project-specific mapper is expected to be provided by the build
system that invokes the compiler.  It is not expected that a
general-purpose server is provided for all compilations.  As such, the
server will know the build configuration, the compiler it invoked, and
the environment (such as working directory) in which that is
operating.  As it may parallelize builds, several compilations may
connect to the same socket.

When delivering paths to the compiler, paths relative to a
repository-root directory should be used.  The server informs the
compiler of this root in the initial handshake, using a path relative
to the compiler's working directory, or an absolute one.  Compilers
may embed the path of a direct import CMI file into an output CMI.
This path will be relative to the repository.  Such a path reduces the
server traffic, but requires the build system to recreate the same
directory structure within the repository across a parallelized build

The default mapper generates CMI files in a @samp{gcm.cache}
directory.  CMI files have a @samp{.gcm} suffix.  The module unit name
is used directly to provide the basename.  Header units construct a
relative path using the underlying header file name.  If the path is
already relative, a @samp{!} directory is prepended.  Internal
@samp{..} components are translated to @samp{!!}.  No attempt is made
to canonicalize these filenames beyond that done by the preprocessor's
include search algorithm, as in general it is ambiguous when symbolic
links are present.
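
The naming scheme can be approximated as below.  This is a guess at the
behavior described, not GCC's implementation: @code{default_cmi_path} is
a hypothetical name, and it assumes that a name starting with @samp{/}
or @samp{./} is a header unit, that the leading @samp{./} is replaced by
the @samp{!} directory, and that the leading @samp{/} of an absolute
header path is dropped so the result stays inside @samp{gcm.cache}.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not GCC's code: guess the default CMI path for a name.
// Assumptions are listed in the accompanying text and repeated here: '/' or
// "./" marks a header unit, "./" becomes the "!" directory, and the leading
// '/' of an absolute path is dropped.
inline std::string default_cmi_path(const std::string &name) {
  std::string path;
  if (name.compare(0, 1, "/") == 0)
    path = name.substr(1);          // absolute header unit
  else if (name.compare(0, 2, "./") == 0)
    path = "!/" + name.substr(2);   // relative header unit
  else
    path = name;                    // named module: the name is the basename

  // Translate ".." components to "!!", one path component at a time.
  std::string out;
  std::string::size_type start = 0;
  while (start <= path.size()) {
    std::string::size_type slash = path.find('/', start);
    if (slash == std::string::npos)
      slash = path.size();
    std::string part = path.substr(start, slash - start);
    if (start != 0)
      out += '/';
    out += (part == ".." ? "!!" : part);
    start = slash + 1;
  }
  return "gcm.cache/" + out + ".gcm";
}
```

Under these assumptions a module named @samp{foo} maps to
@file{gcm.cache/foo.gcm}.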

The mapper protocol was published as ``A Module Mapper''
@uref{http://wg21.link/p1184}.  It is intended that build systems will
provide their own mappers.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mapper-client.cc
Type: text/x-c++src
Size: 16928 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200518/db4ab82f/attachment-0002.cc>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mapper-client.h
Type: text/x-chdr
Size: 3834 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200518/db4ab82f/attachment-0001.h>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mapper-server.cc
Type: text/x-c++src
Size: 41115 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200518/db4ab82f/attachment-0003.cc>
