[cfe-dev] C++20 module protocol

Chris Lattner via cfe-dev cfe-dev at lists.llvm.org
Tue May 19 12:54:57 PDT 2020

Very nice, Nathan!  Thank you for helping to foster cross-compiler collaboration.


> On May 18, 2020, at 8:36 AM, Nathan Sidwell via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> Hi,
> these files are the GCC implementation of the p1184 (wg21.link/p1184) protocol.  Although part of GCC, they are entirely authored by me, so I hereby relicense[*] them under the Apache-2.0 with LLVM exception license, in the hope they may be useful in Clang's implementation.  I also append the current documentation.
> Iain and I are discussing whether a separate upstream project, from whence both GCC and Clang can sync, may be the best approach.
> nathan
> [*] Contributions to the FSF give back to the contributor a license to that code, allowing them to relicense as desired.
> -- 
> Nathan Sidwell
> @node C++ Module Mapper
> @subsection Module Mapper
> @cindex C++ Module Mapper
> A module mapper provides a line-based server or file that the
> compiler queries to determine the mapping between module names and CMI
> files.  It is also used to build CMIs on demand.  A mapper may be
> specified with the @option{-fmodule-mapper=@var{val}} option or
> @env{CXX_MODULE_MAPPER} environment variable.  The value may have
> one of the following forms:
> @table @gcctabopt
> @item @r{[}@var{hostname}@r{]}:@var{port}@r{[}?@var{ident}@r{]}
> An optional hostname and a numeric port number to connect to.  If the
> hostname is omitted, the loopback address is used.  If the hostname
> corresponds to multiple IPv6 addresses, these are tried in turn until
> one is successful.  If your host lacks IPv6, this form is
> non-functional.  If you must use IPv4, @emph{get with the 21st century},
> or failing that use @option{-fmodule-mapper='|ncat @var{ipv4host}
> @var{port}'}.
> @item =@var{socket}@r{[}?@var{ident}@r{]}
> A local domain socket.  If your host lacks local domain sockets, this
> form is non-functional.
> @item |@var{program}@r{[}?@var{ident}@r{]} @r{[}@var{args...}@r{]}
> A program to spawn, and communicate with on its stdin/stdout streams.
> Your @var{PATH} environment variable is searched for the program.
> Arguments are separated by space characters (it is not possible for
> one of the arguments delivered to the program to contain a space).
> @item <>@r{[}?@var{ident}@r{]}
> @item <>@var{fdinout}@r{[}?@var{ident}@r{]}
> @item <@var{fdin}>@var{fdout}@r{[}?@var{ident}@r{]}
> File descriptors to communicate over.  The first form, @option{<>},
> communicates over stdin and stdout.  The second form specifies a
> bidirectional file descriptor and the last form allows specifying
> two independent descriptors.  Note that other compiler options might
> cause the compiler to read stdin or write stdout.
> @item @var{file}@r{[}?@var{ident}@r{]}
> A mapping file consisting of space-separated module-name, filename
> pairs, one per line.  Only the mappings for the direct imports and any
> module export name need be provided.  If other mappings are provided,
> they override those stored in any imported CMI files.  A repository
> root may be specified in the mapping file by using @samp{$root} as the
> module name in the first active line.
> @end table
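The option forms in the table above can be told apart by their first character. The following is a minimal Python sketch of such a classifier, not GCC's parser: the names `MapperSpec` and `parse_mapper_option`, the order in which the forms are tested, and the choice to treat anything unmatched as a mapping file are all assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional


class MapperSpec(NamedTuple):
    kind: str             # "program", "socket", "fds", "tcp" or "file"
    target: tuple         # form-specific fields
    ident: Optional[str]  # text after '?', if any


def _split_ident(word):
    """Split 'word?ident' into (word, ident); ident is None if absent."""
    head, sep, ident = word.partition("?")
    return head, (ident if sep else None)


def parse_mapper_option(value):
    """Classify a -fmodule-mapper value (hypothetical helper)."""
    if value.startswith("|"):
        # The ident suffixes the first word only; arguments are
        # separated by spaces and cannot themselves contain one.
        first, _, rest = value[1:].partition(" ")
        program, ident = _split_ident(first)
        args = tuple(a for a in rest.split(" ") if a)
        return MapperSpec("program", (program,) + args, ident)

    body, ident = _split_ident(value)
    if body.startswith("="):
        return MapperSpec("socket", (body[1:],), ident)

    m = re.fullmatch(r"<(\d*)>(\d*)", body)
    if m:
        fdin, fdout = m.groups()
        if fdin and not fdout:
            raise ValueError("'<fdin>' without fdout is not a documented form")
        if not fdout:
            return MapperSpec("fds", (0, 1), ident)  # stdin and stdout
        if not fdin:
            return MapperSpec("fds", (int(fdout), int(fdout)), ident)
        return MapperSpec("fds", (int(fdin), int(fdout)), ident)

    m = re.fullmatch(r"(.*):(\d+)", body)
    if m:
        host = m.group(1) or None  # None stands for the loopback address
        return MapperSpec("tcp", (host, int(m.group(2))), ident)

    # Assumption: anything else names a mapping file.
    return MapperSpec("file", (body,), ident)
```

For example, `parse_mapper_option("|ncat host 99")` yields a `program` spec whose target is `("ncat", "host", "99")`.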
> As shown, an optional @var{ident} may suffix the first word of the
> option, indicated by a @samp{?} prefix.  The value is used in the
> initial handshake with the module server, or to specify a prefix on
> mapping file lines.  In the server case, the main source file name is
> used if no @var{ident} is specified.  In the file case, all non-blank
> lines are significant, unless a value is specified, in which case only
> lines beginning with @var{ident} are significant.  The @var{ident}
> must be separated by whitespace from the module name.  Be aware that
> @samp{<}, @samp{>}, @samp{?}  and @samp{|} characters are often
> significant to the shell, and therefore may need quoting.
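To make the mapping-file rules concrete, here is a small Python sketch of a reader that honours the optional ident prefix and the `$root` entry. The function name `parse_mapping_file` is hypothetical, and two behaviours are assumptions the text does not settle: lines with a module name but no filename are skipped, and a `$root` line after the first active line is ignored.

```python
def parse_mapping_file(text, ident=None):
    """Return (root, {module: filename}) from mapping-file text."""
    root = None
    mapping = {}
    first_active = True
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines are never significant
        if ident is not None:
            # Only lines beginning with the ident are significant; the
            # ident is separated from the module name by whitespace.
            parts = line.split(None, 1)
            if len(parts) < 2 or parts[0] != ident:
                continue
            line = parts[1]
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue  # assumption: skip lines lacking a filename
        module, filename = parts
        if module == "$root":
            if first_active:
                root = filename
            # assumption: a later $root line is silently ignored
        else:
            mapping[module] = filename
        first_active = False
    return root, mapping
```

Because the filename is taken as the remainder of the line, it may contain embedded spaces, as in `bar.baz dir/bar baz.gcm`.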
> The mapper is connected to or loaded lazily, when the first module
> mapping is required.  The networking protocols are only supported on
> hosts that provide networking.  If no mapper is specified a default is
> provided.
> Messages consist of whitespace-separated tokens and a possible final
> filename.  As filenames are the last item on a line, they may contain
> embedded or trailing spaces without difficulty (they cannot begin with
> a space).  All non-ASCII characters are expected to be UTF-8 encoded.  Each
> line is terminated by a line-feed (@code{0xa}) character.  The server
> should accept and respond to the following commands:
> @table @gcctabopt
> @item DONE @var{module}
> The compilation has completed the interface of @var{module}.  There is
> no response.  It is now safe to read the generated CMI.  Note that the
> compilation may not have completed the object-file generation of the
> interface unit.
> @item EXPORT @var{module}
> The compilation is of a module interface unit, and will generate a CMI
> for @var{module}.  The response should be @samp{OK @var{cmipath}}.
> @item HELLO @var{ver} @var{kind} @var{ident}
> This is the first command.  It informs the server of the name of the
> source being compiled.  Response is either @samp{HELLO
> @var{ver} @var{agent} @var{repopath}}, or @samp{ERROR @var{msg}}.
> @item IMPORT @var{module}
> A query for an import (including for a module implementation unit).
> The response is @samp{OK @var{cmipath}} to indicate a CMI file.  If
> the request is not fulfillable, the response is @samp{ERROR
> [@var{msg}]}.  Usually an error response will cause compilation to
> terminate.
> @item INCLUDE @var{header}
> A @code{#include} directive for @var{header} is about to be processed.
> The response informs the compiler how to treat the inclusion.  A
> response of @samp{TEXT} causes textual inclusion.  A response of
> @samp{IMPORT} causes importation as a header unit and a subsequent
> @samp{IMPORT} query will then be forthcoming.
> @end table
> It is recommended that any unrecognized command causes an @samp{ERROR}
> response with a suitable message.
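As an illustration of the command set above, the following Python sketch dispatches one request line to one response line. It is a toy, not the attached mapper-server.cc: the class `ToyMapper`, the agent string `toy-mapper`, the `GCC` kind used in the usage example, and the fallback of answering EXPORT with `<module>.gcm` are all assumptions.

```python
class ToyMapper:
    """Hypothetical in-memory mapper answering one request line at a time."""

    def __init__(self, cmis, header_units=(), repo="gcm.cache"):
        self.cmis = dict(cmis)                  # module name -> CMI path
        self.header_units = set(header_units)   # headers to import
        self.repo = repo
        self.done = []                          # modules reported by DONE

    def handle(self, line):
        """Return the response line, or None when no response is sent."""
        command, _, rest = line.partition(" ")
        rest = rest.lstrip(" ")  # a final filename cannot begin with a space
        if command == "HELLO":
            fields = rest.split(None, 2)  # ver, kind, ident
            if len(fields) != 3:
                return "ERROR malformed HELLO"
            if fields[0] != "0":
                return "ERROR unsupported version " + fields[0]
            return "HELLO 0 toy-mapper " + self.repo
        if command == "EXPORT":
            # Assumption: unknown modules get '<module>.gcm'.
            return "OK " + self.cmis.get(rest, rest + ".gcm")
        if command == "IMPORT":
            if rest in self.cmis:
                return "OK " + self.cmis[rest]
            return "ERROR no CMI known for '" + rest + "'"
        if command == "INCLUDE":
            return "IMPORT" if rest in self.header_units else "TEXT"
        if command == "DONE":
            self.done.append(rest)
            return None  # DONE has no response
        return "ERROR unrecognized command '" + command + "'"
```

A short session might look like this:

```python
mapper = ToyMapper({"foo": "foo.gcm"}, header_units={"/usr/include/stdio.h"})
mapper.handle("HELLO 0 GCC main.cc")   # handshake
mapper.handle("IMPORT foo")            # CMI lookup
mapper.handle("INCLUDE ./my header.h") # textual inclusion
```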
> Requests and responses may be batched.  If a request line begins with
> a @samp{+} character, before waiting for a response another request
> should be made.  That too may begin with @samp{+}.  The final request
> of the batch should begin with @samp{-}, and may otherwise be empty.  Similarly,
> responses may be batched in response to a set of batched
> requests.  Each non-ultimate line of a batched response begins with a
> @samp{+}.  The final line should begin with @samp{-}, and may
> otherwise be empty.  Responses to a batched request are in request
> order.  Servers should not commence responses until all requests of a
> batch have been received.  There may be a fixed-capacity pipe between
> client and server, and sending responses before the client has started
> reading could result in deadlock.
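The batching rules can be sketched in a few lines of Python. The helper names (`encode_batch`, `read_batch`, `serve`) are hypothetical, and the sketch assumes that an unbatched request receives an unbatched response and that a batched request receives a batched one. Note that `serve` reads the whole batch before producing any response, which is the ordering the paragraph above asks for to avoid deadlock.

```python
def encode_batch(messages):
    """Prefix all but the last message with '+', and the last with '-'."""
    if not messages:
        return ["-"]  # an otherwise empty final line
    return ["+" + m for m in messages[:-1]] + ["-" + messages[-1]]


def read_batch(lines):
    """Read one request or one batch from an iterator of lines.

    Returns (messages, batched).
    """
    messages = []
    for line in lines:
        if line.startswith("+"):
            messages.append(line[1:])
        elif line.startswith("-"):
            if line[1:]:
                messages.append(line[1:])
            return messages, True
        else:
            # A line without a prefix ends the read; on its own it is
            # an ordinary unbatched request.
            messages.append(line)
            return messages, len(messages) > 1
    if messages:
        raise ValueError("batch was not terminated by a '-' line")
    return messages, False


def serve(lines, handle):
    """Answer one request or batch, in request order."""
    requests, batched = read_batch(lines)  # consume the whole batch first
    responses = [handle(r) for r in requests]
    return encode_batch(responses) if batched else responses
```

For instance, `encode_batch(["IMPORT a", "IMPORT b"])` produces the two lines `+IMPORT a` and `-IMPORT b`.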
> The following metavariables were used:
> @table @gcctabopt
> @item @var{cmipath}
> Pathname of a CMI file.
> @item @var{from}
> The source path of the file containing the import or include.
> @item @var{module}
> A module name.  Header unit names are absolute pathnames, or
> pathnames prefixed with @samp{./}.  Header units are resolved using
> the include path.
> @item @var{msg}
> A human readable message.  This may contain whitespace.
> @item @var{ident}
> An identity provided when invoking the compiler.  This may be helpful
> to distinguish different connections to a common server.
> @item @var{ver}
> A numeric version number, currently 0.
> @end table
> A project-specific mapper is expected to be provided by the build
> system that invokes the compiler.  It is not expected that a
> general-purpose server is provided for all compilations.  As such, the
> server will know the build configuration, the compiler it invoked, and
> the environment (such as working directory) in which that is
> operating.  As it may parallelize builds, several compilations may
> connect to the same socket.
> When delivering paths to the compiler, paths relative to a
> repository-root directory should be used.  The server informs the
> compiler of this root in the initial handshake, using a path relative
> to the compiler's working directory, or an absolute one.  Compilers
> may embed the path of a direct import CMI file into an output CMI.
> This path will be relative to the repository.  Such a path reduces the
> server traffic, but requires the build system to recreate the same
> directory structure within the repository across a parallelized build
> system.
> The default mapper generates CMI files in a @samp{gcm.cache}
> directory.  CMI files have a @samp{.gcm} suffix.  The module unit name
> is used directly to provide the basename.  Header units construct a
> relative path using the underlying header file name.  If the path is
> already relative, a @samp{!} directory is prepended.  Internal
> @samp{..} components are translated to @samp{!!}.  No attempt is made
> to canonicalize these filenames beyond that done by the preprocessor's
> include search algorithm, as in general it is ambiguous when symbolic
> links are present.
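One literal reading of the naming scheme above is sketched below in Python. This is not taken from the GCC sources, and it rests on assumptions: the function name `default_cmi_path` is hypothetical, an absolute header path is made relative by dropping its leading `/`, the leading `./` of a relative header name is kept after the `!` directory is prepended, and module partitions are not handled.

```python
def default_cmi_path(name, cache_dir="gcm.cache"):
    """Map a module or header-unit name to a CMI path (illustrative only)."""
    if name.startswith("/"):
        # Assumption: an absolute header path loses its leading '/'.
        rel = name.lstrip("/")
    elif name.startswith("./"):
        # Already relative: prepend a '!' directory (leading './' kept).
        rel = "!/" + name
    else:
        # A named module: the name is used directly as the basename.
        return f"{cache_dir}/{name}.gcm"
    # '..' components are translated to '!!'; nothing else is canonicalized.
    parts = ["!!" if p == ".." else p for p in rel.split("/")]
    return f"{cache_dir}/{'/'.join(parts)}.gcm"
```

Under these assumptions a named module `foo` maps to `gcm.cache/foo.gcm`, and the header unit `/usr/include/stdio.h` maps to `gcm.cache/usr/include/stdio.h.gcm`.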
> The mapper protocol was published as ``A Module Mapper''
> @uref{http://wg21.link/p1184}.  It is intended that build systems will
> provide their own mappers.
> <mapper-client.cc> <mapper-client.h> <mapper-server.cc>