[cfe-dev] C++20 module protocol
David Blaikie via cfe-dev
cfe-dev at lists.llvm.org
Tue May 19 11:48:19 PDT 2020
Thanks for designing/working on/contributing this! (I think it's a rather
neat thing & do hope it takes off/becomes adopted as the solution for this
complicated new compiler surface area required by C++20 modules)
On Tue, May 19, 2020 at 12:48 AM Nathan Sidwell via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> Hi,
> these files are the GCC implementation of the p1184 (wg21.link/p1184)
> protocol. Although part of GCC, they are entirely authored by me, so I
> hereby relicense[*] them under the Apache-2.0 with LLVM exception
> license, in the hope they may be useful in Clang's implementation. I
> also append the current documentation.
>
> Iain and I are discussing whether a separate upstream project, from
> which both GCC and Clang can sync, may be the best approach.
>
> nathan
>
> [*] Contributions to the FSF give back to the contributor a license to
> that code, allowing them to relicense as desired.
>
> --
> Nathan Sidwell
>
> @node C++ Module Mapper
> @subsection Module Mapper
> @cindex C++ Module Mapper
>
> A module mapper provides a line-based server or file that the
> compiler queries to determine the mapping between module names and
> compiled module interface (CMI) files. It is also used to build CMIs
> on demand. A mapper may be
> specified with the @option{-fmodule-mapper=@var{val}} option or
> @env{CXX_MODULE_MAPPER} environment variable. The value may have
> one of the following forms:
>
> @table @gcctabopt
>
> @item @r{[}@var{hostname}@r{]}:@var{port}@r{[}?@var{ident}@r{]}
> An optional hostname and a numeric port number to connect to. If the
> hostname is omitted, the loopback address is used. If the hostname
> corresponds to multiple IPv6 addresses, these are tried in turn until
> one is successful. If your host lacks IPv6, this form is
> non-functional. If you must use IPv4, @emph{get with the 21st century},
> or failing that use @option{-fmodule-mapper='|ncat @var{ipv4host}
> @var{port}'}.
>
> @item =@var{socket}@r{[}?@var{ident}@r{]}
> A local domain socket. If your host lacks local domain sockets, this
> form is non-functional.
>
> @item |@var{program}@r{[}?@var{ident}@r{]} @r{[}@var{args...}@r{]}
> A program to spawn, and communicate with on its stdin/stdout streams.
> Your @env{PATH} environment variable is searched for the program.
> Arguments are separated by space characters (it is not possible for
> one of the arguments delivered to the program to contain a space).
>
> @item <>@r{[}?@var{ident}@r{]}
> @item <>@var{fdinout}@r{[}?@var{ident}@r{]}
> @item <@var{fdin}>@var{fdout}@r{[}?@var{ident}@r{]}
> File descriptors to communicate over. The first form, @option{<>},
> communicates over stdin and stdout. The second form specifies a
> bidirectional file descriptor and the last form allows specifying
> two independent descriptors. Note that other compiler options might
> cause the compiler to read stdin or write stdout.
>
> @item @var{file}@r{[}?@var{ident}@r{]}
> A mapping file consisting of space-separated module-name, filename
> pairs, one per line. Only the mappings for the direct imports and any
> module export name need be provided. If other mappings are provided,
> they override those stored in any imported CMI files. A repository
> root may be specified in the mapping file by using @samp{$root} as the
> module name in the first active line.
>
> @end table
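The forms in the table above can be told apart by the first character of the value. A minimal Python sketch of that classification follows; `parse_mapper_spec` and `MapperSpec` are hypothetical names, the order in which the forms are tested (in particular, treating anything that is not `host:port` as a mapping file) is an assumption, and the code only parses the value without connecting to or spawning anything.

```python
from typing import List, NamedTuple, Optional, Tuple


class MapperSpec(NamedTuple):
    kind: str             # "tcp", "socket", "program", "fd" or "file"
    target: tuple         # form-specific details, see parse_mapper_spec
    ident: Optional[str]  # text after '?' on the first word, if any
    args: List[str]       # program arguments ('|' form only)


def _split_ident(word: str) -> Tuple[str, Optional[str]]:
    # "name?ident" -> ("name", "ident"); no '?' -> ("name", None)
    if "?" in word:
        head, ident = word.split("?", 1)
        return head, ident
    return word, None


def parse_mapper_spec(value: str) -> MapperSpec:
    """Classify a -fmodule-mapper / CXX_MODULE_MAPPER value (sketch only)."""
    if value.startswith("|"):
        # |program[?ident] [args...]: arguments are split on spaces only.
        words = [w for w in value[1:].split(" ") if w]
        if not words:
            raise ValueError("'|' form needs a program name")
        program, ident = _split_ident(words[0])
        return MapperSpec("program", (program,), ident, words[1:])

    word, ident = _split_ident(value)
    if word.startswith("="):
        # =socket[?ident]: local domain socket path.
        return MapperSpec("socket", (word[1:],), ident, [])
    if word.startswith("<"):
        # <>, <>fdinout or <fdin>fdout, returned as (in_fd, out_fd).
        fdin, sep, fdout = word[1:].partition(">")
        if not sep:
            raise ValueError(f"malformed descriptor form: {value!r}")
        if not fdin and not fdout:
            return MapperSpec("fd", (0, 1), ident, [])  # stdin/stdout
        if not fdin:
            fd = int(fdout)                              # one bidirectional fd
            return MapperSpec("fd", (fd, fd), ident, [])
        return MapperSpec("fd", (int(fdin), int(fdout)), ident, [])
    host, sep, port = word.rpartition(":")
    if sep and port.isdigit():
        # [hostname]:port; None stands for "use the loopback address".
        return MapperSpec("tcp", (host or None, int(port)), ident, [])
    # Anything else is taken to be a mapping file name.
    return MapperSpec("file", (word,), ident, [])
```

For example, the IPv4 workaround value `|ncat 192.0.2.1 4000` parses as the program `ncat` with arguments `["192.0.2.1", "4000"]`.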
>
> As shown, an optional @var{ident} may be appended to the first word of
> the option, introduced by a @samp{?}. The value is used in the
> initial handshake with the module server, or to specify a prefix on
> mapping file lines. In the server case, the main source file name is
> used if no @var{ident} is specified. In the file case, all non-blank
> lines are significant, unless a value is specified, in which case only
> lines beginning with @var{ident} are significant. The @var{ident}
> must be separated by whitespace from the module name. Be aware that
> @samp{<}, @samp{>}, @samp{?} and @samp{|} characters are often
> significant to the shell, and therefore may need quoting.
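The mapping-file rules (module-name/filename pairs, an optional `$root` on the first active line, and ident-prefixed lines when an ident is given) can be sketched as below. `read_mapping_file` is a hypothetical helper; treating the rest of the line after the module name as the filename, and letting a later duplicate entry win, are assumptions, since neither is specified above.

```python
from typing import Dict, Iterable, Optional, Tuple


def read_mapping_file(lines: Iterable[str], ident: Optional[str] = None
                      ) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (repository_root, {module_name: cmi_filename}). Sketch only."""
    root: Optional[str] = None
    mapping: Dict[str, str] = {}
    first_active = True
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # blank lines are never significant
        if ident is not None:
            # Only lines whose first whitespace-separated word is the ident count.
            fields = line.split(None, 1)
            if fields[0] != ident or len(fields) < 2:
                continue
            line = fields[1]
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"malformed mapping line: {raw!r}")
        name, filename = parts
        if name == "$root" and first_active:
            root = filename           # repository root, first active line only
        else:
            mapping[name] = filename  # assumption: a later entry overrides
        first_active = False
    return root, mapping
```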
>
> The mapper is connected to or loaded lazily, when the first module
> mapping is required. The networking protocols are only supported on
> hosts that provide networking. If no mapper is specified, a default is
> provided.
>
> Messages consist of whitespace-separated tokens and a possible final
> filename. As filenames are the last item on a line, they may contain
> embedded or trailing spaces without difficulty (they cannot begin with
> a space). All non-ASCII characters are expected to be UTF-8 encoded. Each
> line is terminated by a line-feed (@code{0xa}) character. The server
> should accept and respond to the following commands:
>
> @table @gcctabopt
>
> @item DONE @var{module}
> The compilation has completed the interface of @var{module}. There is
> no response. It is now safe to read the generated CMI. Note that the
> compilation may not have completed the object-file generation of the
> interface unit.
>
> @item EXPORT @var{module}
> The compilation is of a module interface unit, and will generate a CMI
> for @var{module}. The response should be @samp{OK @var{cmipath}}.
>
> @item HELLO @var{ver} @var{kind} @var{ident}
> This is the first command. It informs the server of the name of the
> source being compiled. The response is either @samp{HELLO
> @var{ver} @var{agent} @var{repopath}}, or @samp{ERROR @var{msg}}.
>
> @item IMPORT @var{module}
> A query for an import (including for a module implementation unit).
> The response is @samp{OK @var{cmipath}} to indicate a CMI file. If
> the request cannot be fulfilled, the response is @samp{ERROR
> [@var{msg}]}. Usually an error response will cause compilation to
> terminate.
>
> @item INCLUDE @var{header}
> A @code{#include} directive for @var{header} is about to be processed.
> The response informs the compiler how to treat the inclusion. A
> response of @samp{TEXT} causes textual inclusion. A response of
> @samp{IMPORT} causes importation as a header unit and a subsequent
> @samp{IMPORT} query will then be forthcoming.
>
> @end table
>
> It is recommended that any unrecognized command causes an @samp{ERROR}
> response with a suitable message.
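A toy server that answers one request line at a time makes the command table concrete. This is a minimal sketch under stated assumptions: `ToyMapper` is a hypothetical class, the agent string `toy-mapper` is made up, CMI paths come from a fixed in-memory table (a real server would pick a fresh path for `EXPORT`), and only version 0, the value given in the metavariable table, is accepted.

```python
from typing import Dict, Optional, Set


class ToyMapper:
    """Hypothetical in-memory mapper answering one request line at a time."""

    def __init__(self, cmis: Dict[str, str], header_units: Set[str],
                 repo: str = ".") -> None:
        self.cmis = cmis                  # module name -> CMI path
        self.header_units = header_units  # headers to import, not #include
        self.repo = repo                  # repository root sent in HELLO
        self.done: Set[str] = set()       # modules whose CMI is complete

    def handle(self, line: str) -> Optional[str]:
        # First token is the command; the rest may be a filename containing
        # spaces, which never begins with a space.
        cmd, _, rest = line.rstrip("\n").partition(" ")
        rest = rest.lstrip(" ")
        if cmd == "HELLO":
            fields = rest.split()  # ver kind ident
            if len(fields) < 3 or fields[0] != "0":
                return "ERROR unsupported version or malformed HELLO"
            return f"HELLO 0 toy-mapper {self.repo}"
        if cmd in ("IMPORT", "EXPORT"):
            # Both are looked up in the same table in this toy.
            cmi = self.cmis.get(rest)
            return f"OK {cmi}" if cmi else f"ERROR no CMI known for {rest}"
        if cmd == "INCLUDE":
            return "IMPORT" if rest in self.header_units else "TEXT"
        if cmd == "DONE":
            self.done.add(rest)  # the CMI for `rest` is now safe to read
            return None          # DONE gets no response
        return f"ERROR unrecognized command {cmd!r}"
```

An `INCLUDE` answered with `IMPORT` is followed by an `IMPORT` query for the same header, which this table then resolves to its CMI path.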
>
> Requests and responses may be batched. If a request line begins with
> a @samp{+} character, another request follows before the client waits
> for a response; that request too may begin with @samp{+}. The final
> request of the batch should begin with @samp{-}, and may otherwise be
> empty. Responses to batched requests are batched in the same way:
> each line of a batched response except the last begins with
> @samp{+}, and the final line begins with @samp{-} and may otherwise
> be empty. Responses to a batched request are in request order.
> Servers should not commence responses until all requests of a batch
> have been received: there may be a fixed-capacity pipe between client
> and server, and sending responses before the client has started
> reading could result in deadlock.
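The batching rules can be sketched as a framing helper plus a reader that consumes a whole batch before anything is answered, which is what avoids the deadlock described above. The function names are hypothetical; the sketch assumes the `+`/`-` prefix is immediately followed by the request text (a space after it is tolerated when reading) and that every batched request yields exactly one response line, since how a response-less `DONE` fits into a batch is not stated.

```python
from typing import Callable, Iterator, List, Tuple


def frame_batch(items: List[str]) -> List[str]:
    """'+' on every line but the last, '-' on the last (bare '-' if empty)."""
    if not items:
        return ["-"]
    return ["+" + x for x in items[:-1]] + ["-" + items[-1]]


def read_batch(lines: Iterator[str]) -> Tuple[List[str], bool]:
    """Read one request, or one whole batch, from `lines`.

    Returns (requests, batched). The terminating '-' line may be empty.
    """
    requests: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("+"):
            requests.append(line[1:].lstrip(" "))
        elif line.startswith("-"):
            last = line[1:].lstrip(" ")
            if last:
                requests.append(last)
            return requests, True
        elif requests:
            raise ValueError("batch not terminated by a '-' line")
        else:
            return [line], False  # a plain, unbatched request
    raise EOFError("input ended before the request was complete")


def serve_one(lines: Iterator[str],
              handler: Callable[[str], str]) -> List[str]:
    """Answer one request group, only after all of it has been read."""
    requests, batched = read_batch(lines)
    responses = [handler(r) for r in requests]  # kept in request order
    return frame_batch(responses) if batched else responses
```

Reading the complete batch before calling the handler means no response is written while the client may still be blocked writing requests into a full pipe.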
>
> The following metavariables are used:
>
> @table @gcctabopt
>
> @item @var{cmipath}
> Pathname of a CMI file.
>
> @item @var{from}
> The source path of the file containing the import or include.
>
> @item @var{module}
> A module name. Header unit names are absolute pathnames, or
> pathnames prefixed with @samp{./}. Header units are resolved using
> the include path.
>
> @item @var{msg}
> A human-readable message. This may contain whitespace.
>
> @item @var{ident}
> An identity provided when invoking the compiler. This may be helpful
> to distinguish different connections to a common server.
>
> @item @var{ver}
> A numeric version number, currently 0.
>
> @end table
>
> A project-specific mapper is expected to be provided by the build
> system that invokes the compiler. It is not expected that a
> general-purpose server is provided for all compilations. As such, the
> server will know the build configuration, the compiler it invoked, and
> the environment (such as the working directory) in which that compiler
> is operating. As the build system may parallelize builds, several
> compilations may connect to the same socket.
>
> When delivering paths to the compiler, paths relative to a
> repository-root directory should be used. The server informs the
> compiler of this root in the initial handshake, using either a path
> relative to the compiler's working directory or an absolute one.
> Compilers may embed the path of a direct import's CMI file into an
> output CMI. This path will be relative to the repository. Such a path
> reduces the server traffic, but requires the build system to recreate
> the same directory structure within the repository across a
> parallelized build.
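The path arithmetic implied by the repository-root rule can be sketched with Python's `posixpath`: the server sends the root relative to the compiler's working directory, sends CMI paths relative to the root, and the compiler joins the two. The function names and the directory layout in the usage check are hypothetical. `normpath` is purely lexical, so it can disagree with the filesystem when symbolic links are involved.

```python
import posixpath


def repo_path_for_hello(repo_root: str, compiler_cwd: str) -> str:
    # Repository root as sent in the HELLO response, relative to the
    # compiler's working directory (an absolute path is also allowed).
    return posixpath.relpath(repo_root, compiler_cwd)


def cmi_path_for_compiler(cmi_abs: str, repo_root: str) -> str:
    # CMI path as sent in an OK response: relative to the repository root.
    return posixpath.relpath(cmi_abs, repo_root)


def resolve_on_compiler_side(compiler_cwd: str, repopath: str,
                             cmipath: str) -> str:
    # What the compiler effectively opens: cwd / repopath / cmipath.
    return posixpath.normpath(posixpath.join(compiler_cwd, repopath, cmipath))
```

With the root at `/work/proj` and the compiler running in `/work/proj/build/obj`, the handshake carries `../..` and a CMI at `/work/proj/cmis/foo.gcm` is delivered as `cmis/foo.gcm`.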
>
> The default mapper generates CMI files in a @samp{gcm.cache}
> directory. CMI files have a @samp{.gcm} suffix. The module unit name
> is used directly to provide the basename. Header units construct a
> relative path using the underlying header file name. If the path is
> already relative, a @samp{!} directory is prepended. Internal
> @samp{..} components are translated to @samp{!!}. No attempt is made
> to canonicalize these filenames beyond that done by the preprocessor's
> include search algorithm, as in general it is ambiguous when symbolic
> links are present.
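The default naming scheme can be sketched as follows. `default_cmi_path` is a hypothetical helper; mapping an absolute header path by dropping its leading `/`, and dropping `.` components such as the `./` prefix of relative header-unit names, are assumptions about details the paragraph above does not spell out. As stated there, `..` components are renamed rather than resolved.

```python
CACHE_DIR = "gcm.cache"


def default_cmi_path(name: str, header_unit: bool = False) -> str:
    """Sketch of the default mapper's CMI naming (not GCC's code)."""
    if not header_unit:
        # Named module: the module name is the basename, foo.bar -> foo.bar.gcm
        return f"{CACHE_DIR}/{name}.gcm"
    # Header unit: build a relative path from the header's path name.
    comps = [c for c in name.split("/") if c not in ("", ".")]
    comps = ["!!" if c == ".." else c for c in comps]  # '..' becomes '!!'
    if not name.startswith("/"):
        comps.insert(0, "!")  # already-relative header path gets a '!' dir
    return "/".join([CACHE_DIR] + comps) + ".gcm"
```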
>
> The mapper protocol was published as ``A Module Mapper''
> @uref{http://wg21.link/p1184}. It is intended that build systems will
> provide their own mappers.