[cfe-dev] [RFC] C++20 modules dependency discovery

Michael Spencer via cfe-dev cfe-dev at lists.llvm.org
Tue Aug 13 13:49:47 PDT 2019


On Tue, Aug 13, 2019 at 1:33 PM Ben Boeckel <ben.boeckel at kitware.com> wrote:

> This is likely going to be a bit weird since I just subscribed and don't
> have the original email(s) to reply to, so apologies if my
> reconstruction is incorrect.
>
> On Mon, Aug 12, 2019 at 18:37:05 PDT, Michael Spencer wrote:
> > For explicit modules we only need to know the direct dependencies, as the
> > build system will handle the transitive set.
>
> Correct. Though `import` statements in `#include` files still need to be
> mentioned.
>
> > For preprocessing we still need to import header units (but only their
> > preprocessor state), but not normal modules.  For this case it’s ok if
> > `-E -MD` fails to find a module.  But it does still need to be able to find
> > header units and module maps.  Additionally the normal Make output syntax
> > is not sufficient to represent the needed information unless the driver
> > decides how modules and header units should be built and where
> > intermediate files should go.  There’s currently a json format working its
> > way through the tooling subgroup of the standards committee that I think
> > we should adopt for this.
> >
> > I think we need separate modes in clang for these along with support for
> > scanning through header units without actually building a clang module
> > for them. clang-scan-deps will make use of the explicit mode.  The
> > question I have is how should we select this mode, and what clang options
> > do we need to add?
> >
> > Proposal
> > ========
> >
> > As a rough idea I propose the following:
> >
> > * `-M?` means output the json format which can correctly represent
> > dependencies on a module for which we don’t know what the final file path
> > will be.
>
> [ I'm the author of the paper specifying the mentioned format. ]
>
> For my GCC patch, I've spelled the flags for the output in the following
> way:
>
>   - `-fdep-format=trtbd`: Necessary to support creating old format
>     versions (the "trtbd" part is in search of a much better name :) ).
>   - `-fdep-output=<PATH>`: The path that will be passed to the `-o` flag
>     when compiling the TU being scanned. This is needed to hook up which
>     scan result goes with which compilation rule (it can't be associated
>     with the source because a single source path may be compiled
>     multiple times within a build; the output object file does need to
>     be unique however).
>   - `-fdep-file=<PATH>` where to write the output for the format.
>
> I avoided the `-M` flag family because that means "make". This is not
> make syntax, so it doesn't belong there. In addition, the existing `-M`
> flags are still useful because the "should I rerun this rule" logic for
> the scan step itself can be satisfied with the `-M` flags here.
>

This is not something I had considered.  I agree it's highly useful to be
able to skip the rescan when nothing has changed.  It's also important that
clang uses the same flags as GCC here.  Have you heard from the GCC devs
about your patch?
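
To make the "don't rescan if nothing changed" point concrete, here is a
minimal sketch in Python of the check a build tool could run against the
`.d` file written by the scan step. It is an illustration only, not
anything clang or GCC ships. The function names are hypothetical, and the
parser assumes a single rule with backslash-newline continuations (no
escaped spaces, no drive-letter colons).

```python
import os


def parse_depfile(text):
    """Return the prerequisite paths from a one-rule Make-style depfile.

    Assumption: only backslash-newline continuations are handled; escaped
    spaces and multiple rules are out of scope for this sketch.
    """
    joined = text.replace("\\\n", " ")
    _target, _, prereqs = joined.partition(":")
    return prereqs.split()


def needs_rescan(scan_output, depfile):
    """True if the scan result is missing or older than any prerequisite."""
    if not (os.path.exists(scan_output) and os.path.exists(depfile)):
        return True
    with open(depfile) as f:
        prereqs = parse_depfile(f.read())
    scanned_at = os.path.getmtime(scan_output)
    for path in prereqs:
        # A deleted or newer prerequisite invalidates the previous scan.
        if not os.path.exists(path) or os.path.getmtime(path) > scanned_at:
            return True
    return False
```

A real build tool such as ninja or make does this bookkeeping itself; the
sketch only shows why the Make-style output remains useful alongside the
json output.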


>
> > * `clang++ -std=c++20 -E -MD -fimplicit-header-units` should implicitly
> > find header unit sources, but not modules (as we've not given it any way
> > to look up how to build modules).
> >     * This means that the dep file will contain a bunch of `.h`s,
> > `.modulemap`s, and any `.pcm`s explicitly listed on the command line.
> >     * This also means erroring on unknown imported modules as we don't
> > know what to put in the dep file for them.
>
> Sounds reasonable. Matching GCC's output for them might be a viable
> option, but that is going to make not-make parsers of the `.d` files
> choke (since that output involves appending to make variables).
>

What output do you produce for GCC?


>
> > * `clang++ -std=c++20 -E -MD -fimplicit-header-units
> > -fimplicit-module-lookup=?`  should do the same as the above, except that
> > it does know how to find modules, and should list all of the transitive
> > dependencies of any modules it finds.
> > * `clang++ -std=c++20 -E -MD` should fail if it hits a module or header
> > unit, and should never do implicit lookup.
> > * `clang++ -std=c++20 -E -M?` should scan through header units without
> > actually building clang modules for them (to get the macros it needs),
> > and should note all module imports.
> >     * This means that the dep file will contain only `.h`s that it
> > includes, and use the json representation of header units and modules.
> >     * It will also be shallow, with only direct dependencies.
>
> Sounds good.
>
> > Additionally, we should (eventually) make:
> >
> > `$ clang++ -std=c++20 a.cpp b.cpp c.cpp a.cppm -o program`
> >
> > Work without a build system, even in the presence of modules.  To do this
> > we will need to prescan the files to determine the module dependencies
> > between them and then build them in dependency order.  This does mean
> > adding a (simple) build system to the driver (maybe [llbuild](
> > https://github.com/apple/swift-llbuild)?), but I think it’s worth it to
> > make simple cases simple.  It may also make sense to actually push this
> > work out to a real build system.  For example have clang write a
> > temporary ninja file and invoke ninja to perform the build.
>
> This sounds like what a Meson developer is expecting in this blog post:
>
>
> https://nibblestew.blogspot.com/2019/08/building-c-modules-take-n1.html


It seems similar, but the intent isn't really for "real" builds.  It's just
to support simple cases so that step one of using C++ isn't setting up a
build system.
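
As a rough illustration of the "prescan, then build in dependency order"
step from the quoted proposal, the following Python sketch orders sources
from per-file scan records. The record layout (`provides` / `imports`
keys) and the function name are assumptions made for this example; they
are not the json format under discussion, and this is not how the clang
driver works today.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def build_order(scan_results):
    """Order sources so every module provider precedes its importers.

    scan_results maps a source path to a dict with hypothetical keys
    "provides" and "imports", each a list of module names.
    """
    # Map each module name to the source file that provides it.
    provider = {}
    for src, info in scan_results.items():
        for name in info.get("provides", []):
            provider[name] = src

    # Build a source -> {sources it depends on} graph.
    graph = {}
    for src, info in scan_results.items():
        deps = set()
        for name in info.get("imports", []):
            if name not in provider:
                raise KeyError(f"{src} imports unknown module {name!r}")
            deps.add(provider[name])
        graph[src] = deps

    # static_order() raises graphlib.CycleError on circular imports.
    return list(TopologicalSorter(graph).static_order())


order = build_order({
    "a.cpp": {"imports": ["A"]},
    "b.cpp": {"imports": ["A"]},
    "c.cpp": {},
    "a.cppm": {"provides": ["A"]},
})
```

Here `a.cppm` is ordered before `a.cpp` and `b.cpp`; the position of
`c.cpp` is unconstrained because it has no module dependencies.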


>
> I don't know how "simple" they're able to force their compilation model
> into what would be provided here. I'm also not sure how much a nested
> ninja would be appreciated (there's no notion of a jobserver for
> ninja-under-ninja to propagate things like `-l` or `-j` flags down).
> Pool information may also be useful there. There is a patchset for
> ninja-under-make to obey jobserver information though, but that doesn't
> help Meson at all.
>
> On Tue, Aug 13, 2019 at 02:08:42 PDT, Michael Spencer wrote:
> > On Tue, Aug 13, 2019 at 01:52:46 PDT, Finkel, Hal J. wrote:
> > > I don't object to supporting the json format, but are there defaults
> > > that would make sense? Maybe using the preprocessor state implied by
> > > the current command-line options and putting intermediate files /
> > > interface files in the current directory, or in
> > > TMPDIR/.clang/<hash of path>, or something else? We'd need defaults
> > > for your `-M?` below anyway?
>
> I think that defaults for the `-M?` (or `-fdep-*` flags) are unnecessary.
> The flags are only really meaningful to a build system sophisticated
> enough to understand module dependencies anyways, so just requiring at
> least `-fdep-format=` and `-fdep-file=` to be set sounds OK to me at
> least (`-fdep-output=` being unset means the build tool knows what it's
> doing I guess). I suppose `-fdep-file=` could have a default too, but
> that sounds like a build system being too trusting of cross-version
> compatibility to me.
>
> > The json format doesn't include pcm paths.
>
> It doesn't require them, but there is a slot for the scan tool to say
> something. In CMake's implementation, I take the filename of the pcm
> path placed there, but relocate it to a target-specific directory. If it
> is missing, I create my own filepath based on the logical name of the
> module. This is communicated to the actual build by creating a file for
> GCC's module mapper to locate it (which is used for import and export
> locations). If clang wants a response file, that can be done too (with
> the flag just being spelled as `@` instead of `-fmodule-mapper=`).
>
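
The pcm placement rule described above (keep the filename the scanner
suggested but relocate it, otherwise derive a name from the logical module
name) could look roughly like this in Python. This is a sketch of the idea
only, not CMake's code: the function name, the `.pcm` suffix, and the
replacement of `:` in partition names are all assumptions for illustration.

```python
import posixpath


def pcm_path(logical_name, target_dir, suggested=None):
    """Pick where a module's pcm should live inside target_dir."""
    if suggested:
        # Keep only the filename from the scanner's suggestion and
        # relocate it into the target-specific directory.
        filename = posixpath.basename(suggested)
    else:
        # Hypothetical naming scheme: derive the filename from the logical
        # module name, replacing ':' (partition separator) with '-'.
        filename = logical_name.replace(":", "-") + ".pcm"
    return posixpath.join(target_dir, filename)
```

For example, `pcm_path("foo", "build/tgt", "/tmp/scan/foo.pcm")` gives
`build/tgt/foo.pcm`, and `pcm_path("foo:part", "build/tgt")` gives
`build/tgt/foo-part.pcm`.
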
> > It just says which source
> > files provide which modules, and what modules and header units each
> > source file imports.  It's up to the build system to construct an actual
> > build.
>
> Yep.
>
> > The other issue with -MD is that I believe tools that use `.d`
> > files wouldn't even be able to handle a `.d` that included actual
> > commands.
>
> Correct. Ninja tries to handle the barest of syntax for these files
> (basically what is seen in the wild).
>

This makes me think we really shouldn't even try to do that then.

- Michael Spencer


>
> > > Also, does finding a module involve matching a cppm file with
> > > compatible preprocessor state, or is it just by name?
> > >
> > It's just by name.  The assumption here is that you have a compilation
> > database or similar and thus know the command line options passed to
> > every source file.
>
> In CMake, mismatched preprocessor state is expected to be detected by
> the compiler (something like "-D flags change the interpretation of the
> BMI") or linker (as `_ITERATOR_DEBUG_LEVEL` is handled in Windows).
>
> --Ben
>