[cfe-dev] [RFC] C++20 modules dependency discovery
Ben Boeckel via cfe-dev
cfe-dev at lists.llvm.org
Tue Aug 13 13:33:37 PDT 2019
This is likely going to be a bit weird since I just subscribed and don't
have the original email(s) to reply to, so apologies if my
reconstruction is incorrect.
On Mon, Aug 12, 2019 at 18:37:05 PDT, Michael Spencer wrote:
> For explicit modules we only need to know the direct dependencies, as the
> build system will handle the transitive set.
Correct. Though `import` statements in `#include` files still need to be
mentioned.
> For preprocessing we still need to import header units (but only their
> preprocessor state), but not normal modules. For this case it’s ok if `-E
> -MD` fails to find a module. But it does still need to be able to find
> header units and module maps. Additionally the normal Make output syntax
> is not sufficient to represent the needed information unless the driver
> decides how modules and header units should be built and where intermediate
> files should go. There’s currently a json format working its way through
> the tooling subgroup of the standards committee that I think we should
> adopt for this.
>
> I think we need separate modes in clang for these along with support for
> scanning through header units without actually building a clang module for
> them. clang-scan-deps will make use of the explicit mode. The question I
> have is how should we select this mode, and what clang options do we need
> to add?
>
> Proposal
> ========
>
> As a rough idea I propose the following:
>
> * `-M?` means output the json format which can correctly represent
> dependencies on a module for which we don’t know what the final file path
> will be.
[ I'm the author of the paper specifying the mentioned format. ]
For my GCC patch, I've spelled the flags for the output in the following
way:
- `-fdep-format=trtbd`: Necessary to support creating old format
versions (the "trtbd" part is in search of a much better name :) ).
- `-fdep-output=<PATH>`: The path that will be passed to the `-o` flag
when compiling the TU being scanned. This is needed to hook up which
scan result goes with which compilation rule (it can't be associated
with the source because a single source path may be compiled
multiple times within a build; the output object file does need to
be unique however).
- `-fdep-file=<PATH>`: Where to write the output in the selected format.
I avoided the `-M` flag family because that means "make". This is not
make syntax, so it doesn't belong there. In addition, the existing `-M`
flags are still useful because the "should I rerun this rule" logic for
the scan step itself can be satisfied with the `-M` flags here.
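As a rough sketch of how a build tool might put these together (Python here only for brevity): the three `-fdep-*` spellings are the ones from my patch as described above; the helper name, the use of `-E`, the `.deps.json` and `.d` file names, and pairing the scan with `-MD -MF` are all assumptions for illustration, not settled interface.

```python
def scan_command(compiler, source, obj, dep_file, dep_format="trtbd", extra_flags=()):
    """Build the argv for one scan invocation (hypothetical helper).

    `obj` is the path the real compile will pass to `-o`; it is the key
    that ties a scan result back to its compilation rule.
    """
    return [
        compiler,
        *extra_flags,
        "-E", source,                   # assumption: scanning is a preprocess-only run
        "-fdep-format=" + dep_format,   # format version selector
        "-fdep-output=" + obj,          # unique per compilation, unlike the source path
        "-fdep-file=" + dep_file,       # where the scan result is written
        "-MD", "-MF", dep_file + ".d",  # classic depfile for "should I rescan?" logic
    ]


# The same source compiled twice still gets two distinct scan rules,
# because each is keyed on its own object file.
debug = scan_command("g++", "a.cpp", "debug/a.o", "debug/a.o.deps.json", extra_flags=["-g"])
release = scan_command("g++", "a.cpp", "release/a.o", "release/a.o.deps.json", extra_flags=["-O2"])
```

Keying on the object path rather than the source is the whole point of `-fdep-output=`: `debug` and `release` above share `a.cpp` but never collide.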
> * `clang++ -std=c++20 -E -MD -fimplicit-header-units` should implicitly
> find header unit sources, but not modules (as we've not given it any way to
> look up how to build modules).
> * This means that the dep file will contain a bunch of `.h`s,
> `.modulemap`s, and any `.pcm`s explicitly listed on the command line.
> * This also means erroring on unknown imported modules as we don't know
> what to put in the dep file for them.
Sounds reasonable. Matching GCC's output for them might be a viable
option, but that is going to make non-make parsers of the `.d` files
choke (since that output involves appending to make variables).
> * `clang++ -std=c++20 -E -MD -fimplicit-header-units
> -fimplicit-module-lookup=?` should do the same as the above, except that
> it does know how to find modules, and should list all of the transitive
> dependencies of any modules it finds.
> * `clang++ -std=c++20 -E -MD` should fail if it hits a module or header
> unit, and should never do implicit lookup.
> * `clang++ -std=c++20 -E -M?` should scan through header units without
> actually building clang modules for them (to get the macros it needs), and
> should note all module imports.
> * This means that the dep file will contain only `.h`s that it
> includes, and use the json representation of header units and modules.
> * It will also be shallow, with only direct dependencies.
Sounds good.
> Additionally, we should (eventually) make:
>
> `$ clang++ -std=c++20 a.cpp b.cpp c.cpp a.cppm -o program`
>
> Work without a build system, even in the presence of modules. To do this
> we will need to prescan the files to determine the module dependencies
> between them and then build them in dependency order. This does mean
> adding a (simple) build system to the driver (maybe [llbuild](
> https://github.com/apple/swift-llbuild)?), but I think it’s worth it to
> make simple cases simple. It may also make sense to actually push this
> work out to a real build system. For example have clang write a temporary
> ninja file and invoke ninja to perform the build.
This sounds like what a Meson developer is expecting in this blog post:
https://nibblestew.blogspot.com/2019/08/building-c-modules-take-n1.html
I don't know how well they can fit their compilation model into the
"simple" build that would be provided here. I'm also not sure how much a nested
ninja would be appreciated (there's no notion of a jobserver for
ninja-under-ninja to propagate things like `-l` or `-j` flags down).
Pool information may also be useful there. There is a patchset for
ninja-under-make to obey jobserver information, but that doesn't
help Meson at all.
On Tue, Aug 13, 2019 at 02:08:42 PDT, Michael Spencer wrote:
> On Tue, Aug 13, 2019 at 01:52:46 PDT, Finkel, Hal J. wrote:
> > I don't object to supporting the json format, but are there defaults
> > that would make sense? Maybe using the preprocessor state implied by
> > the current command-line options and putting intermediate files /
> > interface files in the current directory, or in
> > TMDIR/.clang/<hash of path>, or something else? We'd need defaults
> > for your `-M?` below anyway?
I think that defaults for the `-M?` (or `-fdep-*`) flags are unnecessary.
The flags are only really meaningful to a build system sophisticated
enough to understand module dependencies anyways, so just requiring at
least `-fdep-format=` and `-fdep-file=` to be set sounds OK to me at
least (`-fdep-output=` being unset means the build tool knows what it's
doing I guess). I suppose `-fdep-file=` could have a default too, but
that sounds like a build system being too trusting of cross-version
compatibility to me.
> The json format doesn't include pcm paths.
It doesn't require them, but there is a slot for the scan tool to say
something. In CMake's implementation, I take the filename of the pcm
path placed there, but relocate it to a target-specific directory. If it
is missing, I create my own filepath based on the logical name of the
module. This is communicated to the actual build by creating a file for
GCC's module mapper to locate it (which is used for import and export
locations). If clang wants a response file, that can be done too (with
the flag just being spelled as `@` instead of `-fmodule-mapper=`).
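To make the relocation step concrete, here is a minimal sketch of the logic described above. It is not CMake's actual code: the function names, the `.pcm` extension, the replacement of `:` in partition names, and the one-line-per-module "name path" mapper layout are all assumptions made for illustration.

```python
import posixpath


def bmi_path(logical_name, target_dir, suggested=None):
    """Pick where a module's BMI lives inside a target-specific directory."""
    if suggested:
        # Keep only the filename the scanner suggested; the directory is ours.
        filename = posixpath.basename(suggested)
    else:
        # No suggestion: derive a name from the logical module name.
        # Replacing ':' (partitions) and using '.pcm' are assumptions.
        filename = logical_name.replace(":", "-") + ".pcm"
    return posixpath.join(target_dir, filename)


def mapper_lines(modules, target_dir):
    """Render one 'name path' line per module for a mapper or response file.

    `modules` maps logical module name -> scanner-suggested path or None.
    """
    return [
        "%s %s" % (name, bmi_path(name, target_dir, modules[name]))
        for name in sorted(modules)
    ]


lines = mapper_lines({"foo": "/tmp/scan/foo.pcm", "foo:impl": None}, "build/libfoo.dir")
```

The same lines serve both import and export lookups, since the build system is the only party deciding where each BMI goes.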
> It just says which source
> files provide which modules, and what modules and header units each
> source file imports. It's up to the build system to construct an actual
> build.
Yep.
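A minimal sketch of what "construct an actual build" means on the build system side: take the shallow, per-source results and derive a compile order from them. The dictionary layout and the `provides`/`imports` keys are simplified stand-ins chosen for this example, not the field names of the real format.

```python
def build_order(scan_results):
    """Order sources so every module is built before its importers.

    `scan_results` maps source path -> {"provides": [...], "imports": [...]},
    holding only direct dependencies, as a scanner would report them.
    """
    # Which source provides each logical module name?
    providers = {}
    for src, info in scan_results.items():
        for mod in info.get("provides", []):
            providers[mod] = src

    # Turn module-level imports into source-level edges.
    graph = {}
    for src, info in scan_results.items():
        deps = set()
        for mod in info.get("imports", []):
            if mod not in providers:
                raise KeyError("no source provides module %r (imported by %s)" % (mod, src))
            deps.add(providers[mod])
        graph[src] = deps

    # Depth-first topological sort; the transitive set falls out of the walk.
    order, state = [], {}

    def visit(src):
        if state.get(src) == "done":
            return
        if state.get(src) == "active":
            raise ValueError("import cycle through %s" % src)
        state[src] = "active"
        for dep in sorted(graph[src]):
            visit(dep)
        state[src] = "done"
        order.append(src)

    for src in sorted(graph):
        visit(src)
    return order


order = build_order({
    "a.cpp": {"imports": ["A"]},
    "a.cppm": {"provides": ["A"], "imports": ["B"]},
    "b.cppm": {"provides": ["B"]},
})
```

Note that the scanner never needed to know that `a.cpp` transitively depends on `b.cppm`; the build system recovers that from the direct edges alone.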
> The other issue with -MD is that I believe tools that use `.d`
> files wouldn't even be able to handle a `.d` that included actual
> commands.
Correct. Ninja tries to handle the barest of syntax for these files
(basically what is seen in the wild).
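To illustrate how little of make's syntax such a consumer accepts, here is a deliberately small depfile reader. It is a sketch of the general idea, not Ninja's parser: it handles only `target: dep dep ...` rules with backslash-newline continuations, does no escape handling, and the `CXX_IMPORTS` variable name in the usage below is just an illustrative example of a variable append.

```python
def parse_depfile(text):
    """Parse plain 'target: dep dep ...' rules; reject anything else."""
    # Join backslash-newline continuations into one logical line.
    joined = text.replace("\\\n", " ")
    deps = {}
    for line in joined.splitlines():
        if not line.strip():
            continue
        if line.startswith("\t"):
            # Recipe (command) lines are outside what this reader understands.
            raise ValueError("recipe lines are not supported: %r" % line)
        target, sep, rest = line.partition(":")
        if not sep or "=" in target:
            # Variable assignments and appends are not plain rules.
            raise ValueError("not a plain rule: %r" % line)
        deps.setdefault(target.strip(), []).extend(rest.split())
    return deps


plain = parse_depfile("a.o: a.cpp \\\n  a.h\n")

try:
    parse_depfile("CXX_IMPORTS += foo.c++m\n")
    rejected = False
except ValueError:
    rejected = True
```

Anything beyond the plain rule form, whether a command or a variable append, makes a reader like this give up, which is why the extra module information needs its own format.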
> > Also, does finding a module involve matching a cppm file with
> > compatible preprocessor state, or is it just by name?
> >
> It's just by name. The assumption here is that you have a compilation
> database or similar and thus know the command line options passed to
> every source file.
In CMake, mismatched preprocessor state is expected to be detected by
the compiler (something like "-D flags change the interpretation of the
BMI") or linker (as `_ITERATOR_DEBUG_LEVEL` is handled on Windows).
--Ben