[cfe-dev] Modules TS: binary module interface dependencies

Thu Jun 29 04:46:19 PDT 2017

Richard Smith <richard at metafoo.co.uk> writes:

> We've talked about making this kind of relocation easier by allowing the
> module source directory and the build directory to be relocated
> independently (right now you need to relocate everything together --
> sources, .pcm's, working directory).

That's what I did and got the above-mentioned error.

> At that point I think you'd be better off with a directory of .pcm files
> following a naming convention rather than providing the compiler with
> a (potentially very large) set of mappings (and we already support
> something like that).

Here is a concrete scenario I am thinking about: I want to implement
distributed compilation that supports modules. Which means that
besides the translation unit itself, the build system also needs
to ship .pcm's of all the modules that this TU imports (transitively).

In itself, this is not a problem: the build system needs to make sure
that these .pcm's are all up-to-date before it can invoke the compiler.
So it has to know the paths to all the .pcm's which, in case of build2,
are spread out across various project directories (since we try to
reuse already compiled .pcm's from projects that we import).

For distributed compilation we want to minimize the amount of stuff we
copy back and forth so it makes sense to cache .pcm's on the build
slaves (the same .pcm is likely to be used by multiple TUs). So on
the build slave I would store a list of .pcm files, their hashes,
and their module names. Since the same module can be compiled with
different options and result in a different .pcm/hash, I would use
the hash as the file name to store .pcm's on the slave (i.e., content-
addressable storage).
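
As a rough illustration of the slave-side cache described above, here
is a minimal sketch of an index keyed by content hash. All the names
(pcm_entry, pcm_cache, the "<dir>/<hash>.pcm" layout) are made up for
this example and are not part of build2 or Clang; how the hash itself
is computed is left out.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical record for one cached .pcm on a build slave.
//
struct pcm_entry
{
  std::string module_name; // E.g., "hello.core".
  std::string path;        // Assumed layout: <dir>/<hash>.pcm
};

// Hypothetical content-addressable index: the content hash is the key
// so the same module compiled with different options (and thus with a
// different hash) gets a separate entry.
//
class pcm_cache
{
public:
  explicit
  pcm_cache (std::string dir): dir_ (std::move (dir)) {}

  // Register a .pcm under its content hash. Return false if this hash
  // is already cached (so the file does not need to be copied again).
  //
  bool
  insert (const std::string& hash, const std::string& module_name)
  {
    pcm_entry e {module_name, dir_ + '/' + hash + ".pcm"};
    return index_.emplace (hash, std::move (e)).second;
  }

  // Return the entry for this hash or nullptr if it is not cached.
  //
  const pcm_entry*
  find (const std::string& hash) const
  {
    auto i (index_.find (hash));
    return i != index_.end () ? &i->second : nullptr;
  }

private:
  std::string dir_;
  std::map<std::string, pcm_entry> index_;
};
```

With something like this, two builds of the same module with
different options simply end up as two entries with different hashes
and the same module name.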

With this pretty straightforward setup, when the time comes to compile
a TU, all I need is to somehow communicate to the compiler the
mapping of module names to these hash-named .pcm's. If there were
a way to provide this mapping in a file, I would be all set.
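
To make the idea of a mapping file more concrete, here is a sketch of
what parsing one could look like. The format (one "<module-name>=<file>"
pair per line, later entries overriding earlier ones) and the function
name are purely assumptions for illustration; no such format is
defined by Clang in this discussion.

```cpp
#include <map>
#include <sstream>
#include <string>

// Module name to .pcm file mapping.
//
using module_mapping = std::map<std::string, std::string>;

// Parse the hypothetical mapping format: one <module-name>=<file>
// pair per line. Blank lines and lines without a module name or
// without '=' are skipped. If a module is listed more than once,
// the last entry wins (assumed override rule).
//
module_mapping
parse_module_mapping (const std::string& text)
{
  module_mapping r;
  std::istringstream is (text);

  for (std::string l; std::getline (is, l); )
  {
    std::string::size_type p (l.find ('='));

    if (p == std::string::npos || p == 0)
      continue;

    r[l.substr (0, p)] = l.substr (p + 1);
  }

  return r;
}
```

For the content-addressable setup above, the build system would
generate lines such as "hello.core=/cache/ab12.pcm" (names and paths
invented here) just before invoking the compiler.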

With the directory approach, I would need to create a temporary
directory and populate it with appropriately-named symlinks (or
copies in case of Windows) of .pcm files. While not particularly
hard, it sure feels unnecessary. I would definitely try to avoid
doing this for local compilations which means I will have two
different ways of invoking the compiler depending on whether it
is remote or local. And it is still not clear to me how this will
override embedded .pcm references.
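
For comparison, the extra step that the directory approach requires
could look roughly like the following C++17 sketch. It assumes the
"<module-name>.pcm" naming convention for the directory; the function
name and the example paths are invented.

```cpp
#include <filesystem>
#include <map>
#include <string>

namespace fs = std::filesystem;

// Populate dir with <module-name>.pcm symlinks pointing to the
// (hash-named) .pcm files. Existing links with the same name are
// replaced.
//
void
populate_module_dir (const fs::path& dir,
                     const std::map<std::string, fs::path>& mapping)
{
  fs::create_directories (dir);

  for (const auto& m: mapping)
  {
    fs::path l (dir / (m.first + ".pcm"));

    fs::remove (l); // No-op if the link does not exist.
    fs::create_symlink (m.second, l);
  }
}
```

On Windows, where creating symlinks normally requires extra
privileges, fs::create_symlink() would likely have to be replaced with
fs::copy_file(), which is the copying overhead mentioned above.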

> But allowing an explicit mapping to be specified would also be fine
> if people would actually use that facility.

I will use it in build2. And I am willing to try to implement it.

> Our design right now is pretty strongly tied to having loaded all
> dependency modules before loading a dependent module, though, so
> we need that complexity somewhere.

I don't think we will need it with the mapping approach: we will have
a map of module names to file names, probably in HeaderSearchOptions
next to PrebuiltModulePaths -- in a sense it will be another module
search mechanism that will be tried before prebuilt paths (in
HeaderSearch::getModuleFileName()).
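
The intended lookup order can be modeled with the following standalone
sketch. It is not Clang code: search_options only imitates the relevant
part of HeaderSearchOptions, module_files is a made-up name for the
proposed map, and the file existence check is passed in as a callback
to keep the example self-contained. The "<dir>/<name>.pcm" lookup in
prebuilt paths is likewise a simplification.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for the relevant HeaderSearchOptions members.
//
struct search_options
{
  // Proposed module name to file name map (hypothetical member).
  //
  std::map<std::string, std::string> module_files;

  // Models PrebuiltModulePaths.
  //
  std::vector<std::string> prebuilt_module_paths;
};

// Resolve a module name to a .pcm file: consult the explicit map
// first and only then fall back to the prebuilt directories. Return
// an empty string if nothing is found.
//
std::string
module_file_name (const search_options& o,
                  const std::string& name,
                  const std::function<bool (const std::string&)>& exists)
{
  auto i (o.module_files.find (name));

  if (i != o.module_files.end ())
    return i->second;

  for (const std::string& d: o.prebuilt_module_paths)
  {
    std::string f (d + '/' + name + ".pcm");

    if (exists (f))
      return f;
  }

  return std::string ();
}
```

Because the map is a plain lookup by name, nothing here depends on the
order in which the modules were specified.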

This map will be populated before we actually load any modules so
the order in which one specifies the mapping is not important
(except for overriding). I will probably need to add some extra
code to consult this map when resolving embedded .pcm references,
though.

And we could also keep updating this map when loading modules via
other means (e.g., with -fmodule-file) which will give us the
override behavior we discussed earlier (I won't need this
functionality in build2 but could implement it if others think
it would be useful).

If this sounds reasonable, I can give it a go.

Thanks,
Boris