[cfe-dev] Modules TS: binary module interface dependencies

Thu Jun 29 16:53:38 PDT 2017

On 29 June 2017 at 04:46, Boris Kolpackov <boris at codesynthesis.com> wrote:

> Richard Smith <richard at metafoo.co.uk> writes:
>
> > We've talked about making this kind of relocation easier by allowing the
> > module source directory and the build directory to be relocated
> > independently (right now you need to relocate everything together --
> > sources, .pcm's, working directory).
>
> That's what I did and got the above-mentioned error.

Hmm, that could well be a bug, then. Do you by any chance have steps to
reproduce this?

> > At that point I think you'd be better off with a directory of .pcm files
> > following a naming convention rather than providing the compiler with
> > a (potentially very large) set of mappings (and we already support
> > something like that).
>
> Here is a concrete scenario I am thinking about: I want to implement
> distributed compilation that supports modules. Which means that
> besides the translation unit itself, the build system also needs
> to ship .pcm's of all the modules that this TU imports (transitively).
>
> In itself, this is not a problem: the build system needs to make sure
> that these .pcm's are all up-to-date before it can invoke the compiler.
> So it has to know the paths to all the .pcm's which, in the case of
> build2, are spread out across various project directories (since we try
> to reuse already compiled .pcm's from projects that we import).
>
> For distributed compilation we want to minimize the amount of stuff we
> copy back and forth so it makes sense to cache .pcm's on the build
> slaves (the same .pcm is likely to be used by multiple TUs). So on
> the build slave I would store a list of .pcm files, their hashes,
> and their module names. Since the same module can be compiled with
> different options and result in a different .pcm/hash, I would use
> the hash as the file name to store .pcm's on the slave (i.e.,
> content-addressable storage).
>
> With this pretty straightforward setup, when the time comes to compile
> a TU, all I need is to somehow communicate to the compiler the
> mapping of module names to these hash-named .pcm's. If there were
> a way to provide this mapping in a file, I would be all set.
>

For what it's worth, this setup with named symlinks (whose names are stable
across all builds) is how our (Google's) internal build system handles this.
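
A minimal sketch of the content-addressed cache described in the quoted
text, written in Python for brevity. The function names, the use of
SHA-256, and the `name=path` mapping-file format are assumptions made for
illustration; nothing here is an existing Clang or build2 interface.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict


def store_pcm(cache_dir: Path, pcm: Path) -> Path:
    """Copy a .pcm into the cache under its content hash; return the cached path."""
    digest = hashlib.sha256(pcm.read_bytes()).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{digest}.pcm"
    if not cached.exists():  # identical content is stored only once
        shutil.copyfile(pcm, cached)
    return cached


def write_mapping(mapping_file: Path, modules: Dict[str, Path]) -> None:
    """Write one `module-name=path` line per module (hypothetical format)."""
    lines = [f"{name}={path}" for name, path in sorted(modules.items())]
    mapping_file.write_text("\n".join(lines) + "\n")
```

Because the file name is derived from the content, the same module built
with different options simply ends up as two distinct cache entries.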

> With the directory approach, I would need to create a temporary
> directory and populate it with appropriately-named symlinks (or
> copies in case of Windows) of .pcm files. While not particularly
> hard, it sure feels unnecessary. I would definitely try to avoid
> doing this for local compilations which means I will have two
> different ways of invoking the compiler depending on whether it
> is remote or local.

Because you don't use the content-addressed system locally? What we do is
to use symlinks for remote compilations and just put the files in the
"right" places locally, so the file system looks the same either way.

> And it is still not clear to me how this will
> override embedded .pcm references.

I don't think it would, but if the paths to dependencies are always the
same, you shouldn't need to override any of those references.
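
One way to keep those paths stable is the symlink directory discussed
above: hash-named cache entries are exposed under fixed names in a single
directory. The following is a rough Python sketch; the function name is
hypothetical, and the `<module-name>.pcm` naming convention is an
assumption about what the compiler expects to find in such a directory.

```python
import os
import shutil
from pathlib import Path
from typing import Dict


def populate_module_dir(module_dir: Path, modules: Dict[str, Path]) -> None:
    """Create <module_dir>/<name>.pcm for each module, pointing at its cached file."""
    module_dir.mkdir(parents=True, exist_ok=True)
    for name, cached in modules.items():
        link = module_dir / f"{name}.pcm"
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale entry from an earlier build
        try:
            os.symlink(cached.resolve(), link)
        except (OSError, NotImplementedError):
            # Fall back to a copy where symlinks are unavailable
            # (for example, on Windows without the required privilege).
            shutil.copyfile(cached, link)
```

The directory looks identical whether its entries are symlinks into a
cache or real files, so the compiler invocation does not need to change
between local and remote builds.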

> > But allowing an explicit mapping to be specified would also be fine
> > if people would actually use that facility.
>
> I will use it in build2. And I am willing to try to implement it.

OK :)

> > Our design right now is pretty strongly tied to having loaded all
> > dependency modules before loading a dependent module, though, so
> > we need that complexity somewhere.
>
> I don't think we will need it with the mapping approach: we will have
> a map of module names to file names, probably in HeaderSearchOptions
> next to PrebuiltModulePaths -- in a sense it will be another module
> search mechanism that will be tried before prebuilt paths (in
> HeaderSearch::getModuleFileName()).
>
> This map will be populated before we actually load any modules so
> the order in which one specifies the mapping is not important
> (except for overriding). I will probably need to add some extra
> code to consult this map when resolving embedded .pcm references,
> though.
>
> And we could also keep updating this map when loading modules via
> other means (e.g., with -fmodule-file) which will give us the
> override behavior we discussed earlier (I won't need this
> functionality in build2 but could implement it if others think
> it would be useful).
>
> If this sounds reasonable, I can give it a go.
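
The lookup order proposed in the quoted text can be modelled roughly as
follows. This is a Python model of the idea only, not Clang code: the
function names are made up, and the `<name>.pcm` layout of the prebuilt
directories is an assumption.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple


def build_module_map(entries: Iterable[Tuple[str, str]]) -> Dict[str, Path]:
    """Collect (module name, file) pairs; a later pair overrides an earlier one."""
    table: Dict[str, Path] = {}
    for name, file_name in entries:
        table[name] = Path(file_name)
    return table


def find_module_file(
    name: str, table: Dict[str, Path], prebuilt_paths: List[Path]
) -> Optional[Path]:
    """Try the explicit map first, then each prebuilt directory in order."""
    if name in table:
        # Not checked for existence here: a bad path only surfaces
        # when the module is actually loaded.
        return table[name]
    for directory in prebuilt_paths:
        candidate = directory / f"{name}.pcm"
        if candidate.is_file():
            return candidate
    return None
```

Since the whole map is built before any lookup happens, the order of the
entries matters only when two of them name the same module.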

Sure. I think my only remaining concerns are:

1) this is likely to end up with a set of command line arguments that grows
linearly with the total number of modules in the project, and you're likely
to find the build system needs or wants to prune the list down to just the
dependencies anyway
2) we can't do any validation that the command line arguments are
reasonable if the corresponding module is not used (we don't want to stat a
large number of .pcm files if most of them are not going to be used, and
definitely don't want to read the file header to find if it names the right
module)

I don't think (2) is really a big deal, though, since we'll get at least a
"file not found" error if the module is actually used by the compilation.
And (1) is ultimately your problem as the build system maintainer, not
ours. ;-)
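
For concern (1), pruning the mapping down to what a single TU needs is a
plain graph traversal on the build-system side. A small Python sketch,
assuming the build system already has the import graph as a dictionary
(the names below are illustrative, not taken from any build system):

```python
from typing import Dict, List, Set


def transitive_imports(roots: List[str], imports: Dict[str, List[str]]) -> Set[str]:
    """Return every module reachable from `roots` through the import graph."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(imports.get(name, []))
    return seen


def pruned_mapping(
    roots: List[str], imports: Dict[str, List[str]], all_modules: Dict[str, str]
) -> Dict[str, str]:
    """Keep only the mapping entries for modules the TU imports, transitively."""
    needed = transitive_imports(roots, imports)
    return {name: all_modules[name] for name in sorted(needed) if name in all_modules}
```

Here `roots` would be the modules the TU imports directly, so the
resulting mapping grows with the TU's dependency closure rather than with
the size of the whole project.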