[cfe-dev] RFC: Supporting private module maps for non-framework headers

Fri Nov 14 15:31:56 PST 2014

On Fri, Nov 14, 2014 at 3:15 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com>
wrote:

>
> On Nov 13, 2014, at 3:40 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
>
>> Consider the following:
>>
>> "d.h" is in Dispatch, and "dx.h" is in DispatchWithExtensions.
>> Module M1 says '#include "d.h"'. We implicitly load only the /usr/include
>> module map, find the Dispatch module, and build it, without ever loading
>> the extension.
>> Module M2 says '#include "dx.h"'. That results in us building the
>> DispatchWithExtensions module.
>> Now suppose a translation unit imports M1 and M2. We end up with both
>> .pcm's in the same TU.
>>
>> How does your approach prevent the above from happening?
>>
>>
>> Do you mean that M1 and M2 are built from different translation units
>> with different sets of search paths ? Or are they built with the same
>> search paths ?
>> If they are built with the same search paths then they will both build
>> and depend on the Dispatch[WithExtensions] module file. During header
>> lookup for '#include "d.h”’ we will be parsing the module maps in the
>> search paths, even in the paths that do not contain “d.h” (clang already
>> does this). When we complete the Dispatch module definition it will be a
>> definition that includes the extensions.
>>
>
> Hmm, I had thought that the #include -> module mapping would only look for
> and load module map files from the path containing the #include'd file. It
> seems we do not always perform this optimization today; with your proposed
> change, such an optimization would no longer be correct, and that seems
> unfortunate.
>
> Also, we do perform this optimization in some cases, which are relevant to
> your /usr/include versus /usr/local/include example. Suppose you build with
> -Ifoo -Ibar, and you #include "baz/quux.h". If "foo/baz" does not exist but
> "bar/baz" does, we will look for "bar/baz/module.modulemap" then for
> "bar/module.modulemap". We will not look for "foo/module.modulemap" (nor
> for "foo/baz/module.modulemap").
>
> There are probably restrictions you could add that would help here. For
> instance: if you have include paths X and Y, and your primary module map is
> in X/path/, then your extension module map would need to be in Y/path/.
> Even that isn't really sufficient (in particular if X or Y has module maps
> in directories underneath /path/, or if the module maps can be found under
> multiple include paths, you might still have problems), but it's a good
> start.
>
>
> I think this restriction is fine. If you really want to load a module map
> from the parent or subdir instead because the headers in the 'Y/path’ are
> included in a map there, you can have the module map in 'Y/path’ deferring
> to a map in the parent or subdir.
>
>
> If M1 and M2 are built with different search paths then the translation
>> unit that imports both will rebuild M1 because it doesn’t depend on the
>> expected Dispatch-<hashWithExtensions>.pcm.
>>
>> I think you should choose: either (a) if there's a private extension,
>>> then you somehow guarantee that you only ever build / use the .pcm with
>>> that extension and never mix that with a .pcm built without the extension,
>>> or (b) treat the private extension as a layer on top of the public module.
>>>
>>>> If your Dispatch.Private is simply a layer on top of Dispatch, then
>>>> building them as two separate .pcm files seems like the right choice; it
>>>> keeps your Dispatch module (for want of a better word) modular. If on the
>>>> other hand, you need includes/imports in Dispatch to pull in headers /
>>>> submodules from Dispatch.Private, then one big Dispatch .pcm is probably
>>>> the right answer, and we'd need something like your proposal so we could
>>>> say "here is a Dispatch module that's like the one in /usr/include but
>>>> different in the following ways”.
>>>>
>>>>
>>>> (Whichever of these options we pick, we can make the "@import
>>>> Dispatch.Private" syntax do the right thing.)
>>>>
>>>>
>>>> In both cases we would need to extend module map parsing to allow
>>>> submodule extensions. I think the original proposal accommodates that with
>>>> the possible need that once clang supports separating a top module to
>>>> different pcm files that we may need to control whether there is a combined
>>>> module file or multiple ones.
>>>> Given that separating a top module in multiple pcm files is a rather
>>>> intrusive change and the combined pcm file is sufficient for our needs, I’d
>>>> like to proceed for now with the extending module map functionality which
>>>> will result in one top level .pcm file. Is this reasonable ?
>>>>
>>>
>>> A few questions:
>>>
>>> What are you going to use as the defining module map file (part of the
>>> 'key' used for determining .pcm identity)? Is it the defining module map
>>> for the top-level module or the module map with the extension?
>>>
>>>
>>> The key to determine identity will include both module map files.
>>>
>>> What happens if multiple module maps try to extend the same module?
>>>
>>>
>>> I’d prefer to allow it if they extend with different submodules, and
>>> error if they try to define the same submodule.
>>>
>>> Why do you need a separate module map file name? Why not just put your
>>> extension into the normal module.modulemap file?
>>>
>>>
>>> The reason was to disallow module definitions in the private module map
>>> file:
>>>
>>
>> I don't think you need that if your 'extern module' specifies a path. You
>> just imported a definition of the module; it'd be an error to provide
>> another one.
>>
>>
>> Per my example, it protects against writing a module definition directly
>> and missing any “extern” declaration (we could have “extern module” without
>> a path also trigger an error if you define the extern'ed module in the same
>> map file).
>>
> In general I like that a ‘module.private.modulemap' would immediately
>> indicate that it contains only private extensions to existing modules,
>> while seeing a ‘module.modulemap’ you’d expect to see new module
>> definitions. These can then be present in the same directory
>> ('/usr/local/include/module.private.modulemap’ would contain the extensions
>> while '/usr/local/include/module.modulemap’ would contain new modules that
>> only exist in /usr/local/include).
>>
>> But I don’t think these are very important benefits; do you feel that it
>> is unnecessary complexity ?
>>
>
> If you're going to load both module map files anyway, you should get an
> error on the module redefinition, so in a lot of cases you'd notice you got
> it wrong pretty quickly.
>
>
> To clarify, getting the error is fine, the possible issue arises when you
> just add a module definition in the map, at which point clang thinks it has
> found the module it is looking for and does not continue to look for other
> module maps in subsequent search paths.
>
> If this were a really common use case, the idea of having another kind of
> module map file might be worth considering, but if it's only really to
> allow /usr/local/include to partially override/extend /usr/include modules,
> I think we should go for the simpler approach.
>
>
> Sounds good to me!
>

OK, so I think what we now have is:

extern module X's string literal becomes optional. If it's absent, we will
assume the module can be found in some other module map file.

In addition to "module X {", we can now write "module X.Y {". This declares
an extension of module X, which must have already been declared (via extern
module X) or defined. This will be built as part of building module X.

If we end up somehow loading a .pcm for X without X.Y, and a .pcm for X
with X.Y, into the same build, that is an error (these are conflicting
modules), but we think that won't happen except in pathological cases. And
we can provide errors if we see cases that we know don't work well (like
the module.modulemap in different subdirectories case above).

That all seems reasonable to me. Is there more that you need here to
support your use cases?

> - A module definition inside a private extension will be disallowed. The
>>> rationale is that otherwise it will be a very common mistake for users
>>> to write
>>>
>>> *module.modulemap:*
>>> module Foo {
>>>   <public headers>
>>> }
>>>
>>> *module.private.modulemap:*
>>> module Foo {
>>>   <private headers>
>>> }
>>>
>>> and then be left scratching their heads wondering why things are broken
>>> (things missing, headers included textually, etc.). Being more strict in
>>> private extension maps will be beneficial.
>>>
>>>
>>>
>>>
>>> Also note that 'extern module' takes a string literal pointing to the
>>> module map file defining the module, and it triggers us to recursively load
>>> that module map file; your approach doesn't seem to take this into account.
>>>
>>>
>>> I was aware of this functionality but if I think it is unnecessary
>>> boilerplate and restricted flexibility if we force all the authors to write
>>> module maps like:
>>>
>>> extern module Dispatch "../../../usr/include/dispatch/module.modulemap"
>>> module Dispatch.Private { … }
>>>
>>> The compiler is going to either find the module definition or error
>>> about it, so there’s not much benefit to be hardcoding paths; such
>>> hardcoding will restrict flexibility and make testing more cumbersome.
>>>
>>> I’m not suggesting of course to remove the functionality, only to also
>>> allow "extern module Dispatch”.
>>>
>>> I think when we see
>>>
>>>   extern module Foo "blah"
>>>   module Foo.Bar { ... }
>>>
>>> ... we should parse "blah" before parsing the extension (this is the
>>> natural result of the way the code is currently laid out) rather than
>>> waiting until we hit "blah" on the search path and then adding an extension
>>> to it.
>>>
>>>
>>> Technically we don’t need to wait until we hit “blah". "extern module
>>> Dispatch” can create an ‘incomplete’ Dispatch module, the submodules will
>>> be added and once we parse the module definition, the module will be
>>> ‘complete’ (and module lookup will succeed).
>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20141114/11eca20a/attachment.html>