[cfe-dev] RFC: Supporting private module maps for non-framework headers

Thu Nov 13 15:40:08 PST 2014

On Wed, Nov 12, 2014 at 10:30 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com>
wrote:

>
> On Nov 12, 2014, at 7:54 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Wed, Nov 12, 2014 at 7:24 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com>
> wrote:
>
>>
>> On Nov 12, 2014, at 6:42 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>
>> On Wed, Nov 12, 2014 at 2:53 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com>
>>  wrote:
>>
>>>
>>> On Nov 12, 2014, at 2:34 PM, Richard Smith <richard at metafoo.co.uk>
>>> wrote:
>>>
>>> On Tue, Nov 11, 2014 at 8:04 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com
>>> > wrote:
>>>
>>>>
>>>> On Nov 11, 2014, at 7:38 PM, Richard Smith <richard at metafoo.co.uk>
>>>> wrote:
>>>>
>>>> On Tue, Nov 11, 2014 at 6:49 PM, Argyrios Kyrtzidis <
>>>> kyrtzidis at apple.com> wrote:
>>>>
>>>>>
>>>>> On Nov 11, 2014, at 6:37 PM, Richard Smith <richard at metafoo.co.uk>
>>>>> wrote:
>>>>>
>>>>> On Tue, Nov 11, 2014 at 3:45 PM, Argyrios Kyrtzidis <
>>>>> kyrtzidis at apple.com> wrote:
>>>>>
>>>>>>
>>>>>> On Nov 11, 2014, at 12:34 PM, Richard Smith <richard at metafoo.co.uk>
>>>>>> wrote:
>>>>>>
>>>>>> On Tue, Nov 11, 2014 at 10:00 AM, Argyrios Kyrtzidis <
>>>>>> kyrtzidis at apple.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Nov 10, 2014, at 7:48 PM, Richard Smith <richard at metafoo.co.uk>
>>>>>>> wrote:
>>>>>>>
>>>>>>> On Mon, Nov 10, 2014 at 4:00 PM, Argyrios Kyrtzidis <
>>>>>>> kyrtzidis at apple.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> For frameworks Clang currently supports adding a separate module
>>>>>>>> map file for the private headers of the framework. It looks specifically
>>>>>>>> for the presence of ‘module.private.modulemap’ inside the .framework and
>>>>>>>> parses both the public and the private module maps when it processes its
>>>>>>>> module. We would like to extend support for private module maps for
>>>>>>>> non-framework headers as well.
>>>>>>>>
>>>>>>>> In the Darwin platform, the public SDK headers are located in
>>>>>>>> '/usr/include', while the associated private SDK headers are located in
>>>>>>>> '/usr/local/include’. '/usr/local/include’ comes before '/usr/include’ in
>>>>>>>> the header search paths.
>>>>>>>>
>>>>>>>
>>>>>>> I worry that this will be fragile. If for any reason we look in
>>>>>>> /usr/include but not in /usr/local/include, we'll not load the private
>>>>>>> extension map and things will probably go quite badly from that point
>>>>>>> onwards. If the presence of the /usr/local/include headers is a fundamental
>>>>>>> part of a /usr/include module, then it seems better to me to specify that
>>>>>>> within the /usr/include module map.
>>>>>>>
>>>>>>> So here's one possibility: allow 'extern module' declarations to be
>>>>>>> nested within other modules, then write your /usr/include module map as:
>>>>>>>
>>>>>>> module MyModule {
>>>>>>>   <...>
>>>>>>>   extern module SomethingPrivate
>>>>>>> "/usr/local/include/module.private.map"
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> This has drawbacks:
>>>>>>>
>>>>>>> - Details of the private SDK, “leak out” to the public one. It
>>>>>>> should work similar to frameworks, in that the public SDK remains the same
>>>>>>> irrespective if there is or not a private API, and the private API is a
>>>>>>> straight addition on top of the public one without needing to modify
>>>>>>> something in the public SDK.
>>>>>>> - It is a bit weak as guarantee anyway because the public module map
>>>>>>> must necessarily function even when the extension map is missing, which
>>>>>>> means pointing at the wrong path or missing the private map when you really
>>>>>>> need it will not be detected.
>>>>>>> - Flexibility to extend a module from any path may be valuable for
>>>>>>> testing.
>>>>>>>
>>>>>>
>>>>>> OK, I'm not sure I understand what problem you're solving. If the
>>>>>> /usr/local/include stuff works as a layer on top of /usr/include, why do
>>>>>> you need them to be built as part of the same module? (Do your
>>>>>> /usr/local/include headers override / #include_next some of the
>>>>>> /usr/include headers, perhaps? If so, do you need the #includes in
>>>>>> /usr/include to find the /usr/local/include headers rather than the
>>>>>> /usr/include headers?)
>>>>>>
>>>>>>
>>>>>> There are some cases of cycles between public/private headers which
>>>>>> would be accommodated by a single module build but the primary motivation
>>>>>> is that we would like the module public/private interface to be under the
>>>>>> same namespace, so you’d do
>>>>>>
>>>>>> @import Dispatch;
>>>>>> @import Dispatch.Private;
>>>>>>
>>>>>> @import Darwin.POSIX.Foo.Bar;
>>>>>> @import Darwin.POSIX.Foo.Bar.Private;
>>>>>>
>>>>>> and generally any kind of private extension:
>>>>>>
>>>>>> @import Dispatch.SuperCoolButPrivate;
>>>>>>
>>>>>
>>>>> Do you want / need them to be built as a single module file, or not?
>>>>>
>>>>>
>>>>> As I said, cycles may make things difficult for separate module files,
>>>>> but how are we going to get new submodules under the same module name with
>>>>> separate module files ?
>>>>>
>>>>
>>>> Well, the restriction that module files correspond to top-level module
>>>> names is arbitrary and artificial. (It's also a bad idea for a few reasons.
>>>> It makes incremental refactoring very hard, for instance, because you're
>>>> required to have no cycles at any point between things in different
>>>> top-level modules.)
>>>>
>>>> Splitting up the description of how to build a module file across
>>>> various module maps seems like a very error-prone strategy, especially if
>>>> you're intending to be able to stop looking before you've read all of the
>>>> relevant module maps.
>>>>
>>>>
>>>> I think that the high level parts of my proposal are not dependent on
>>>> whether we build one .pcm file or multiple ones, this is an implementation
>>>> detail.
>>>> To be more specific, if we have
>>>>
>>>> *module.private.modulemap *(extension):
>>>> extern module Dispatch
>>>> module Dispatch.Private {
>>>>   <headers>
>>>> }
>>>>
>>>> *module.modulemap:*
>>>> module Dispatch {
>>>>   <headers>
>>>> }
>>>>
>>>> It is an implementation detail whether we buiild one Dispatch.pcm file
>>>> or a Dispatch.Private.pcm file that depends on another Dispatch.pcm; it
>>>> should make no difference on user code.
>>>> Is this incorrect ?
>>>>
>>>
>>> Whether we build one .pcm or multiple is observable in some
>>> circumstances. 1) We concatenate together all the header files built as
>>> part of one .pcm file, and parse them all at once, and that is not always
>>> semantically equivalent to building them in two separate passes. 2) If you
>>> have one big Dispatch.pcm file which also contains the private bits, and by
>>> any sequence of events you end up also pulling in another Dispatch.pcm that
>>> contains the public headers but not the private ones, you may get ambiguity
>>> errors. 3) We do not allow circular references across .pcm files but do
>>> allow them within a .pcm file.
>>>
>>>
>>> (2) is a bug.
>>>
>>
>> Perhaps. Suppose I put this in a header:
>>
>> extern struct {
>>   int a, b;
>> } s;
>>
>> If that ends up in two different translation units, do I have an error?
>> In C++, the answer is "yes", because the types are different; in C, it's
>> "no", because the types are compatible. Extending this to the modules
>> world, if I have the above in two different modules, should they be
>> mergeable?
>>
>> With the "bottom-up modularization" approach that Apple has been taking
>> so far, I expect you'll find that merging a Dispatch.pcm and a
>> DispatchWithExtensions.pcm together won't work very well. (Ironically, I'd
>> have more confidence this would work if you were using C++ rather than C,
>> because the merging story is more developed and better tested there.) And
>> even if it does work, the possibility of having two different modules
>> providing the same interface as part of the same translation unit seems
>> like a bad idea.
>>
>>
>>  Dispatch.pcm and Dispatch[WithExtensions].pcm will never show up in the
>> same translation unit, they will have different identities; similar to how
>> module maps from different paths result in different pcm files for the same
>> module, and they are not mixed together in the same translation unit.
>>
>
> Consider the following:
>
> "d.h" is in Dispatch, and "dx.h" is in DispatchWithExtensions.
> Module M1 says '#include "d.h"'. We implicitly load only the /usr/include
> module map, find the Dispatch module, and build it, without ever loading
> the extension.
> Module M2 says '#include "dx.h"'. That results in us building the
> DispatchWithExtensions module.
> Now suppose a translation unit imports M1 and M2. We end up with both
> .pcm's in the same TU.
>
> How does your approach prevent the above from happening?
>
>
> Do you mean that M1 and M2 are built from different translation units with
> different sets of search paths ? Or are they built with the same search
> paths ?
> If they are built with the same search paths then they will both build and
> depend on the Dispatch[WithExtensions] module file. During header lookup
> for '#include "d.h”’ we will be parsing the module maps in the search
> paths, even in the paths that do not contain “d.h” (clang already does
> this). When we complete the Dispatch module definition it will be a
> definition that includes the extensions.
>

Hmm, I had thought that the #include -> module mapping would only look for
and load module map files from the path containing the #include'd file. It
seems we do not always perform this optimization today; with your proposed
change, such an optimization would no longer be correct, and that seems
unfortunate.

Also, we do perform this optimization in some cases, which are relevant to
your /usr/include versus /usr/local/include example. Suppose you build with
-Ifoo -Ibar, and you #include "baz/quux.h". If "foo/baz" does not exist but
"bar/baz" does, we will look for "bar/baz/module.modulemap" then for
"bar/module.modulemap". We will not look for "foo/module.modulemap" (nor
for "foo/baz/module.modulemap").

There are probably restrictions you could add that would help here. For
instance: if you have include paths X and Y, and your primary module map is
in X/path/, then your extension module map would need to be in Y/path/.
Even that isn't really sufficient (in particular if X or Y has module maps
in directories underneath /path/, or if the module maps can be found under
multiple include paths, you might still have problems), but it's a good
start.

If M1 and M2 are built with different search paths then the translation
> unit that imports both will rebuild M1 because it doesn’t depend on the
> expected Dispatch-<hashWithExtensions>.pcm.
>
> I think you should choose: either (a) if there's a private extension, then
>> you somehow guarantee that you only ever build / use the .pcm with that
>> extension and never mix that with a .pcm built without the extension, or
>> (b) treat the private extension as a layer on top of the public module.
>>
>>> If your Dispatch.Private is simply a layer on top of Dispatch, then
>>> building them as two separate .pcm files seems like the right choice; it
>>> keeps your Dispatch module (for want of a better word) modular. If on the
>>> other hand, you need includes/imports in Dispatch to pull in headers /
>>> submodules from Dispatch.Private, then one big Dispatch .pcm is probably
>>> the right answer, and we'd need something like your proposal so we could
>>> say "here is a Dispatch module that's like the one in /usr/include but
>>> different in the following ways”.
>>>
>>>
>>> (Whichever of these options we pick, we can make the "@import
>>> Dispatch.Private" syntax do the right thing.)
>>>
>>>
>>> In both cases we would need to extend module map parsing to allow
>>> submodule extensions. I think the original proposal accommodates that with
>>> the possible need that once clang supports separating a top module to
>>> different pcm files that we may need to control whether there is a combined
>>> module file or multiple ones.
>>> Given that separating a top module in multiple pcm files is a rather
>>> intrusive change and the combined pcm file is sufficient for our needs, I’d
>>> like to proceed for now with the extending module map functionality which
>>> will result in one top level .pcm file. Is this reasonable ?
>>>
>>
>> A few questions:
>>
>> What are you going to use as the defining module map file (part of the
>> 'key' used for determining .pcm identity)? Is it the defining module map
>> for the top-level module or the module map with the extension?
>>
>>
>> The key to determine identity will include both module map files.
>>
>> What happens if multiple module maps try to extend the same module?
>>
>>
>> I’d prefer to allow it if they extend with different submodules, and
>> error if they try to define the same submodule.
>>
>> Why do you need a separate module map file name? Why not just put your
>> extension into the normal module.modulemap file?
>>
>>
>> The reason was to disallow module definitions in the private module map
>> file:
>>
>
> I don't think you need that if your 'extern module' specifies a path. You
> just imported a definition of the module; it'd be an error to provide
> another one.
>
>
> Per my example, it protects against writing a module definition directly
> and missing any “extern” declaration (we could have “extern module” without
> a path also trigger an error if you define the extern'ed module in the same
> map file).
>
In general I like that a ‘module.private.modulemap' would immediately
> indicate that it contains only private extensions to existing modules,
> while seeing a ‘module.modulemap’ you’d expect to see new module
> definitions. These can then be present in the same directory
> ('/usr/local/include/module.private.modulemap’ would contain the extensions
> while '/usr/local/include/module.modulemap’ would contain new modules that
> only exist in /usr/local/include).
>
> But I don’t think these are very important benefits; do you feel that it
> is unnecessary complexity ?
>

If you're going to load both module map files anyway, you should get an
error on the module redefinition, so in a lot of cases you'd notice you got
it wrong pretty quickly. If this were a really common use case, the idea of
having another kind of module map file might be worth considering, but if
it's only really to allow /usr/local/include to partially override/extend
/usr/include modules, I think we should go for the simpler approach.

> - A module definition inside a private extension will be disallowed. The
>> rationale is that otherwise it will be a very common mistake for users
>> to write
>>
>> *module.modulemap:*
>> module Foo {
>>   <public headers>
>> }
>>
>> *module.private.modulemap:*
>> module Foo {
>>   <private headers>
>> }
>>
>> and then be left scratching their heads wondering why things are broken
>> (things missing, headers included textually, etc.). Being more strict in
>> private extension maps will be beneficial.
>>
>>
>>
>>
>> Also note that 'extern module' takes a string literal pointing to the
>> module map file defining the module, and it triggers us to recursively load
>> that module map file; your approach doesn't seem to take this into account.
>>
>>
>> I was aware of this functionality but if I think it is unnecessary
>> boilerplate and restricted flexibility if we force all the authors to write
>> module maps like:
>>
>> extern module Dispatch "../../../usr/include/dispatch/module.modulemap"
>> module Dispatch.Private { … }
>>
>> The compiler is going to either find the module definition or error about
>> it, so there’s not much benefit to be hardcoding paths; such hardcoding
>> will restrict flexibility and make testing more cumbersome.
>>
>> I’m not suggesting of course to remove the functionality, only to also
>> allow "extern module Dispatch”.
>>
>> I think when we see
>>
>>   extern module Foo "blah"
>>   module Foo.Bar { ... }
>>
>> ... we should parse "blah" before parsing the extension (this is the
>> natural result of the way the code is currently laid out) rather than
>> waiting until we hit "blah" on the search path and then adding an extension
>> to it.
>>
>>
>> Technically we don’t need to wait until we hit “blah". "extern module
>> Dispatch” can create an ‘incomplete’ Dispatch module, the submodules will
>> be added and once we parse the module definition, the module will be
>> ‘complete’ (and module lookup will succeed).
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20141113/e2455094/attachment.html>