[cfe-dev] RFC: Supporting private module maps for non-framework headers

Wed Nov 12 19:54:09 PST 2014

On Wed, Nov 12, 2014 at 7:24 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com>
wrote:

>
> On Nov 12, 2014, at 6:42 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Wed, Nov 12, 2014 at 2:53 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com>
> wrote:
>
>>
>> On Nov 12, 2014, at 2:34 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>
>> On Tue, Nov 11, 2014 at 8:04 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com>
>>  wrote:
>>
>>>
>>> On Nov 11, 2014, at 7:38 PM, Richard Smith <richard at metafoo.co.uk>
>>> wrote:
>>>
>>> On Tue, Nov 11, 2014 at 6:49 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com
>>> > wrote:
>>>
>>>>
>>>> On Nov 11, 2014, at 6:37 PM, Richard Smith <richard at metafoo.co.uk>
>>>> wrote:
>>>>
>>>> On Tue, Nov 11, 2014 at 3:45 PM, Argyrios Kyrtzidis <
>>>> kyrtzidis at apple.com> wrote:
>>>>
>>>>>
>>>>> On Nov 11, 2014, at 12:34 PM, Richard Smith <richard at metafoo.co.uk>
>>>>> wrote:
>>>>>
>>>>> On Tue, Nov 11, 2014 at 10:00 AM, Argyrios Kyrtzidis <
>>>>> kyrtzidis at apple.com> wrote:
>>>>>
>>>>>>
>>>>>> On Nov 10, 2014, at 7:48 PM, Richard Smith <richard at metafoo.co.uk>
>>>>>> wrote:
>>>>>>
>>>>>> On Mon, Nov 10, 2014 at 4:00 PM, Argyrios Kyrtzidis <
>>>>>> kyrtzidis at apple.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> For frameworks Clang currently supports adding a separate module map
>>>>>>> file for the private headers of the framework. It looks specifically for
>>>>>>> the presence of ‘module.private.modulemap’ inside the .framework and parses
>>>>>>> both the public and the private module maps when it processes its module.
>>>>>>> We would like to extend support for private module maps for non-framework
>>>>>>> headers as well.
>>>>>>>
>>>>>>> In the Darwin platform, the public SDK headers are located in
>>>>>>> '/usr/include', while the associated private SDK headers are located in
>>>>>>> '/usr/local/include’. '/usr/local/include’ comes before '/usr/include’ in
>>>>>>> the header search paths.
>>>>>>>
>>>>>>
>>>>>> I worry that this will be fragile. If for any reason we look in
>>>>>> /usr/include but not in /usr/local/include, we'll not load the private
>>>>>> extension map and things will probably go quite badly from that point
>>>>>> onwards. If the presence of the /usr/local/include headers is a fundamental
>>>>>> part of a /usr/include module, then it seems better to me to specify that
>>>>>> within the /usr/include module map.
>>>>>>
>>>>>> So here's one possibility: allow 'extern module' declarations to be
>>>>>> nested within other modules, then write your /usr/include module map as:
>>>>>>
>>>>>> module MyModule {
>>>>>>   <...>
>>>>>>   extern module SomethingPrivate
>>>>>> "/usr/local/include/module.private.map"
>>>>>> }
>>>>>>
>>>>>>
>>>>>> This has drawbacks:
>>>>>>
>>>>>> - Details of the private SDK, “leak out” to the public one. It should
>>>>>> work similar to frameworks, in that the public SDK remains the same
>>>>>> irrespective if there is or not a private API, and the private API is a
>>>>>> straight addition on top of the public one without needing to modify
>>>>>> something in the public SDK.
>>>>>> - It is a bit weak as guarantee anyway because the public module map
>>>>>> must necessarily function even when the extension map is missing, which
>>>>>> means pointing at the wrong path or missing the private map when you really
>>>>>> need it will not be detected.
>>>>>> - Flexibility to extend a module from any path may be valuable for
>>>>>> testing.
>>>>>>
>>>>>
>>>>> OK, I'm not sure I understand what problem you're solving. If the
>>>>> /usr/local/include stuff works as a layer on top of /usr/include, why do
>>>>> you need them to be built as part of the same module? (Do your
>>>>> /usr/local/include headers override / #include_next some of the
>>>>> /usr/include headers, perhaps? If so, do you need the #includes in
>>>>> /usr/include to find the /usr/local/include headers rather than the
>>>>> /usr/include headers?)
>>>>>
>>>>>
>>>>> There are some cases of cycles between public/private headers which
>>>>> would be accommodated by a single module build but the primary motivation
>>>>> is that we would like the module public/private interface to be under the
>>>>> same namespace, so you’d do
>>>>>
>>>>> @import Dispatch;
>>>>> @import Dispatch.Private;
>>>>>
>>>>> @import Darwin.POSIX.Foo.Bar;
>>>>> @import Darwin.POSIX.Foo.Bar.Private;
>>>>>
>>>>> and generally any kind of private extension:
>>>>>
>>>>> @import Dispatch.SuperCoolButPrivate;
>>>>>
>>>>
>>>> Do you want / need them to be built as a single module file, or not?
>>>>
>>>>
>>>> As I said, cycles may make things difficult for separate module files,
>>>> but how are we going to get new submodules under the same module name with
>>>> separate module files ?
>>>>
>>>
>>> Well, the restriction that module files correspond to top-level module
>>> names is arbitrary and artificial. (It's also a bad idea for a few reasons.
>>> It makes incremental refactoring very hard, for instance, because you're
>>> required to have no cycles at any point between things in different
>>> top-level modules.)
>>>
>>> Splitting up the description of how to build a module file across
>>> various module maps seems like a very error-prone strategy, especially if
>>> you're intending to be able to stop looking before you've read all of the
>>> relevant module maps.
>>>
>>>
>>> I think that the high level parts of my proposal are not dependent on
>>> whether we build one .pcm file or multiple ones, this is an implementation
>>> detail.
>>> To be more specific, if we have
>>>
>>> *module.private.modulemap *(extension):
>>> extern module Dispatch
>>> module Dispatch.Private {
>>>   <headers>
>>> }
>>>
>>> *module.modulemap:*
>>> module Dispatch {
>>>   <headers>
>>> }
>>>
>>> It is an implementation detail whether we buiild one Dispatch.pcm file
>>> or a Dispatch.Private.pcm file that depends on another Dispatch.pcm; it
>>> should make no difference on user code.
>>> Is this incorrect ?
>>>
>>
>> Whether we build one .pcm or multiple is observable in some
>> circumstances. 1) We concatenate together all the header files built as
>> part of one .pcm file, and parse them all at once, and that is not always
>> semantically equivalent to building them in two separate passes. 2) If you
>> have one big Dispatch.pcm file which also contains the private bits, and by
>> any sequence of events you end up also pulling in another Dispatch.pcm that
>> contains the public headers but not the private ones, you may get ambiguity
>> errors. 3) We do not allow circular references across .pcm files but do
>> allow them within a .pcm file.
>>
>>
>> (2) is a bug.
>>
>
> Perhaps. Suppose I put this in a header:
>
> extern struct {
>   int a, b;
> } s;
>
> If that ends up in two different translation units, do I have an error? In
> C++, the answer is "yes", because the types are different; in C, it's "no",
> because the types are compatible. Extending this to the modules world, if I
> have the above in two different modules, should they be mergeable?
>
> With the "bottom-up modularization" approach that Apple has been taking so
> far, I expect you'll find that merging a Dispatch.pcm and a
> DispatchWithExtensions.pcm together won't work very well. (Ironically, I'd
> have more confidence this would work if you were using C++ rather than C,
> because the merging story is more developed and better tested there.) And
> even if it does work, the possibility of having two different modules
> providing the same interface as part of the same translation unit seems
> like a bad idea.
>
>
>  Dispatch.pcm and Dispatch[WithExtensions].pcm will never show up in the
> same translation unit, they will have different identities; similar to how
> module maps from different paths result in different pcm files for the same
> module, and they are not mixed together in the same translation unit.
>

Consider the following:

"d.h" is in Dispatch, and "dx.h" is in DispatchWithExtensions.
Module M1 says '#include "d.h"'. We implicitly load only the /usr/include
module map, find the Dispatch module, and build it, without ever loading
the extension.
Module M2 says '#include "dx.h"'. That results in us building the
DispatchWithExtensions module.
Now suppose a translation unit imports M1 and M2. We end up with both
.pcm's in the same TU.

How does your approach prevent the above from happening?

> I think you should choose: either (a) if there's a private extension, then
> you somehow guarantee that you only ever build / use the .pcm with that
> extension and never mix that with a .pcm built without the extension, or
> (b) treat the private extension as a layer on top of the public module.
>
>> If your Dispatch.Private is simply a layer on top of Dispatch, then
>> building them as two separate .pcm files seems like the right choice; it
>> keeps your Dispatch module (for want of a better word) modular. If on the
>> other hand, you need includes/imports in Dispatch to pull in headers /
>> submodules from Dispatch.Private, then one big Dispatch .pcm is probably
>> the right answer, and we'd need something like your proposal so we could
>> say "here is a Dispatch module that's like the one in /usr/include but
>> different in the following ways”.
>>
>>
>> (Whichever of these options we pick, we can make the "@import
>> Dispatch.Private" syntax do the right thing.)
>>
>>
>> In both cases we would need to extend module map parsing to allow
>> submodule extensions. I think the original proposal accommodates that with
>> the possible need that once clang supports separating a top module to
>> different pcm files that we may need to control whether there is a combined
>> module file or multiple ones.
>> Given that separating a top module in multiple pcm files is a rather
>> intrusive change and the combined pcm file is sufficient for our needs, I’d
>> like to proceed for now with the extending module map functionality which
>> will result in one top level .pcm file. Is this reasonable ?
>>
>
> A few questions:
>
> What are you going to use as the defining module map file (part of the
> 'key' used for determining .pcm identity)? Is it the defining module map
> for the top-level module or the module map with the extension?
>
>
> The key to determine identity will include both module map files.
>
> What happens if multiple module maps try to extend the same module?
>
>
> I’d prefer to allow it if they extend with different submodules, and error
> if they try to define the same submodule.
>
> Why do you need a separate module map file name? Why not just put your
> extension into the normal module.modulemap file?
>
>
> The reason was to disallow module definitions in the private module map
> file:
>

I don't think you need that if your 'extern module' specifies a path. You
just imported a definition of the module; it'd be an error to provide
another one.

- A module definition inside a private extension will be disallowed. The
> rationale is that otherwise it will be a very common mistake for users to
> write
>
> *module.modulemap:*
> module Foo {
>   <public headers>
> }
>
> *module.private.modulemap:*
> module Foo {
>   <private headers>
> }
>
> and then be left scratching their heads wondering why things are broken
> (things missing, headers included textually, etc.). Being more strict in
> private extension maps will be beneficial.
>
>
>
>
> Also note that 'extern module' takes a string literal pointing to the
> module map file defining the module, and it triggers us to recursively load
> that module map file; your approach doesn't seem to take this into account.
>
>
> I was aware of this functionality but if I think it is unnecessary
> boilerplate and restricted flexibility if we force all the authors to write
> module maps like:
>
> extern module Dispatch "../../../usr/include/dispatch/module.modulemap"
> module Dispatch.Private { … }
>
> The compiler is going to either find the module definition or error about
> it, so there’s not much benefit to be hardcoding paths; such hardcoding
> will restrict flexibility and make testing more cumbersome.
>
> I’m not suggesting of course to remove the functionality, only to also
> allow "extern module Dispatch”.
>
> I think when we see
>
>   extern module Foo "blah"
>   module Foo.Bar { ... }
>
> ... we should parse "blah" before parsing the extension (this is the
> natural result of the way the code is currently laid out) rather than
> waiting until we hit "blah" on the search path and then adding an extension
> to it.
>
>
> Technically we don’t need to wait until we hit “blah". "extern module
> Dispatch” can create an ‘incomplete’ Dispatch module, the submodules will
> be added and once we parse the module definition, the module will be
> ‘complete’ (and module lookup will succeed).
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20141112/21eecbf9/attachment.html>