[cfe-dev] RFC: Supporting private module maps for non-framework headers

Wed Nov 12 18:42:29 PST 2014

On Wed, Nov 12, 2014 at 2:53 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com>
wrote:

>
> On Nov 12, 2014, at 2:34 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Tue, Nov 11, 2014 at 8:04 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com>
> wrote:
>
>>
>> On Nov 11, 2014, at 7:38 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>
>> On Tue, Nov 11, 2014 at 6:49 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com>
>>  wrote:
>>
>>>
>>> On Nov 11, 2014, at 6:37 PM, Richard Smith <richard at metafoo.co.uk>
>>> wrote:
>>>
>>> On Tue, Nov 11, 2014 at 3:45 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com
>>> > wrote:
>>>
>>>>
>>>> On Nov 11, 2014, at 12:34 PM, Richard Smith <richard at metafoo.co.uk>
>>>> wrote:
>>>>
>>>> On Tue, Nov 11, 2014 at 10:00 AM, Argyrios Kyrtzidis <
>>>> kyrtzidis at apple.com> wrote:
>>>>
>>>>>
>>>>> On Nov 10, 2014, at 7:48 PM, Richard Smith <richard at metafoo.co.uk>
>>>>> wrote:
>>>>>
>>>>> On Mon, Nov 10, 2014 at 4:00 PM, Argyrios Kyrtzidis <
>>>>> kyrtzidis at apple.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> For frameworks Clang currently supports adding a separate module map
>>>>>> file for the private headers of the framework. It looks specifically for
>>>>>> the presence of ‘module.private.modulemap’ inside the .framework and parses
>>>>>> both the public and the private module maps when it processes its module.
>>>>>> We would like to extend support for private module maps for non-framework
>>>>>> headers as well.
>>>>>>
>>>>>> In the Darwin platform, the public SDK headers are located in
>>>>>> '/usr/include', while the associated private SDK headers are located in
>>>>>> '/usr/local/include’. '/usr/local/include’ comes before '/usr/include’ in
>>>>>> the header search paths.
>>>>>>
>>>>>
>>>>> I worry that this will be fragile. If for any reason we look in
>>>>> /usr/include but not in /usr/local/include, we'll not load the private
>>>>> extension map and things will probably go quite badly from that point
>>>>> onwards. If the presence of the /usr/local/include headers is a fundamental
>>>>> part of a /usr/include module, then it seems better to me to specify that
>>>>> within the /usr/include module map.
>>>>>
>>>>> So here's one possibility: allow 'extern module' declarations to be
>>>>> nested within other modules, then write your /usr/include module map as:
>>>>>
>>>>> module MyModule {
>>>>>   <...>
>>>>>   extern module SomethingPrivate
>>>>> "/usr/local/include/module.private.map"
>>>>> }
>>>>>
>>>>>
>>>>> This has drawbacks:
>>>>>
>>>>> - Details of the private SDK, “leak out” to the public one. It should
>>>>> work similar to frameworks, in that the public SDK remains the same
>>>>> irrespective if there is or not a private API, and the private API is a
>>>>> straight addition on top of the public one without needing to modify
>>>>> something in the public SDK.
>>>>> - It is a bit weak as guarantee anyway because the public module map
>>>>> must necessarily function even when the extension map is missing, which
>>>>> means pointing at the wrong path or missing the private map when you really
>>>>> need it will not be detected.
>>>>> - Flexibility to extend a module from any path may be valuable for
>>>>> testing.
>>>>>
>>>>
>>>> OK, I'm not sure I understand what problem you're solving. If the
>>>> /usr/local/include stuff works as a layer on top of /usr/include, why do
>>>> you need them to be built as part of the same module? (Do your
>>>> /usr/local/include headers override / #include_next some of the
>>>> /usr/include headers, perhaps? If so, do you need the #includes in
>>>> /usr/include to find the /usr/local/include headers rather than the
>>>> /usr/include headers?)
>>>>
>>>>
>>>> There are some cases of cycles between public/private headers which
>>>> would be accommodated by a single module build but the primary motivation
>>>> is that we would like the module public/private interface to be under the
>>>> same namespace, so you’d do
>>>>
>>>> @import Dispatch;
>>>> @import Dispatch.Private;
>>>>
>>>> @import Darwin.POSIX.Foo.Bar;
>>>> @import Darwin.POSIX.Foo.Bar.Private;
>>>>
>>>> and generally any kind of private extension:
>>>>
>>>> @import Dispatch.SuperCoolButPrivate;
>>>>
>>>
>>> Do you want / need them to be built as a single module file, or not?
>>>
>>>
>>> As I said, cycles may make things difficult for separate module files,
>>> but how are we going to get new submodules under the same module name with
>>> separate module files ?
>>>
>>
>> Well, the restriction that module files correspond to top-level module
>> names is arbitrary and artificial. (It's also a bad idea for a few reasons.
>> It makes incremental refactoring very hard, for instance, because you're
>> required to have no cycles at any point between things in different
>> top-level modules.)
>>
>> Splitting up the description of how to build a module file across various
>> module maps seems like a very error-prone strategy, especially if you're
>> intending to be able to stop looking before you've read all of the relevant
>> module maps.
>>
>>
>> I think that the high level parts of my proposal are not dependent on
>> whether we build one .pcm file or multiple ones, this is an implementation
>> detail.
>> To be more specific, if we have
>>
>> *module.private.modulemap *(extension):
>> extern module Dispatch
>> module Dispatch.Private {
>>   <headers>
>> }
>>
>> *module.modulemap:*
>> module Dispatch {
>>   <headers>
>> }
>>
>> It is an implementation detail whether we buiild one Dispatch.pcm file or
>> a Dispatch.Private.pcm file that depends on another Dispatch.pcm; it should
>> make no difference on user code.
>> Is this incorrect ?
>>
>
> Whether we build one .pcm or multiple is observable in some circumstances.
> 1) We concatenate together all the header files built as part of one .pcm
> file, and parse them all at once, and that is not always semantically
> equivalent to building them in two separate passes. 2) If you have one big
> Dispatch.pcm file which also contains the private bits, and by any sequence
> of events you end up also pulling in another Dispatch.pcm that contains the
> public headers but not the private ones, you may get ambiguity errors. 3)
> We do not allow circular references across .pcm files but do allow them
> within a .pcm file.
>
>
> (2) is a bug.
>

Perhaps. Suppose I put this in a header:

extern struct {
  int a, b;
} s;

If that ends up in two different translation units, do I have an error? In
C++, the answer is "yes", because the types are different; in C, it's "no",
because the types are compatible. Extending this to the modules world, if I
have the above in two different modules, should they be mergeable?

With the "bottom-up modularization" approach that Apple has been taking so
far, I expect you'll find that merging a Dispatch.pcm and a
DispatchWithExtensions.pcm together won't work very well. (Ironically, I'd
have more confidence this would work if you were using C++ rather than C,
because the merging story is more developed and better tested there.) And
even if it does work, the possibility of having two different modules
providing the same interface as part of the same translation unit seems
like a bad idea.

I think you should choose: either (a) if there's a private extension, then
you somehow guarantee that you only ever build / use the .pcm with that
extension and never mix that with a .pcm built without the extension, or
(b) treat the private extension as a layer on top of the public module.

> If your Dispatch.Private is simply a layer on top of Dispatch, then
> building them as two separate .pcm files seems like the right choice; it
> keeps your Dispatch module (for want of a better word) modular. If on the
> other hand, you need includes/imports in Dispatch to pull in headers /
> submodules from Dispatch.Private, then one big Dispatch .pcm is probably
> the right answer, and we'd need something like your proposal so we could
> say "here is a Dispatch module that's like the one in /usr/include but
> different in the following ways”.
>
>
> (Whichever of these options we pick, we can make the "@import
> Dispatch.Private" syntax do the right thing.)
>
>
> In both cases we would need to extend module map parsing to allow
> submodule extensions. I think the original proposal accommodates that with
> the possible need that once clang supports separating a top module to
> different pcm files that we may need to control whether there is a combined
> module file or multiple ones.
> Given that separating a top module in multiple pcm files is a rather
> intrusive change and the combined pcm file is sufficient for our needs, I’d
> like to proceed for now with the extending module map functionality which
> will result in one top level .pcm file. Is this reasonable ?
>

A few questions:

What are you going to use as the defining module map file (part of the
'key' used for determining .pcm identity)? Is it the defining module map
for the top-level module or the module map with the extension?
What happens if multiple module maps try to extend the same module?
Why do you need a separate module map file name? Why not just put your
extension into the normal module.modulemap file?

Also note that 'extern module' takes a string literal pointing to the
module map file defining the module, and it triggers us to recursively load
that module map file; your approach doesn't seem to take this into account.
I think when we see

  extern module Foo "blah"
  module Foo.Bar { ... }

... we should parse "blah" before parsing the extension (this is the
natural result of the way the code is currently laid out) rather than
waiting until we hit "blah" on the search path and then adding an extension
to it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20141112/de6b6829/attachment.html>