[PATCH] Allow multiple modules with the same name to coexist in the module cache

Tue Apr 1 17:13:36 PDT 2014

On Tue, Apr 1, 2014 at 4:55 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com>wrote:

>
> On Apr 1, 2014, at 3:32 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Tue, Apr 1, 2014 at 12:59 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com>wrote:
>
>>
>> On Apr 1, 2014, at 12:03 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>
>> On Mon, Mar 31, 2014 at 8:35 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com>wrote:
>>
>>>
>>> On Mar 31, 2014, at 6:00 PM, Richard Smith <richard at metafoo.co.uk>
>>> wrote:
>>>
>>> On Mon, Mar 31, 2014 at 5:25 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com
>>> > wrote:
>>>
>>>>
>>>> On Mar 31, 2014, at 4:44 PM, Richard Smith <richard at metafoo.co.uk>
>>>> wrote:
>>>>
>>>> On Mon, Mar 31, 2014 at 4:23 PM, Argyrios Kyrtzidis <
>>>> kyrtzidis at apple.com> wrote:
>>>>
>>>>> >
>>>>> > What I'm suggesting is:
>>>>> >
>>>>> > 1) Drop the -I paths that are earlier than the module in the header
>>>>> search path when building the module
>>>>> > 2) Include the rest of the header search paths in the configuration
>>>>> hash for the module
>>>>>
>>>>> This will prevent us from sharing system modules across projects, and,
>>>>> as Ben already mentioned, will result in an explosion of module files, even
>>>>> within the same project.
>>>>>
>>>>
>>>> Why? System modules would only have system include paths, and these
>>>> would usually be the same across projects, right?
>>>>
>>>>
>>>> I see what you mean, but we also need to support the case where the
>>>> system framework developer can point to the framework in his local
>>>> directory to be used, instead of the system one, how do you suggest we
>>>> allow this ?
>>>>
>>>
>>> I'd like for a module to be able to specify its own include paths. If a
>>> module does so, the include paths of the host build should not be included
>>> in its hash.
>>>
>>>
>>> That didn't quite address my question, let me be more specific. We
>>> generally need to allow, as an example, system AppKit to import a
>>> user-local Foundation; neither system AppKit specifying its own include
>>> paths, nor looking up module dependencies of AppKit only in system include
>>> paths, allow this.
>>>
>>
>> That's a curious use case.
>>
>> It also seems to be compatible with the approach I suggested above, I
>> think? You need to put your header search paths in the right order (so the
>> lower layers of your system are below the higher layers), but that's
>> generally desirable regardless. (We have a limitation that -isystem paths
>> are considered after -I paths, which might get in the way here, but that's
>> artificial and in conflict with what you're trying to do.)
>>
>>
>> What is the right order ?
>>
>> If the order is:
>>
>> -I /local/path -isystem /System/Frameworks
>>
>> and we drop -I paths earlier than the module then module import from
>> AppKit doesn't consider -I /local/path.
>>
>> If the order is:
>>
>> -isystem /System/Frameworks -I /local/path
>>
>> then module import from AppKit finds Foundation in /System/Frameworks and
>> doesn't consider -I /local/path.
>>
>>
>> Alternatively, this seems like a good use case motivating putting the
>> include path into the build configuration. The approach in this patch is a
>> bad solution for this use case, because the AppKit modules for different
>> Foundations will collide in the cache; any of the other solutions we've
>> discussed seem to handle this better.
>>
>>
>> If we don't drop -I paths earlier than the module then putting the '-I'
>> paths in the build configuration will cause module explosion.
>> If we drop -I paths earlier than the module then it doesn't address the
>> above case and the worse thing is that you don't anymore have one module
>> hash cache for the whole translation unit anymore, which creates
>> complications (e.g. Ben already mentioned the global module index).
>>
>> We could maybe do something in-between, for example introduce an option
>> for search paths specific for module lookups. Normally you wouldn't use
>> them but if you did use them then they would go in the build configuration
>> and would be considered first for module lookups. Then we could always do
>> the "drop -I paths earlier than the module".
>>
>>
>>   Even if we include the search paths in the module hash we will still
>>>>> need to re-lookup the module dependencies before loading the module,
>>>>> because a new module.map may have showed up somewhere in the search paths
>>>>> since the time we built the module; unless I'm missing something, I don't
>>>>> see any benefit in including the header search paths in the hash.
>>>>
>>>>
>>>> The hash should include everything that affects the way the module is
>>>> built. If we're transferring include paths from the user of the module to
>>>> the module build, the hash should include those paths, or two modules with
>>>> different search paths could collide in the cache.
>>>>
>>>>
>>>> The collision is avoided by using the path to the module.map as Ben
>>>> proposes; if different search paths resolve to different module
>>>> dependencies, this is a case where we need to rebuild, but the same is true
>>>> if the same paths are used but a different module.map shows up as
>>>> dependency.
>>>>
>>>
>>> I don't see how that's sufficient -- the include paths can also affect
>>> how non-modular #includes within the module behave.
>>>
>>> As a somewhat-contrived example, suppose I have a library
>>>
>>>   blah/
>>>     module.map
>>>     foo.h
>>>     mode1/
>>>       foo-impl.h
>>>     mode2/
>>>       foo-impl.h
>>>
>>> ... where foo-impl.h is textually-included into foo.h, and this is
>>> supposed to be used in two modes: one with a -I path pointing to mode1, and
>>> one with a -I path pointing to mode2. We should ensure those two modules
>>> don't collide in the cache. (Maybe mode1 and mode2 provide inline assembly
>>> for different CPU variants, or such.)
>>>
>>>
>>> Non-modular includes are problematic beyond just how to lookup the
>>> headers during module building. The major problem they introduce is that
>>> those same non-modular headers can be textually #included in the
>>> translation unit, before or after the module import that contains the same
>>> header contents, causing multiple definitions or other semantic issues.
>>> That's why we tend to prefer the clear semantic model that modules depend
>>> only in the headers that the module is comprised of, or other modules, and
>>> we want to encourage people to modularize their headers.
>>>
>>
>> First, many non-modular includes are *intended* to be repeatedly
>> textually included, and your argument does not apply to those cases
>> (consider Clang's .def files, for instance). Some headers are
>> implementation details of other headers and are only meant to be included
>> once, into one place; those are similarly unaffected.
>>
>>
>> We already have the syntax in the module map for these; (I think) the
>> syntax is "exclude <header>". This is the "escape hatch" to be explicit
>> about what non-modular, not-part-of-the-module, headers the module will
>> include, and subsequently it is your responsibility to make sure that
>> <header> is not textually included in the translation unit as well, or if
>> it is that it will not cause issues.
>>
>>
>> Second, the problem you describe is mostly an artifact of our broken
>> implementation of macro and declaration visibility across submodules of the
>> same top-level module. Names in earlier-built submodule are visible in
>> *all* later-built submodules, even those that don't import the earlier ones
>> (and @imports in earlier submodules naming later ones have no effect). Once
>> we fix that, the requirement to modularize bottom-up (and to "fix" the
>> guarded typedefs in your system headers) mostly goes away -- it's generally
>> fine to have the same entity declared in two different places, and we'll
>> merge together those declarations if they're both visible.
>>
>>
>> Are you saying that if I have:
>>
>> struct Foo {
>> <stuff>
>> };
>>
>> @import Blah
>>
>> and module 'Blah' has a definition for 'struct Foo' we will merge them
>> together ?
>>
>
> It depends on the language.
>
> In C, the two definitions for 'Foo' are compatible types if they have
> matching bodies, so they can be used interchangeably in that case and not
> otherwise. We don't yet support this, because our compatible type code
> doesn't know about modules, and this doesn't come up in non-modules code.
>
> In C++, the two definitions for 'Foo' are required to be equivalent under
> the ODR, and we will merge them (and to a limited degree, diagnose them if
> they differ). This already basically works.
>
>
> The "limited degree" makes me nervous here. We essentially would need to
> be able to take every definition/declaration in a C/C++/ObjC transition
> unit and merge them with definitions/declarations from modules, and make
> sure that we diagnose and error out on differences otherwise we may cause
> weird/confusing correctness issues.
>

The "limited degree" is that we only perform checks on those declarations
that we actually deserialize. We can perform more checking, at the cost of
being a bit more eager about deserialization, and I plan to add a -f flag
for this behavior.

This seems like a rather significant maintenance burden with lots of edge
> cases to worry about, and not as important if you have fully modular
> headers.
>

Merging (and ODR checking) is required in C++ whether or not you've got
fully modularized headers, because the same template instantiation can end
up in multiple modules, and could be different in different places (due to
differing declarations present prior to the instantiation). Yes, it's a
little messy, but it's actually not so bad, and not much worse than merging
redeclaration chains when multiple independent modules choose to declare
(but not define) the same entity.

 I think we need to fix this submodule visibility problem regardless (the
>> current state of play is extremely fragile to changes in the order we
>> happen to build the submodules), so I'm not convinced that this is a
>> long-term problem. (Nonetheless, I agree that having modularized
>> dependencies is generally a good thing.)
>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140401/c123cb91/attachment.html>