[PATCH] Allow multiple modules with the same name to coexist in the module cache

Tue Apr 1 16:55:45 PDT 2014

On Apr 1, 2014, at 3:32 PM, Richard Smith <richard at metafoo.co.uk> wrote:

> On Tue, Apr 1, 2014 at 12:59 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com> wrote:
> 
> On Apr 1, 2014, at 12:03 PM, Richard Smith <richard at metafoo.co.uk> wrote:
> 
>> On Mon, Mar 31, 2014 at 8:35 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com> wrote:
>> 
>> On Mar 31, 2014, at 6:00 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>> 
>>> On Mon, Mar 31, 2014 at 5:25 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com> wrote:
>>> 
>>> On Mar 31, 2014, at 4:44 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>> 
>>>> On Mon, Mar 31, 2014 at 4:23 PM, Argyrios Kyrtzidis <kyrtzidis at apple.com> wrote:
>>>> >
>>>> > What I'm suggesting is:
>>>> >
>>>> > 1) Drop the -I paths that are earlier than the module in the header search path when building the module
>>>> > 2) Include the rest of the header search paths in the configuration hash for the module
>>>> 
>>>> This will prevent us from sharing system modules across projects, and, as Ben already mentioned, will result in an explosion of module files, even within the same project.
>>>> 
>>>> Why? System modules would only have system include paths, and these would usually be the same across projects, right?
>>> 
>>> I see what you mean, but we also need to support the case where the system framework developer can point to the framework in his local directory to be used, instead of the system one, how do you suggest we allow this ?
>>> 
>>> I'd like for a module to be able to specify its own include paths. If a module does so, the include paths of the host build should not be included in its hash.
>> 
>> That didn't quite address my question, let me be more specific. We generally need to allow, as an example, system AppKit to import a user-local Foundation; neither system AppKit specifying its own include paths, nor looking up module dependencies of AppKit only in system include paths, allow this.
>> 
>> That's a curious use case.
>> 
>> It also seems to be compatible with the approach I suggested above, I think? You need to put your header search paths in the right order (so the lower layers of your system are below the higher layers), but that's generally desirable regardless. (We have a limitation that -isystem paths are considered after -I paths, which might get in the way here, but that's artificial and in conflict with what you're trying to do.)
> 
> What is the right order ?
> 
> If the order is:
> 
> 	-I /local/path -isystem /System/Frameworks
> 
> and we drop -I paths earlier than the module then module import from AppKit doesn’t consider -I /local/path.
> 
> If the order is:
> 
> 	-isystem /System/Frameworks -I /local/path
> 
> then module import from AppKit finds Foundation in /System/Frameworks and doesn’t consider -I /local/path.
> 
>> 
>> Alternatively, this seems like a good use case motivating putting the include path into the build configuration. The approach in this patch is a bad solution for this use case, because the AppKit modules for different Foundations will collide in the cache; any of the other solutions we've discussed seem to handle this better.
> 
> If we don’t drop -I paths earlier than the module then putting the ‘-I’ paths in the build configuration will cause module explosion.
> If we drop -I paths earlier than the module then it doesn’t address the above case and the worse thing is that you don’t anymore have one module hash cache for the whole translation unit anymore, which creates complications (e.g. Ben already mentioned the global module index).
> 
> We could maybe do something in-between, for example introduce an option for search paths specific for module lookups. Normally you wouldn’t use them but if you did use them then they would go in the build configuration and would be considered first for module lookups. Then we could always do the "drop -I paths earlier than the module”.
> 
>> 
>>>> Even if we include the search paths in the module hash we will still need to re-lookup the module dependencies before loading the module, because a new module.map may have showed up somewhere in the search paths since the time we built the module; unless I’m missing something, I don’t see any benefit in including the header search paths in the hash.
>>>> 
>>>> The hash should include everything that affects the way the module is built. If we're transferring include paths from the user of the module to the module build, the hash should include those paths, or two modules with different search paths could collide in the cache.
>>> 
>>> The collision is avoided by using the path to the module.map as Ben proposes; if different search paths resolve to different module dependencies, this is a case where we need to rebuild, but the same is true if the same paths are used but a different module.map shows up as dependency.
>>> 
>>> I don't see how that's sufficient -- the include paths can also affect how non-modular #includes within the module behave.
>>> 
>>> As a somewhat-contrived example, suppose I have a library
>>> 
>>>   blah/
>>>     module.map
>>>     foo.h
>>>     mode1/
>>>       foo-impl.h
>>>     mode2/
>>>       foo-impl.h
>>> 
>>> ... where foo-impl.h is textually-included into foo.h, and this is supposed to be used in two modes: one with a -I path pointing to mode1, and one with a -I path pointing to mode2. We should ensure those two modules don't collide in the cache. (Maybe mode1 and mode2 provide inline assembly for different CPU variants, or such.)
>> 
>> 
>> Non-modular includes are problematic beyond just how to lookup the headers during module building. The major problem they introduce is that those same non-modular headers can be textually #included in the translation unit, before or after the module import that contains the same header contents, causing multiple definitions or other semantic issues. That's why we tend to prefer the clear semantic model that modules depend only in the headers that the module is comprised of, or other modules, and we want to encourage people to modularize their headers.
>> 
>> First, many non-modular includes are *intended* to be repeatedly textually included, and your argument does not apply to those cases (consider Clang's .def files, for instance). Some headers are implementation details of other headers and are only meant to be included once, into one place; those are similarly unaffected.
> 
> We already have the syntax in the module map for these; (I think) the syntax is “exclude <header>”. This is the “escape hatch” to be explicit about what non-modular, not-part-of-the-module, headers the module will include, and subsequently it is your responsibility to make sure that <header> is not textually included in the translation unit as well, or if it is that it will not cause issues.
> 
>> 
>> Second, the problem you describe is mostly an artifact of our broken implementation of macro and declaration visibility across submodules of the same top-level module. Names in earlier-built submodule are visible in *all* later-built submodules, even those that don't import the earlier ones (and @imports in earlier submodules naming later ones have no effect). Once we fix that, the requirement to modularize bottom-up (and to "fix" the guarded typedefs in your system headers) mostly goes away -- it's generally fine to have the same entity declared in two different places, and we'll merge together those declarations if they're both visible.
> 
> Are you saying that if I have:
> 
> struct Foo {
> <stuff>
> };
> 
> @import Blah
> 
> and module ‘Blah’ has a definition for ‘struct Foo’ we will merge them together ?
> 
> It depends on the language.
> 
> In C, the two definitions for 'Foo' are compatible types if they have matching bodies, so they can be used interchangeably in that case and not otherwise. We don't yet support this, because our compatible type code doesn't know about modules, and this doesn't come up in non-modules code.
> 
> In C++, the two definitions for 'Foo' are required to be equivalent under the ODR, and we will merge them (and to a limited degree, diagnose them if they differ). This already basically works.

The "limited degree” makes me nervous here. We essentially would need to be able to take every definition/declaration in a C/C++/ObjC transition unit and merge them with definitions/declarations from modules, and make sure that we diagnose and error out on differences otherwise we may cause weird/confusing correctness issues.
This seems like a rather significant maintenance burden with lots of edge cases to worry about, and not as important if you have fully modular headers.

> 
>> I think we need to fix this submodule visibility problem regardless (the current state of play is extremely fragile to changes in the order we happen to build the submodules), so I'm not convinced that this is a long-term problem. (Nonetheless, I agree that having modularized dependencies is generally a good thing.)
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140401/7138039c/attachment.html>