[PATCH] Allow multiple modules with the same name to coexist in the module cache

Mon Mar 31 15:24:27 PDT 2014

On Mon, Mar 31, 2014 at 11:40 AM, Ben Langmuir <blangmuir at apple.com> wrote:

>
> On Mar 28, 2014, at 4:42 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Fri, Mar 28, 2014 at 3:27 PM, Ben Langmuir <blangmuir at apple.com> wrote:
>
>> Hi Richard,
>>
>>
>> On Mar 28, 2014, at 2:45 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>
>> On Fri, Mar 28, 2014 at 1:12 PM, Ben Langmuir <blangmuir at apple.com>
>> wrote:
>>
>>> This patch allows multiple modules that have the same name to coexist in
>>> the module cache.  To differentiate between two modules with the same name,
>>> we will consider the path the module map file that they are defined by*
>>> part of the 'key' for looking up the precompiled module (pcm file).
>>>  Specifically, this patch renames the precompiled module (pcm) files from
>>>
>>> cache-path/<module hash>/Foo.pcm
>>>
>>> to
>>>
>>> cache-path/<module hash>/Foo-<hash of module map path>.pcm
>>>
>>
>> From a high level, I don't really see why we need a second hash here.
>> Shouldn't the -I options be included in the <module hash>? If I build the
>> same module with different -I flags, that should resolve to different .pcm
>> files, regardless of whether it makes the module name resolve to a
>> different module.map file.
>>
>>
>> If we include the -I options in the module hash, we will explode the
>> number of module compilations needed.  The following should all be able to
>> share a module 'A'.
>>
>
> I really don't think they should, if that second -I path is used in any
> way when building module 'A'.
>
>
> clang -fmodules -I /path/to/A -I /path/to/B some_file.c
>> clang -fmodules -I /path/to/A -I /path/to/C some_file2.c
>> clang -fmodules -I /path/to/A -I /path/to/D some_file3.c
>>
>
> I would think a better solution here would be to not have that second -I
> path in header search when building module A (and thus not include it in
> the hash for a principled reason).
>
> Ultimately, each module should have its own header search path -- the
> model of one set of include paths for the entire TU (with that include path
> potentially causing some libraries to find the wrong files) is a broken,
> antiquated legacy of the non-modules world.
>
>
> I agree with this in principle - having a predictable context for building
> the modules would solve a lot of problems.  However, that seems to only
> work for semantic import (@import, not #include), which is the less common
> case for us at the moment.
>

Can you explain why? Suppose every module.map file lists the include paths
for that module -- what problems does that create? (Is this problem
specific to us generating implicit module.map files for frameworks?)

In the short term, we should probably drop all header search paths that are
> before the path in which the module map was found.
>
>
> To be clear, this has the same correctness problem as my approach - the
> header search paths that come before the module map may still change the
> content of the module, especially if the module depends on other modules.
>

What I'm suggesting is:

1) Drop the -I paths that are earlier than the module in the header search
path when building the module
2) Include the rest of the header search paths in the configuration hash
for the module

I don't see how this still leads to collisions (other than the madness that
results from having multiple module.maps on your header search path that
define the same module, where the order in which we happen to load them
affects our semantics).

In general, adding the -I options is brittle when precompiled headers are
> involved. Since you likely don't have exactly the same -I options when
> building your PCH as when you use it (you likely have fewer -I paths when
> building the PCH if it is being widely used) you could potentially have
> different module hashes in the PCH and in the main file, which leads to
> trying to load multiple copies of a module from different paths.  I ran
> into this when I tried to naively add all of the -I options to the module
> hash.  The heuristic you suggested for dropping the earlier paths might
> help here, but it would be required for correctness, not merely as an
> optimization.
>

I'll need to think about this one some more. My initial reaction is that if
a PCH loads a module in one configuration and a user of the PCH tries to
load the same module in a different configuration (because it's being built
with different include paths), that should be a hard error. It seems to me
that this is a separate issue from ensuring that we don't have collisions
in the cache?

> I am also not sure what effect this would have on the global module index,
> since the set of modules to load would span multiple hash-directories and
> we would need to avoid looking at incompatible modules.  I'm not sure if
> this is a problem or not.
>

Yes, that's an interesting problem =)

> Any thoughts?
>

I don't think I've got enough information to suggest something concrete
yet, but this strikes me as a possible avenue to explore:

1) Teach module.map files to specify the include paths for the module
2) When implicitly generating a module.map file, inject the include paths
from the host build (probably skipping ones prior to the directory
containing the framework)
3) Include a hash of the contents of the implicit module.map file in the
cache entry for any module that has an implicit module.map
4) Don't include the header search paths in the configuration hash, since
they're not used when determining how to build a module any more.

> Are you trying to cope with the case where the -I path finds multiple
>> module.map files defining the same module (where it's basically chance
>> which one will get built and used)? I don't really feel like this is the
>> right solution to that problem either -- we should remove the 'luck' aspect
>> and use some sane mechanism to determine which module.map files are loaded,
>> and in what order.
>>
>>
>> Nope - that is not being addressed.
>>
>>
>> Is this addressing some other case?
>>
>> See above.
>>
>>
>>
>>> In addition, I've taught the ASTReader to re-resolve the names of
>>> imported modules during module loading so that if the header search context
>>> changes between when a module was originally built and when it is loaded we
>>> can rebuild it if necessary.  For example, if module A imports module B
>>>
>>> first time:
>>> clang -I /path/to/A -I /path/to/B ...
>>>
>>> second time:
>>> clang -I /path/to/A -I /different/path/to/B ...
>>>
>>> will now rebuild A as expected.
>>>
>>>
>>> * in the case of inferred modules, we use the module map file that
>>> *allowed* the inference, not the __inferred_module.map file, since the
>>> inferred file path is the same for every inferred module.
>>
>>
>>
>> Review comments on the patch itself:
>>
>>  +  /// the inferrence (e.g. contained 'module *') rather than the virtual
>>
>> Typo 'inference', 'Module *'.
>>
>> +  /// For an explanaition of \p ModuleMap, see Module::ModuleMap.
>>
>> Typo 'explanation'.
>>
>> +  // safe becuase the FileManager is shared between the compiler
>> instances.
>>
>> Typo 'because'
>>
>>
>> Thanks for catching the embarrassing quantity of typos :)
>>
>>
>> +  // the inferred module. If this->ModuleMap is nullptr, then we are
>> using
>> +  // -emit-module directly, and we cannot have an inferred module.
>>
>> I don't understand what this comment is trying to say. If we're using
>> -emit-module, then we were given a module map on the command-line; should
>> that not be referred to by this->ModuleMap? (Also, why 'this->'?) How can a
>> top-level module be inferred? Is that a framework-specific thing?
>>
>>
>> Hmm, I don't recall why I didn't just pass in the InputFile as the module
>> map.  I'll do that.
>>
>> Yes, AFAIK only framework modules can be inferred at the top-level.
>>
>>
>> +    StringRef ModuleMap = this->ModuleMap ? this->ModuleMap->getName() :
>> InFile;
>>
>> Please pick a different variable name rather than shadowing a member of
>> '*this' here.
>>
>>
>> Will do.
>>
>>
>> +    // Construct the name <ModuleName>-<hash of ModuleMapPath>.pcm which
>> should
>> +    // be globally unique to this particular module.
>> +    llvm::APInt Code(64, llvm::hash_value(ModuleMapPath));
>> +    SmallString<128> HashStr;
>> +    Code.toStringUnsigned(HashStr);
>>
>> Use base 36, like the module hash.
>>
>>
>> I could have sworn I did... must have got lost along the way.  Will do.
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140331/e8461491/attachment.html>