[PATCH] Add stopgap option -fmodule-implementation-of

Wed Jul 30 07:01:04 PDT 2014

> On Jul 28, 2014, at 11:56 PM, Richard Smith <richard at metafoo.co.uk> wrote:
> 
> On Mon, Jul 28, 2014 at 8:01 PM, Richard Smith <richard at metafoo.co.uk> wrote:
> On Mon, Jul 28, 2014 at 6:25 PM, Ben Langmuir <blangmuir at apple.com> wrote:
> 
>> On Jul 28, 2014, at 5:09 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>> 
>> On Mon, Jul 28, 2014 at 2:05 PM, Ben Langmuir <blangmuir at apple.com> wrote:
>> 
>>> On Jul 24, 2014, at 6:58 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>> 
>>> On Thu, Jul 24, 2014 at 7:56 AM, Ben Langmuir <blangmuir at apple.com> wrote:
>>> 
>>>> On Jul 16, 2014, at 3:42 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>>> 
>>>> On Fri, Jul 11, 2014 at 8:42 AM, Ben Langmuir <blangmuir at apple.com> wrote:
>>>> Hey Richard,
>>>> 
>>>> Sorry to take so long to reply to this, but I am still interested in getting this stopgap into tree.
>>>> 
>>>> Sorry about the delay getting back to you!
>>>>  
>>>>> Please do not add a stopgap workaround to our stable and backwards-compatible driver interface; just add it to -cc1 instead.
>>>> 
>>>> 
>>>> Sure.
>>>> 
>>>>> I don't see any relation between the flag's name and its functionality; there seems to be no reason for this to be linked to the translation unit being the implementation of any particular module (and if there were, that's what -fmodule-name is for). Instead, I think what you're trying to specify is that a particular module is included textually for this compilation. Please pick a name that suggests that functionality instead.
>>>> 
>>>> 
>>>> In the abstract I agree with this, but the use case I have is only for TUs that are implementation files for a module and I know that is the only time that this flag will be used by our tools.  It is more useful for the diagnostic to say “don’t do this in the implementation of module Foo”, since that matches when the build system will be passing in this flag.  Given that this doesn’t go into the driver, is this still an issue? If not, I can update and commit this patch, or can post it again for review if you prefer :-)
>>>> 
>>>> I'm fine with this as a short-term cc1-only flag. Longer-term I think we need to evaluate whether we can make the import-of-same-module cases "just work" (I think we can), and I hope this becomes unnecessary at that point.
>>> 
>>> r213767
>>> 
>>>> 
>>>>> >>> What’s unexpected to me is that changing a header whose contents are not usually visible may still require rebuilding all of my .cpp files.
>>>>> >>> module Foo { module One { header "One.h" } module Two { header "Two.h" } }
>>>>> >>>
>>>>> >>> // One.cpp - I don’t want to rebuild when Two.h changes
>>>>> >>> #import <Foo/One.h>
>>>>> >>>
>>>>> >>> Do we agree that this is unnecessary if submodules cannot accidentally be affected by changes in other submodules they don’t import (and we have some way to get the set of dependency files for just the submodule)?
>>>>> >>
>>>>> >>
>>>>> >> No, I don't agree with that. One.cpp might inline some function definitions from Two.h, for instance. Or it might fail to build because it declares something that conflicts with something in Two.h.
>>>>> >
>>>>> >
>>>>> > I feel like I’m missing something - how is that different from One.cpp having conflicts with some completely different header or module that is not imported into that particular TU?
>>>>> 
>>>>> If you import any part of a module, you have the whole module as part of your translation unit, even though only some of it might be visible. Thus we will diagnose your declarations that conflict with unimported portions of an imported module.
>>>>> 
>>>> 
>>>> Maybe we need to have this discussion on cfe-dev at some point.  I think we need a driver flag to control whether clang reports headers from unimported submodules as dependencies, which will allow users/build systems to make the tradeoff.  As for the default, I strongly feel we shouldn't penalize build performance for correct code in order to guarantee that these particular ODR violations get diagnosed in incremental builds.  A full rebuild will still see any diagnostics and the subset of errors that this affects is not being diagnosed today with headers, so we’re still improving.
>>>> 
>>>> Conversely, I think that we should provide a guarantee that incremental and full builds produce bit-for-bit identical results. As you say, it's a tradeoff, but note that this isn't just about ODR violation checking -- the incremental approach you're suggesting can generate wrong code in some cases (we can inline a function definition from the old version of Two.h) -- so if we want to support this partial-rebuild mode, we'll need to be /very/ careful that we don't pull in any information from an unimported submodule in that mode.
>>> 
>>> Maybe you can help me understand how this would come about.  In our documentation we say:
>>> 
>>>> Modules are modeled as if each submodule were a separate translation unit, and a module import makes names from the other translation unit visible
>>> 
>>> 
>>> Here’s my understanding:
>>> If I don’t import the submodule containing “Two.h”, then I shouldn’t get its definitions in my TU.
>>> 
>>> You get its definitions in your *program*. If you import any part of a module, the entire module is part of your program. Example:
>> 
>> Okay, but that’s just more consistency checking, isn’t it?  If I import Module1.B, but not Module1.A (or Module2.C) I don’t want to see “f” in my exported symbols.
>> 
>> I think you're saying that it would in principle be possible for us to accept the example I gave? It probably would, but the fact that we reject it right now is a feature, not a bug.
> 
> Agreed, although I think we weigh its benefit vs incremental building differently.
> 
>>> Module1.A:
>>> int f(int);
>>> 
>>> Module1.B:
>>> extern int n;
>>> 
>>> Module2.C:
>>> import Module1.B;
>>> void f(int); // error, conflicting return type
>>> 
>>> If I have an inline declaration for a function in Two, then I still need to have a definition in my own TU because of inline.  If I have a non-inline decl, then Two can’t have an inline decl and if it has a definition for the function not marked inline then having that definition show up in my TU would lead to multiple definitions if Two is imported somewhere else.
>>> 
>>> You can get into this situation with C++ templates. You might only be able to see a declaration of a template, where another submodule provides a definition that is hidden but still available for inlining. This doesn't violate any language rule as long as there's an explicit instantiation of the template somewhere.
>> 
>> If I don’t see a definition in my TU, how can I use the template in a way affected by inlining?
>> 
>> You do "see" a definition in your TU, for some value of "see". That definition *is* imported, and is known about by the compiler; we just give you an error if you try to use it. CodeGen is still able to emit it. This is necessary to support entities that are imported by a module but not re-exported.
>> 
>> Consider this:
>> 
>> Module X:
>>   inline int f() { return 0; }
>> Module Y:
>>   import X; // not re-exported
>>   inline int g() { return f(); }
>> Z.cc:
>>   import Y;
>>   int k = g();
>> 
>> In Z.cc, we are *required* to emit the body of 'f', even though you can't "see" it.
> 
> Okay, that makes sense.  This is certainly something we would need to account for to do safe incremental rebuilding.  I think the right answer is to make sure that the transitive imports get included in the reported dependencies regardless of being re-exported.
> 
>> And entities in X are treated just like entities in an unimported submodule of Y.
> 
> Ah.  This seems like an accident of the implementation rather than a desirable property.  We have two distinct cases:
> 
> 1) A imports B, and B is not re-exported.  B’s headers are still dependencies for our TU even though they aren’t visible.
> 2) A has submodules B and C.  Importing A.B does not create a dependency on A.C or vice versa.
> 
> I think you mean "should not" rather than "does not" here: under the current implementation, it certainly does, in that the contents of A.C can affect whether a user of A.B builds today.

Right.

> Even then (as you note above) we have a trade-off here; there are benefits to having that dependency.
> 
>> I may not have an instantiation of a template, but I still need to see its definition.  If its definition changes, that would require rebuilding the other TU that has the instantiation.  I’m probably being thick, but I still don’t see the issue here.
>> 
>> 
>>> You can also get into this situation with the C99 inline rules, where you don't have to define an 'inline' function in every translation unit.
>> 
>> Did this change in C11, or am I misreading this?
>> 6.7.4.7: For a function with external linkage, the following restrictions apply: If a function is declared with an inline function specifier, then it shall also be defined in the same translation unit.
>> 
>> That rule applies only if the function is declared with the 'inline' specifier in that translation unit. Example:
>> 
>> Module X.A:
>>   extern int f(void); // ok, no 'inline', no definition required in this TU
>> Module X.B:
>>   inline int f() { return 0; } // ok, definition
>> main.cc:
>>   import X.A;
>>   int main() { return f(); }
>> 
>> In this setup, f() might get inlined into main, even though the definition is not visible. (FWIW, I expect we'll also generate wrong code in this case, because we'll emit a strong definition of 'f' from every TU that imports X; conversely, if X.A and X.B are split into separate top-level modules, then a TU that imports both will not emit a strong definition of 'f’.)
> 
> I don’t think this is a good idea at all.  I’m okay with saying that you’re not allowed to have conflicting submodules, but having them create implicit dependencies like this violates my mental model for semantic import.  I would much prefer that X.A and X.B behave the same as top-level modules (except that importing X might implicitly pull in A and/or B), and I think that would be much less surprising.
> 
> I used to think the same thing, but I don't any more. I think there is value in being able to say that a collection of submodules together forms some coherent, logically-indivisible whole (call it a "library", maybe?), where the submodules just provide visibility control over the pieces of that library. Right now, we also couple that to two other things: the identity of the "library", and the .pcm file structure, are both determined by the top-level module name. I'm not convinced that's a good idea -- there are certainly cases where it makes sense to have more granularity than that.
> 
> If we could decouple this "same library" / "same .pcm file" decision from the top-level module name, so that you could say "X.A and X.B are notionally separate (and live in distinct libraries / .pcm files)", would that address your concern?
> 
> I asked something more specific than what I really wanted to know here. In Clang's current implementation, the top-level module that contains a given module affects a lot of things. In your X.A / X.B example, which properties do you want? Off the top of my head:
> 
>  1. X.A and X.B are placed into the same .pcm file
>   1a. That .pcm file doesn't contain any other top-level module
>  2. X.A and X.B are both part of any TU / program that uses either of them
>  3. X.A and X.B have names starting with the same prefix
>  4. X.A and X.B are notionally in the same "layer", so there's no need to think about dependency cycles with other modules

I’m not sure what (4) means.

>  5. X.A and X.B are always built together
> 

In my mind, a submodule should have a special relationship with its parent and its children, but not with its siblings.  Importing a module may imply importing its children, but importing X.A does not imply importing X.B (unless X.A transitively had such an import statement in it).

So (2) would specifically not be desired, and (1) and (5) would be implementation details as long as we are producing modules on-demand.  Now, to support explicitly generating pcms, I think allowing submodules to be built in separate pcm files would be useful.

That being said, I think saying that the contents of all available submodules of a common ancestor should be compatible (no diagnostic required) would be fine, if that helps us diagnose more problems.
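
The dependency model described above (transitive imports always count as dependencies, whether or not they are re-exported, while sibling submodules do not) can be sketched roughly as follows. This is only an illustration and not how Clang computes dependencies: the module table, the header names, and the function reported_deps are all hypothetical.

```python
# Hypothetical module table: each (sub)module lists its own headers and the
# modules it imports. None of these names come from Clang itself.
MODULES = {
    "X.A": {"headers": ["XA.h"], "imports": ["Y"]},  # imports Y, not re-exported
    "X.B": {"headers": ["XB.h"], "imports": []},     # sibling of X.A
    "Y":   {"headers": ["Y.h"],  "imports": []},
}


def reported_deps(imported, modules=MODULES):
    """Return the headers a TU would depend on under the proposed model.

    Every module reachable through import edges contributes its headers,
    regardless of re-export. Sibling submodules are not pulled in unless
    something actually imports them.
    """
    seen = set()
    stack = list(imported)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(modules[name]["imports"])
    return sorted(h for name in seen for h in modules[name]["headers"])


if __name__ == "__main__":
    # A TU that imports only X.A depends on XA.h and Y.h, but not on XB.h.
    print(reported_deps(["X.A"]))
```

Under the current implementation, by contrast, a TU importing X.A would also be affected by XB.h, since all of X is part of the TU.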

> (FWIW, I don't think it makes sense for all these things to be tied to the choice of top-level module name.)
> 
> Another point that seems relevant is that implicit module builds are a bad idea in a lot of situations. They don't distribute well, they rely on side-channels for sharing module files, they break existing build system assumptions, they require multiple compile actions to block waiting for each other, and so on. A better approach, which we should be encouraging people to use, is to make the module build step explicit in the build system. Once we treat "building a module" as a build step with its own dependencies (which is in turn depended on by downstream .cpp and module builds), this incremental rebuild approach becomes rather problematic.
> 
> Finally, a point I've raised before is that hermetic builds are important to a lot of people: for build reproducibility, cacheability, and so on, it's important that your build does *not* depend on the path of builds you did previously.
> 
> Both these points would be addressed by splitting your X.A and X.B builds up so they built separate .pcm files.
> 
> What happens when I provide an incompatible external definition of “f()” in another TU?  We can’t diagnose the conflict
> 
> There is no conflict; the C standard says that the implementation gets to pick whichever one it likes.
> 
> Eventually, I'd like for us to include some IR (representing inline function definitions and so on) in the module file, to remove the cost of repeatedly generating IR for inline functions within modules. I don't think we want the complexity of segregating that IR on the basis of frontend name visibility rules.
> 
> and we will be calling the inline definition from a module we didn’t import (from the user’s perspective).  Seems at least as bad as the other conflicts we’ve talked about :-)
> 
> If you actually want the inlining, just make the inline definition visible, or turn on LTO.
> 
> Conversely, if you actually want separate entities from a dependency point of view, just make different module files for them.
> 
