[PATCH] Add stopgap option -fmodule-implementation-of

Mon Jul 28 20:01:55 PDT 2014

On Mon, Jul 28, 2014 at 6:25 PM, Ben Langmuir <blangmuir at apple.com> wrote:

>
> On Jul 28, 2014, at 5:09 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Mon, Jul 28, 2014 at 2:05 PM, Ben Langmuir <blangmuir at apple.com> wrote:
>
>>
>> On Jul 24, 2014, at 6:58 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>
>> On Thu, Jul 24, 2014 at 7:56 AM, Ben Langmuir <blangmuir at apple.com>
>> wrote:
>>
>>>
>>> On Jul 16, 2014, at 3:42 PM, Richard Smith <richard at metafoo.co.uk>
>>> wrote:
>>>
>>> On Fri, Jul 11, 2014 at 8:42 AM, Ben Langmuir <blangmuir at apple.com>
>>> wrote:
>>>
>>>> Hey RIchard,
>>>>
>>>> Sorry to take so long to reply to this, but I am still interested in
>>>> getting this stopgap into tree.
>>>>
>>>
>>> Sorry about the delay getting back to you!
>>>
>>>
>>>> Please do not add a stopgap workaround to our stable and
>>>> backwards-compatible driver interface; just add it to -cc1 instead.
>>>>
>>>>
>>>>  Sure.
>>>>
>>>> I don't see any relation between the flag's name and its functionality;
>>>> there seems to be no reason for this to be linked to the translation unit
>>>> being the implementation of any particular module (and if there were,
>>>> that's what -fmodule-name is for). Instead, I think what you're trying to
>>>> specify is that a particular module is included textually for this
>>>> compilation. Please pick a name that suggests that functionality instead.
>>>>
>>>>
>>>> In the abstract I agree with this, but the use case I have is only for
>>>> TUs that are implementation files for a module and I know that is the only
>>>> time that this flag will be used by our tools.  It is more useful for the
>>>> diagnostic to say “don’t do this in the implementation of module Foo”,
>>>> since that matches when the build system will be passing in this flag.
>>>>  Given that this doesn’t go into the driver, is this still an issue? If
>>>> not, I can update and commit this patch, or can post it again for review if
>>>> you prefer :-)
>>>>
>>>
>>> I'm fine with this as a short-term cc1-only flag. Longer-term I think we
>>> need to evaluate whether we can make the import-of-same-module cases "just
>>> work" (I think we can), and I hope this becomes unnecessary at that point.
>>>
>>>
>>> r213767
>>>
>>>
>>> >>> What’s unexpected to me is that changing a header whose contents are
>>>> not usually visible may still require rebuilding all of my .cpp files.
>>>> >>> module Foo { module One { header “One.h” } module Two { header
>>>> “Two.h” } }
>>>> >>>
>>>> >>> // One.cpp - I don’t want to rebuild when Two.h changes
>>>> >>> #import <Foo/One.h>
>>>> >>>
>>>> >>> Do we agree that this is unnecessary if submodules cannot
>>>> accidentally be affected by changes in other submodules they don’t import
>>>> (and we have some way to get the set of dependency files for just the
>>>> submodule)?
>>>> >>
>>>> >>
>>>> >> No, I don't agree with that. One.cpp might inline some function
>>>> definitions from Two.h, for instance. Or it might fail to build because it
>>>> declares something that conflicts with something in Two.h.
>>>> >
>>>> >
>>>> > I feel like I”m missing something - how is that different from
>>>> One.cpp having conflicts with some completely different header or module
>>>> that is not imported into that particular TU?
>>>>
>>>> If you import any part of a module, you have the whole module as part
>>>> of your translation unit, even though only some of it might be visible.
>>>> Thus we will diagnose your declarations that conflict with unimported
>>>> portions of an imported module.
>>>>
>>>> Maybe we need to have this discussion on cfe-dev at some point.  I
>>>> think we need a driver flag to control whether clang reports headers from
>>>> unimported submodules as dependencies, which will allow users/build systems
>>>> to make the tradeoff.  As for the default, I strongly feel we shouldn't
>>>> penalize build performance for correct code in order to guarantee that
>>>> these particular ODR violations get diagnosed in incremental builds.  A
>>>> full rebuild will still see any diagnostics and the subset of errors that
>>>> this affects are not being diagnosed today with headers, so we’re still
>>>> improving.
>>>>
>>>
>>> Conversely, I think that we should provide a guarantee that incremental
>>> and full builds produce bit-for-bit identical results. As you say, it's a
>>> tradeoff, but note that this isn't just about ODR violation checking -- the
>>> incremental approach you're suggesting can generate wrong code in some
>>> cases (we can inline a function definition from the old version of Two.h)
>>> -- so if we want to support this partial-rebuild mode, we'll need to be
>>> /very/ careful that we don't pull in any information from an unimported
>>> submodule in that mode.
>>>
>>>
>>> Maybe you can help me understand how this would come about.  In our
>>> documentation we say:
>>>
>>> Modules are modeled as if each submodule were a separate translation
>>> unit, and a module import makes names from the other translation unit
>>> visible
>>>
>>>
>>> Here’s my understanding:
>>> If I don’t import the submodule containing “Two.h”, then I shouldn’t get
>>> its definitions in my TU.
>>>
>>
>> You get its definitions in your *program*. If you import any part of a
>> module, the entire module is part of your program. Example:
>>
>>
>> Okay, but that’s just more consistency checking, ins’t it?  If I import
>> Module1.B, but not Module1.A (or Module2.C) I don’t want to see “f” in my
>> exported symbols.
>>
>
> I think you're saying that it would in principle be possible for us to
> accept the example I gave? It probably would, but the fact that we reject
> it right now is a feature, not a bug.
>
>
> Agreed, although I think we weigh its benefit vs incremental building
> differently.
>
>  Module1.A:
>> int f(int);
>>
>> Module1.B:
>> extern int n;
>>
>> Module2.C:
>> import Module1.B;
>> void f(int); // error, conflicting return type
>>
>> If I have an inline declaration for a function in Two, then I still need
>>> to have a definition in my own TU because of inline.  If I have a
>>> non-inline decl, then Two can’t have an inline decl and if it has a
>>> definition for the function not marked inline then having that definition
>>> show up in my TU would lead to multiple definitions if Two is imported
>>> somewhere else.
>>>
>>
>> You can get into this situation with C++ templates. You might only be
>> able to see a declaration of a template, where another submodule provides a
>> definition that is hidden but still available for inlining. This doesn't
>> violate any language rule as long as there's an explicit instantiation of
>> the template somewhere.
>>
>>
>> If I don’t see a definition in my TU, how can I use the template in a way
>> affected by inlining?
>>
>
> You do "see" a definition in your TU, for some value of "see". That
> definition *is* imported, and is known about by the compiler; we just give
> you an error if you try to use it. CodeGen is still able to emit it. This
> is necessary to support entities that are imported by a module but not
> re-exported.
>
>
> Consider this:
>
> Module X:
>   inline int f() { return 0; }
> Module Y:
>   import X; // not re-exported
>   inline int g() { return f(); }
> Z.cc:
>   import Y;
>   int k = g();
>
> In Z.cc, we are *required* to emit the body of 'f', even though you can't
> "see" it.
>
>
> Okay, that makes sense.  This is certainly something we would need to
> account for to do safe incremental rebuilding.  I think the right answer is
> to make sure that the transitive imports get included in the reported
> dependencies regardless of being re-exported.
>
> And entities in X are treated just like entities in an unimported
> submodule of Y.
>
>
> Ah.  This seems like an accident of the implementation rather than a
> desirable property.  We have two distinct cases:
>
> 1) A imports B, and B is not re-exported.  B’s headers are still
> dependencies for our TU even though they aren’t  visible.
> 2) A has submodules B and C.  Importing A.B does not create a dependency
> on A.C or vice versa.
>

I think you mean "should not" rather than "does not" here: under the
current implementation, it certainly does, in that the contents of A.C can
affect whether a user of A.B builds today. Even then (as you note above) we
have a trade-off here; there are benefits to having that dependency.

I may not have an instantiation of a template, but I still need to see its
>> definition.  If its definition changes, that would require rebuilding the
>> other TU that has the instantiation.  I’m probably being thick, but I still
>> don’t see the issue here.
>>
>>
>> You can also get into this situation with the C99 inline rules, where you
>> don't have to define an 'inline' function in every translation unit.
>>
>>
>> Did this change in C11, or am I misreading this?
>> 6.7.4.7: For a function with external linkage, the following
>> restrictions apply: If a function is declared with an inline function
>> specifier, then it shall also be defined in the same translation unit.
>>
>
> That rule applies only if the function is declared with the 'inline'
> specifier in that translation unit. Example:
>
> Module X.A:
>   extern int f(void); // ok, no 'inline', no definition required in this TU
> Module X.B:
>   inline int f() { return 0; } // ok, definition
> main.cc:
>   import X.A;
>   int main() { return f(); }
>
> In this setup, f() might get inlined into main, even though the definition
> is not visible. (FWIW, I expect we'll also generate wrong code in this
> case, because we'll emit a strong definition of 'f' from every TU that
> imports X; conversely, if X.A and X.B are split into separate top-level
> modules, then a TU that imports both will not emit a strong definition of
> 'f’.)
>
>
> I don’t think this is a good idea at all.  I’m okay with saying that
> you’re not allowed to have conflicting submodules, but having them create
> implicit dependencies like this violates my mental model for semantic
> import.  I would much prefer that X.A and X.B behave the same as top-level
> modules (except that importing X might implicitly pull in A and/or B), and
> I think that would be much less surprising.
>

I used to think the same thing, but I don't any more. I think there is
value in being able to say that a collection of submodules together forms
some coherent, logically-indivisible whole (call it a "library", maybe?),
where the submodules just provide visibility control over the pieces of
that library. Right now, we also couple that to two other things: the
identity of the "library", and the .pcm file structure, are both determined
by the top-level module name. I'm not convinced that's a good idea -- there
are certainly cases where it makes sense to have more granularity than that.

If we could decouple this "same library" / "same .pcm file" decision from
the top-level module name, so that you could say "X.A and X.B are
notionally separate (and live in distinct libraries / .pcm files)", would
that address your concern?

Another point that seems relevant is that implicit module builds are a bad
idea in a lot of situations. They don't distribute well, they rely on
side-channels for sharing module files, they break existing build system
assumptions, they require multiple compile actions to block waiting for
each other, and so on. A better approach, which we should be encouraging
people to use, is to make the module build step explicit in the build
system. Once we treat "building a module" as a build step with its own
dependencies (which is in turn depended on by downstream .cpp and module
builds), this incremental rebuild approach becomes rather problematic.

Finally, a point I've raised before is that hermetic builds are important
to a lot of people: for build reproducibility, cacheability, and so on,
it's important that your build does *not* depend on the path of builds you
did previously.

Both these points would be addressed by splitting your X.A and X.B builds
up so they built separate .pcm files.

What happens when I provide an incompatible external definiton of “f()” in
> another TU?  We can’t diagnose the conflict
>

There is no conflict; the C standard says that the implementation gets to
pick whichever one it likes.

Eventually, I'd like for us to include some IR (representing inline
function definitions and so on) in the module file, to remove the cost of
repeatedly generating IR for inline functions within modules. I don't think
we want the complexity of segregating that IR on the basis of frontend name
visibility rules.

and we will be calling the inline definition from a module we didn’t import
> (from the user’s perspective).  Seems at least as bad as the other
> conflicts we’ve talked about :-)
>
> If you actually want the inlining, just make the inline definition
> visible, or turn on LTO.
>

Conversely, if you actually want separate entities from a dependency point
of view, just make different module files for them.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140728/119d2edd/attachment.html>