[PATCH] D119409: [C++20] [Modules] Remain internal-linkage variables in module interface unit

Nathan Sidwell via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Feb 17 04:02:22 PST 2022


urnathan added inline comments.


================
Comment at: clang/test/CXX/modules-ts/basic/basic.def.odr/p4/module.cpp:5
 // CHECK-DAG: @extern_var_exported = external {{(dso_local )?}}global
-// CHECK-DAG: @inline_var_exported = linkonce_odr {{(dso_local )?}}global
+// CHECK-DAG: @inline_var_exported = available_externally {{(dso_local )?}}global
 // CHECK-DAG: @const_var_exported = available_externally {{(dso_local )?}}constant i32 3,
----------------
ChuanqiXu wrote:
> urnathan wrote:
> > ChuanqiXu wrote:
> > > dblaikie wrote:
> > > > ChuanqiXu wrote:
> > > > > ChuanqiXu wrote:
> > > > > > urnathan wrote:
> > > > > > > ChuanqiXu wrote:
> > > > > > > > urnathan wrote:
> > > > > > > > > ChuanqiXu wrote:
> > > > > > > > > > urnathan wrote:
> > > > > > > > > > > I don;t think this is correct.  That should still be a linkonce odr, otherwise you'll get conflicts with other module implementation units.
> > > > > > > > > > It is still linkonce_odr in the module it get defined. See the new added test case: inline-variable-in-module.cpp for example. The attribute `available_externally` is equivalent to external from the perspective of linker. See https://llvm.org/docs/LangRef.html#linkage-types. According to [dcl.inline]p7, inline variable attached to named module should be defined in that domain. Note that the variable attached to global module fragment and private module fragment shouldn't be accessed outside the module, so it implies that all the variable defined in the module could only be defined in the module unit itself.
> > > > > > > > > There's a couple of issues with this.  module.cppm is emitting a (linkonce) definition of inlne_var_exported, but only because it itself is ODR-using that variable.  If you take out the ODR-use in noninline_exported, there is no longer a symbol emitted.
> > > > > > > > > 
> > > > > > > > > But, even if you forced inline vars to be emitted in their defining-module's interface unit, that would be an ABI change.  inline vars are emitted whereever ODR-used.  They have no fixed home TU.  Now, we could alter the ABI and allow interface units to define a home location for inline vars and similar entities (eg, vtables for keyless classes).  But we'd need buy-in from other compilers to do that.
> > > > > > > > > 
> > > > > > > > > FWIW such a discussion did come up early in implementing modules-ts, but we decided there was enough going on just getting the TS implemented.  I'm fine with revisiting that, but it is a more significant change.
> > > > > > > > > 
> > > > > > > > > And it wouldn't apply to (eg) templated variables, which of course could be instantiated anywhere.
> > > > > > > > Oh, now the key point here is what the correct behavior is instead of the patch. Let's discuss it first.
> > > > > > > > 
> > > > > > > > According to [[ http://eel.is/c++draft/dcl.inline#7 | [dcl.inline]p7 ]], 
> > > > > > > > > If an inline function or variable that is attached to a named module is declared in a definition domain, it shall be defined in that domain.
> > > > > > > > 
> > > > > > > > I think the intention of the sentence is to define inline variable in the module interface. So if it is required by the standard, I think other compiler need to follow up. As I described in the summary, it might be a difference between C++20 module and ModuleTS. Do you think it is necessary to send the question to WG21? (I get the behavior from reading the words. Maybe I misread or the word is not intentional).
> > > > > > > > 
> > > > > > > > Maybe the ABI standard group need to discuss what the linkage should be. Now it may be weak_odr or linkonce_odr. It depends on how we compile the file. If we compile the .cppm file directly, it would be linkonce_odr. And if we compile it to *.pcm file first, it would be weak_odr. I have registered an issue for this: https://github.com/llvm/llvm-project/issues/53838.
> > > > > > > > Oh, now the key point here is what the correct behavior is instead of the patch. Let's discuss it first.
> > > > > > > > 
> > > > > > > > According to [[ http://eel.is/c++draft/dcl.inline#7 | [dcl.inline]p7 ]], 
> > > > > > > > > If an inline function or variable that is attached to a named module is declared in a definition domain, it shall be defined in that domain.
> > > > > > > > 
> > > > > > > > I think the intention of the sentence is to define inline variable in the module interface. So if it is required by the standard, I think other compiler need to follow up. As I described in the summary, it might be a difference between C++20 module and ModuleTS. Do you think it is necessary to send the question to WG21? (I get the behavior from reading the words. Maybe I misread or the word is not intentional).
> > > > > > > 
> > > > > > > You are reading more into the std than it says.  The std specifies what /source code/ is meaningful.  It says nothing about how a computation system might represent the program in another form.  Most of the latter, for ahead-of-time translation, is at the discretion of compiler implementors.  Part of that is the domain of the ABI, which specifies an interface to which different compilers may target, and then have compatibility at the object-file boundary. 
> > > > > > > 
> > > > > > > > Maybe the ABI standard group need to discuss what the linkage should be. 
> > > > > > > 
> > > > > > > Correct. And right now there is no consensus to do anything different with such entities.
> > > > > > > The ABI (http://itanium-cxx-abi.github.io/cxx-abi/abi.html) 5.2 documents such vague-linkage entities.  That section would need changes to bless what you are trying to do.
> > > > > > > 
> > > > > > > 
> > > > > > > You are reading more into the std than it says. The std specifies what /source code/ is meaningful. It says nothing about how a computation system might represent the program in another form. Most of the latter, for ahead-of-time translation, is at the discretion of compiler implementors. Part of that is the domain of the ABI, which specifies an interface to which different compilers may target, and then have compatibility at the object-file boundary.
> > > > > > 
> > > > > > OK, your words make sense. In fact, I don't care much about whether or not could we define `inline variable` in the module unit. The problem I tried to solve is about `the definition static variable in module`. We couldn't run a simple hello world example if we don't solve it.
> > > > > > 
> > > > > > What I care about is where should we define inline function. I want to define inline function in the module unit it get declared. And my theory comes from [dcl.inline]p7. And our experiments show that it is the key reason why module could speed up compilation. Our data shows that the compilation could speed up about 40% for the feature. Since most of the time consumed in compilation spent on the middle end, it is really not significant to save the time in frontend. So it matters a lot if we could avoid compiling same functions in middle end.
> > > > > > 
> > > > > > Originally, I thought I am doing right. But from your words, we couldn't do this until the ABI standard group get in consensus, right?
> > > > > > 
> > > > > > Finally, I feel it is odd about [dcl.inline]p7. Since if it is not for implementors, I feel it is meaningless for users.
> > > > > Or given that the CXXABI doesn't discuss the case about inline function in named module. Could we think it is a free space? Another question maybe where could we ask for opinion? WG21 or Itanium CXXABI group?
> > > > > Finally, I feel it is odd about [dcl.inline]p7. Since if it is not for implementors, I feel it is meaningless for users.
> > > > 
> > > > Presumably that means that a user can't declare an inline function in a module, and define it somewhere else (like in a private implementation unit) - they must define it in the same definition domain it is declared. That's a concrete requirement for the user, irrespective of what object-file-level implementation strategy (where the definition gets emitted, what linkage is used, etc) is used.
> > > > Presumably that means that a user can't declare an inline function in a module, and define it somewhere else (like in a private implementation unit) - they must define it in the same definition domain it is declared. That's a concrete requirement for the user, irrespective of what object-file-level implementation strategy (where the definition gets emitted, what linkage is used, etc) is used.
> > > 
> > > Oh, I get it. Thanks.
> > yes, I understand the problem you are trying to solve  (had the same in GCC).  The issue is with internal-linkage entities in global module fragments.  Let's consider 3 separate cases.
> > a) the GMF is a header-unit.
> > 
> > 1)  We could either emit it in a header-unit-specific object file (if ODR-used when building that header unit).  This would surprise users as now we have a thing that is morally a header-file, but comes with an object file.  That is likely to break build flow and is not what GCC does.  There is of course a trade off here, in that it's either emitted exactly once, or emitted into every TU that ODR uses it.  But is that a significant  extra burden?  The only variable case I came across was _ioinit.  (There are many static inline functions, due to C compatibility, but for those you want to inline them anyway.)
> > 
> > 2) Or we could emit it in every TU that directly or indirectly imports the header unit and ODR uses the entity.  This is what GCC does.  in the case of _ioinit that means making sure to call its dynamic constructor from the TU's initialization function.
> > 
> > 3) Note we do not clone the internal entity within a single TU, once for each header or module that we import that itself ODR uses the entity.
> > 
> > b) textual inclusion in the GMF of a module.  If the module ODR-uses the entity, it needs to be emitted into the object file.
> > 
> > c) Both textual inclusion AND importing of a header-unit. We could emit 2 copies, or we could merge these two definitions.  Merging is better (and what GCC does).
> > 
> > The std allows all the alternatives considered above.  Does that help?
> Thanks for the explanation. It really helps. Since I didn't touch header-unit before (I mainly focus on named module), I don't have an opinion for (a) and (c). For the (b), it might not be the case. I think we need to emit it all the time. Here is the pattern of <iostream> in libstdc++:
> ```
> extern istream cin;
> // ...
> static ios_base::Init __ioinit;
> ``` 
> 
> And after including, `__ioinit` is not used in the module unit, but we couldn't erase it. Otherwise the problem might met segfault. You could find the example in the issue I linked.
> 
> ---
> 
> In fact, I care more about whether or not should we emit `inline` function body in module purview to the module unit. But given that this is not the intention of the patch, let's talk it in other places.
> Thanks for the explanation. It really helps. Since I didn't touch header-unit before (I mainly focus on named module), I don't have an opinion for (a) and (c). For the (b), it might not be the case. I think we need to emit it all the time. Here is the pattern of <iostream> in libstdc++:
> ```
> extern istream cin;
> // ...
> static ios_base::Init __ioinit;
> ``` 
> 
> And after including, `__ioinit` is not used in the module unit, but we couldn't erase it. Otherwise the problem might met segfault. You could find the example in the issue I linked.

Hehe, I remember having to look this up. __ioinit /is/ used, because it has a dynamic initializer: [basic.start.dynamic]

there is no need for the compiler to emit unused internal-linkage variables with static initialization.


> In fact, I care more about whether or not should we emit `inline` function body in module purview to the module unit. But given that this is not the intention of the patch, let's talk it in other places.

Again, that's ABI.  While we could emit externally-visible out-of-line-bodies into the module's object file (and have importers of the module not emit such out-of-line bodies as needed), that would require the same ABI change as the variable case.  I notice that the ABI http://itanium-cxx-abi.github.io/cxx-abi/abi.html#vague speficially coveres inline function bodies (5.2.1), but not inline variables, because it was never updated when those became a feature.  But inline variables are just like keyless vtables, which it does cover, and that's what implementations do.)




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119409/new/

https://reviews.llvm.org/D119409



More information about the cfe-commits mailing list