[cfe-dev] modular codegen of class template static member variables

David Blaikie via cfe-dev cfe-dev at lists.llvm.org
Wed Nov 29 14:34:33 PST 2017

On Tue, Nov 21, 2017 at 5:06 PM Richard Smith <richard at metafoo.co.uk> wrote:

> On 20 November 2017 at 15:04, David Blaikie via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>> Hi Richard,
>> (Lang, you're here because I mentioned stumbling across this on Friday in
>> ORC - this is the reduced test case (where 't' is the NameMutex member and
>> 'nt' is the Name member))
>> Working on getting all LLVM binaries linking successfully under modular
>> codegen, I've hit something that seems it'll need a bit more feature work
>> (which I'm happy/planning to do myself - though always happy for
>> help/advice/etc)...
>> The test case I have boils down to the following modular header:
>>   struct trivial {};
>>   struct nontrivial { nontrivial(); };
>>   // namespace foo {
>>   void sink(void *);
>>   template <typename T> struct bar {
>>     static void baz() {
>>       sink(&t);
>>       sink(&nt);
>>     }
>>     static trivial t;
>>     static nontrivial nt;
>>   };
>>   template <typename T> trivial bar<T>::t;
>>   template <typename T> nontrivial bar<T>::nt;
>>   //} // namespace foo
>>   template struct bar<int>;
>>   // inline void use() { (void)bar<int>::baz(); }
>> To build with modular codegen:
>>   $ echo 'module foo { header "foo.h" }' > foo.cppmap
>>   $ clang++ -cc1 -xc++ -emit-module -fmodules -w -fmodule-name=foo \
>>       foo.cppmap -o foo.pcm -fmodules-codegen
>> So here are some interesting facts I know, some of which may be relevant,
>> some of which may not:
>>    1. Code as written ends up with linkonce_odr definitions for t and nt
>>    2. Use use() instead of the explicit instantiation and both t and nt are
>>    only declarations
>>    3. Add the outer namespace foo and then t is emitted as a linkonce_odr
>>    definition and nt is emitted as a declaration
>> That last one (which was the first result I got) really confuses me - any
>> ideas why a namespace would change the behavior here?
> My first guess would be that something has registered a
> ASTConsumer::HandleTopLevelDecl callback or similar, and they're assuming
> that it gets called for every namespace-scope declaration.

Do you reckon that's worth looking into? Or just some unimportant

>> In any case, all those mysteries/differences in behavior might be aside to
>> actually fixing the behavior here, which is what this email is really about.
>> This is basically the same problem as inline variables, and maybe even
>> would allow some support for static variables in headers too (not sure,
>> will see).
>> Any ideas what the behavior should be here? Since there's a desire not to
>> run all global initializers if their specific submodule header isn't
>> included in the program (for iostream's sake), how would this be done
>> correctly under modular codegen?
>> My initial thought is potentially to defer the global initializers to the
>> includers (that seems necessary to get the lazy/only-those-included
>> behavior, right?) But that may not account for indirect inclusion? I guess
>> that's already handled somehow for the iostreams non-modular case, so maybe
>> it works.
> The explicit instantiation definition case is not especially interesting,
> because by [temp.spec]/5.1, such things should never appear in modular
> headers (because the header could only ever be used in one translation
> unit).

Ah, good point.

> So let's focus in on the "inline void use()" case. We need some kind of
> mental model for what modular codegen means in order to figure out what
> should happen. The way I'm thinking about modular codegen is roughly:
> For each header for which we perform modular codegen, we act as if
>  * that header is a separate translation unit in the program (in
> *addition* to being included into other places), and
>  * for that translation unit, we happen to emit definitions of inline
> functions and class metadata, even if they're not otherwise used, and
>  * in other translation units, we don't need to emit those symbols as a
> consequence.

One minor difference here is that the whole module, not each header, is a
translation unit when it comes to the build details (which are somewhat
relevant) - or at least a single object file. That means the used/unused
behavior of linkers works at the granularity of the module, not the header
(eg: if you use a function from one header, you'll pull in all the functions
from the module's object file, not just those from that one header, and all
its initializers...).

> Under that model, emitting the definition of use() should cause us to emit
> linkonce_odr definitions of both t and nt into the modular codegen
> translation unit. But it should not suppress the emission of linkonce_odr
> definitions of t and nt in other translation units too.
> However, we also want to *not* run initializers for modules that are not
> actually used (eg, we don't want linking against the standard library to
> run the iostreams initializer -- and thus link in the iostreams library --
> if it's not used, such as for a freestanding / embedded compilation). For
> modular codegen, this presumably also needs function sections, and section
> GC enabled in the linker.

Hmm, not sure I followed this last bit. Which part /needs/ function
sections & GC sections?

>> & then the modular object file would perhaps have the weak_odr definition
>> of the global variable itself, but no global initializer - depending on any
>> live codepaths that reference the global necessarily requiring the using TU
>> to have caused the initializer to run? That seems vaguely concerning...
> It does. Mostly I think it works out: if another TU is relying on an
> inline function definition to be provided by a modular codegen object, they
> must have run the notional initializer for that module, which in turn would
> have initialized those globals.
> There's one case I'm concerned by: suppose the module is never actually
> imported, and all the TUs actually include the header textually. Now,
> suppose the inline function and global variables from the modular codegen
> TU are selected at link time, and the other copies are all discarded, and
> we cleverly put the per-variable global initializers in a COMDAT with the
> variables, so they get discarded too. Now we're left with a reference to an
> uninitialized global.
> Perhaps we need to make a distinction between internal linkage globals
> with dynamic initialization in headers (eg the iostreams initializer),
> which we run in every user of the modular codegen header, and external
> linkage globals with dynamic initialization in headers (eg, inline
> variables, static data members of class templates, ...) which we run as
> part of initializing the modular codegen translation unit itself. If we do
> make that distinction, I worry that we'll lose some of the initialization
> order guarantees, though.

Right, so we (Richard & I) talked about this over lunch & I'm summarizing our
(mostly Richard's) thoughts on this to the mailing list for posterity:

The initialization order guarantee is, if I recall correctly, a "happens
after" thing. If inline variable A is defined before inline variable B, then
B is initialized after A. Some other inline variable X might be initialized
before B but after A (if A and X appear in some other translation unit), but
importantly B can't be initialized before A.
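
A minimal sketch of that ordering guarantee, assuming C++17 inline variables
in a single header. The names init_log, record, check_init_order, A and B are
hypothetical and are not part of the reduced test case above:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: a function-local static, so the log itself does not
// depend on the initialization order being demonstrated.
inline std::vector<std::string> &init_log() {
  static std::vector<std::string> log;
  return log;
}

// Records that a variable's dynamic initializer ran; returns its position.
inline int record(const char *name) {
  init_log().push_back(name);
  return static_cast<int>(init_log().size());
}

// Inline variables with dynamic initializers. A is defined before B in every
// translation unit that defines B, so A has to be initialized before B.
inline int A = record("A");
inline int B = record("B");

// Checks the "B happens after A" property once both have been initialized.
inline void check_init_order() {
  assert(A < B);
}
```

Calling check_init_order() from main() in any TU that includes this should
pass; nothing here says where some unrelated inline variable X would land
relative to A and B.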

General goal/premise/proposed solution:
Treat modular codegen objects the same as a translation unit that includes
the headers, /except/ for the internal linkage globals (eg: iostreams init).

This preserves ordering - it's just as correct (except for the internal
linkage globals) as if you had a separate source file that included all the
modular headers & weren't using modules to compile anything.

The only place it breaks down is the case where your modular codegen
non-internal globals (class template static members, variable templates,
inline variables, etc) use or depend on the internal linkage global
initializer (eg: an inline variable with an initializer that prints to a
stream, etc). We're just going to accept this as broken, I think...
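
To make the problematic shape concrete, here's a rough sketch of a header
with that kind of dependency. StreamInit, stream_init and saw_stream_ready
are hypothetical names standing in for the iostreams initializer and a
dependent inline variable; this illustrates the pattern only, not what Clang
emits for it:

```cpp
#include <cassert>

// Stand-in for something like the iostreams initializer: constructing it
// flips a flag that other code may rely on.
struct StreamInit {
  StreamInit() { ready() = true; }
  static bool &ready() {
    static bool r = false;
    return r;
  }
};

// Internal linkage global with dynamic initialization. Every TU that
// includes the header gets its own copy, run by that TU's initializers.
static StreamInit stream_init;

// External linkage (C++17 inline) variable whose initializer depends on the
// internal linkage global above. What it captures depends on whether
// stream_init has already been constructed when this initializer runs.
inline bool saw_stream_ready = StreamInit::ready();
```

Under the proposal, the modular codegen object would initialize
saw_stream_ready but skip stream_init, so nothing in that object ensures the
flag has been set first.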

Is that right, Richard? I feel like I left this too long and didn't
remember/include quite all the nuance about ordering, etc. Feel free to add

Practically speaking, currently all the globals aren't quite handled in
modular codegen because a lot of this work is triggered specifically by the
presence/handling of a module import. So I'll need to refactor and reuse
that code, which should make all the globals be handled, and then
specifically opt out of/skip over the internal linkage variables to get back
to the right behavior there.

- Dave

>> Is this making sense? Any good ideas? Pointers to where to start, etc?
>> Thanks,
>> - Dave