[PATCH] D74361: [Clang] Undef attribute for global variables
John McCall via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Mar 2 14:56:37 PST 2020
rjmccall added inline comments.
================
Comment at: clang/test/CodeGenCXX/attr-loader-uninitialized.cpp:23
+// CHECK: @nominally_value_init = global i32 undef
+int nominally_value_init [[clang::loader_uninitialized]] = 4;
+
----------------
JonChesterfield wrote:
> rjmccall wrote:
> > Quuxplusone wrote:
> > > rjmccall wrote:
> > > > JonChesterfield wrote:
> > > > > Quuxplusone wrote:
> > > > > > This test case is identical to line 36 of clang/test/Sema/attr-loader-uninitialized.cpp, where you say you don't want it to compile at all.
> > > > > >
> > > > > > I think you need a clearer idea of how this interacts with initializers. Is it merely supposed to eliminate the //zero-initialization// that happens before the user-specified construction/initialization, or is it supposed to compete with the user-specified construction/initialization?
> > > > > >
> > > > > > That is, for
> > > > > >
> > > > > > nontrivial unt [[clang::loader_uninitialized]];
> > > > > >
> > > > > > is it merely supposed to call `unt::unt()` on a chunk of undef memory (instead of the usual chunk of zeroed memory), or is it supposed to skip the constructor entirely? And for
> > > > > >
> > > > > > int x [[clang::loader_uninitialized]] = foo();
> > > > > >
> > > > > > is it merely supposed to call `foo()` and assign the result to a chunk of undef memory (instead of the usual chunk of zeroed memory), or is it supposed to skip the initialization entirely?
> > > > > I think you commented while the first working piece of sema landed. My thinking is relatively clear but my understanding of clang's semantic analysis is a work in progress!
> > > > >
> > > > > Initializers (`= foo()`) are straightforward. Error on the basis that the attribute effectively means `= undef`, and one should not have two initializers. A test case is now added for that (and now passes).
> > > > >
> > > > > The codegen I want for a default constructed global marked with this variable is:
> > > > > - global variable allocated, with undef as the original value
> > > > > - default constructor call synthesized
> > > > > - said default constructor set up for invocation from crt, before main, writing over the undef value
> > > > >
> > > > > Where the default constructor can be optimized as usual, e.g. if it always writes a constant, we can init with that constant instead of the undef and elide the constructor.
> > > > >
> > > > > I don't have that actually working yet - the constructor call is not being emitted, so we just have the undef global.
> > > > >
> > > > > I think it's important to distinguish between the values of the bits when the program is loaded and whether constructor/destructors are called, as one could want any combination of the two.
> > > > I think Arthur is suggesting that it would be useful to allow the attribute to be used in conjunction with an initializer in C++, since if the initializer has to be run dynamically, we can still meaningfully suppress the static zero-initialization. That is, we've accepted that it's useful to do this when *default-initializing* a global, but it's actually useful when doing *any* kind of dynamic initialization.
> > > >
> > > > Maybe we can leave it as an error in C++ when the initializer is a constant expression. Although that might be unnecessarily brittle if e.g. the constructor is `constexpr` in one library version but not another.
> > > No, that's exctly what I mean. You seem to be holding two contradictory ideas simultaneously:
> > >
> > > [[loader_uninitialized]] X x = X{}; // two initializers, therefore error
> > >
> > > [[loader_uninitialized]] X x {}; // one initializer plus one constructor, therefore fine
> > >
> > > In C++, these two declarations have identical semantics. It doesn't make sense to say that one of them "calls a constructor" and the other one "has an initializer." They're literally the same thing.
> > >
> > > Similarly in both C99 and C++ with plain old ints:
> > >
> > > [[loader_uninitialized]] int x = foo();
> > >
> > > This means "call foo and dynamically initialize x with the result," just as surely as
> > >
> > > [[loader_uninitialized]] X x = X();
> > >
> > > means "call X::X and dynamically initialize x with the result." Having one rule for dynamic initializers of primitive types and a separate rule for dynamic initializers of class types doesn't work.
> > >
> > > Furthermore, "dynamic initialization" can be promoted to compile-time:
> > >
> > > [[loader_uninitialized]] int x = 42;
> > > [[loader_uninitialized]] std::string_view x = "hello world";
> > >
> > > It wouldn't make semantic sense to say that one of these has "two initializers" and the other has "one initializer," because both of the initializations end up happening at compile time and getting put into .data.
> > > Similarly in both C99 and C++ with plain old ints:
> >
> > C does not allow non-constant initialization of global variables.
> >
> > Let me try to explain what we're thinking here. If the variable provides both an initializer and this attribute, and the compiler can successfully perform constant initialization (as it's required to do in C, and as it often does in C++), then the attribute has probably become meaningless because we're not going to suppress that constant initialization. The compiler strongly prefers to diagnose meaningless attributes in some way, because they often indirectly indicate a bug. For example, maybe the initializer isn't supposed to be there but it's hidden behind a huge mess of macros; or maybe somebody added a constant initializer because they changed the variable schema to make that possible, and so the programmer should now remove the attribute and the dynamic initialization code that went along with it. Compared to that, the C++ use case of "allow a dynamic initializer but suppress static zero-initialization" is pretty marginal; that doesn't mean we absolutely shouldn't support it, but it does suggest that we shouldn't prioritize it over other aspects of the feature, like catching that possible bug.
> I'm under the impression that the constructs are:
> ```
> X x = X{}; // default construct an X and then call the copy constructor at x
> X x {}; // default construct an X at the location of x
> X x; // default construct an X at the location of x
> ```
>
> The C++ initialisation rules are very complicated so I won't be shocked to have got that wrong.
>
> If your toolchain actually runs global constructors then there isn't much use to marking C++ objects with this attribute, unless said constructor does nothing and would thus leave the bytes still undef.
>
> Accepting default constructors (or, perhaps only accepting trivial types?) would allow people to use this attribute with the approximation of C++ available on systems that don't actually run the global constructors.
>
> Equally, if this seems too complicated, I'm happy to restrict this to C as that's still an improvement on the asm and/or linker scripts I have available today.
> If your toolchain actually runs global constructors then there isn't much use to marking C++ objects with this attribute, unless said constructor does nothing and would thus leave the bytes still undef.
That's not totally true. C++ still requires globals to be zero-initialized before it runs dynamic initializers, and the dynamic initializer is very likely to make that zero-initialization totally redundant. Arthur is right that there's no inherent reason that that only applies to dynamic *default* initialization, though, as opposed to any non-constant initialization.
I'd prefer to avoid making this C-specific; Clang takes a lot of pride in trying to make feature cross-language as much as possible. We could just make the error about providing both the attribute and an initializer C-specific, though.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D74361/new/
https://reviews.llvm.org/D74361
More information about the cfe-commits
mailing list