[cfe-dev] [RFC] automatic variable initialization

Wed Apr 10 11:13:29 PDT 2019

On Fri, Jan 18, 2019 at 10:44 AM Kostya Serebryany via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

>
>
> On Wed, Jan 16, 2019 at 7:58 PM Richard Smith <richard at metafoo.co.uk>
> wrote:
>
>> On Wed, 16 Jan 2019, 19:35 Kostya Serebryany via cfe-dev <
>> cfe-dev at lists.llvm.org wrote:
>>
>>>
>>>>
>>>> Should main be involved, though?
>>>>
>>>
>>> How else?
>>>
>>
>> By putting the initialisation where it belongs, in the constructor.
>>
>
> As already discussed below, initializing in CTORs will also initialize
> heap memory,
> i.e. will have different performance characteristics.
> Still a good idea.
>

Hi folks,

I've been investigating the approach of moving the initialization into the
constructors that we've been discussing on this thread, and I've hit a
problem that would prevent us from arranging for the constructor prologue
to always pattern initialize. The problem is related to objects with static
storage duration (i.e. globals), and it has to do with the following clause
from C++17 [basic.start.static]p2:

"If constant initialization is not performed, a variable with static
storage duration (6.7.1) or thread storage duration (6.7.2) is
zero-initialized (11.6). Together, zero-initialization and constant
initialization are called static initialization; all other initialization
is dynamic initialization. All static initialization strongly happens
before (4.7.1) any dynamic initialization."

What this essentially means is that an object with static storage duration
that is not constant initialized is initialized twice: once with zeros (i.e.
static initialization) and once with the constructor (i.e. dynamic
initialization). This means that the code in the constructor is allowed to
rely on the object having first been zero initialized, provided that the
object has static storage duration.
Real-world code, such as Chromium's SpinLock class:
https://cs.chromium.org/chromium/src/third_party/tcmalloc/chromium/src/base/spinlock.h?l=52
relies on this guarantee.
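
To make the guarantee concrete, here is a minimal sketch of the kind of code
that depends on it. TinyLock, LinkerInitialized, g_ctor_runs and g_lock are
hypothetical names invented for this illustration; this is not the Chromium
code, only something in the same spirit.

```cpp
#include <cassert>

// Counts constructor runs. The side effect keeps the constructor from being
// a constant expression, so g_lock below needs dynamic initialization.
int g_ctor_runs = 0;

// Hypothetical tag type, used only for this illustration.
enum LinkerInitialized { LINKER_INITIALIZED };

class TinyLock {
 public:
  // Deliberately does not write word_. For an object with static storage
  // duration, word_ was already zeroed by static initialization.
  explicit TinyLock(LinkerInitialized) { ++g_ctor_runs; }

  bool is_unlocked() const { return word_ == 0; }

 private:
  int word_;  // 0 means unlocked
};

// Zero-initialized first (static initialization), then constructed
// (dynamic initialization).
TinyLock g_lock(LINKER_INITIALIZED);
```

If the constructor prologue pattern initialized the object, g_lock would end
up with word_ holding the pattern instead of zero, and is_unlocked() would
return false for a lock that nobody has taken.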

What I think this means is that, in order to avoid creating a dialect of
C++ in which globals are not first zero initialized, we would need to teach
the compiler to emit a separate set of constructors that perform pattern
initialization and use the non-initializing constructors to initialize
globals. Since we need to emit two sets of constructors anyway, we may as
well also use the non-initializing constructors to initialize heap objects
in order to reduce overhead. I've prototyped this and found that, despite
the fact that we would be emitting more constructor variants into object
files, there is still a code size improvement because the extra constructor
copies can typically be eliminated by the linker, either via --gc-sections
(for objects that are only constructed on the stack or only on the heap or
in globals) or ICF (in the case where the constructor fully initializes its
object). In Chromium for Android, I've observed a binary size decrease of
400KB/0.5% for 64-bit ARM, and 290KB/0.7% for 32-bit ARM, versus pattern
initialization with the current approach. (That's a bit of a lie, though.
In both the before and after compilers, I changed the compiler to emit a
call to memset in emitStoresForPatternInit instead of calling
emitStoresForConstant, as I've done here:
https://github.com/pcc/llvm-project/tree/tvi-memset . I've found that in
Chromium as well as in the Linux kernel, this also results in substantial
binary size savings. But I'll cover that topic separately.)

Since this change involves new ABI, it would need to be opted into with a
command line flag. Note that a library compiled with pattern initializing
constructors may be used by code compiled without pattern initializing
constructors, so it isn't a total ABI break. So if libc++ were compiled
with this flag, for example, it could still be used by applications
compiled without it. If the user does not opt into pattern initializing
constructors, the behaviour would be as it is now: we would need to rely as
much as we can on dead store elimination and other optimizations.

It's also worth noting that this problem is exclusive to pattern
initialization. For zero initialization we would only need one set of
constructors which always zero initialize because, upon entry, a
constructor may assume that either the allocated memory is uninitialized
(in the non-global case) or zero (in the global case), making the zero
initialization a no-op in the global case.

To produce the pattern initializing constructors, as well as the pattern
initialization for stack variables, we perform initialization as usual,
calling into the pattern constructors where possible. The frontend keeps
track of the calls to pattern constructors; each call will punch a hole in
the initialization that the frontend will later insert in the prologue. In
my prototype, this works quite well.
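
As a rough model of the hole punching described above (a sketch of one
possible reading, not the prototype's actual code), the prologue fill can be
thought of as the whole object minus the byte ranges that are handed to
pattern constructors. Range and punch_holes are hypothetical names, and the
byte offsets in the usage note assume an 8-byte size_t with no padding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open byte range [begin, end) within an object.
struct Range {
  std::size_t begin;
  std::size_t end;
};

// Given the object size and the ranges covered by calls to pattern
// constructors, return the ranges that the prologue still has to fill.
std::vector<Range> punch_holes(std::size_t object_size,
                               std::vector<Range> covered) {
  std::sort(covered.begin(), covered.end(),
            [](const Range &a, const Range &b) { return a.begin < b.begin; });
  std::vector<Range> to_fill;
  std::size_t cursor = 0;  // first byte not yet accounted for
  for (const Range &r : covered) {
    if (r.begin > cursor)
      to_fill.push_back({cursor, r.begin});  // gap before this hole
    cursor = std::max(cursor, r.end);
  }
  if (cursor < object_size)
    to_fill.push_back({cursor, object_size});  // tail after the last hole
  return to_fill;
}
```

For a 24-byte object whose first 16 bytes are two base subobjects built by
pattern constructors, punch_holes(24, {{0, 8}, {8, 16}}) leaves only the
range [16, 24) for the prologue to fill.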

It has also been suggested that, as an alternative to punching holes in the
frontend, we could make dead store elimination smart enough that it can
punch holes itself. But given the complexity that this would entail, I do
not consider this to be the correct approach.
Consider the following code:

#include <stddef.h>  // size_t

struct A;  // forward declaration so that the_a can be declared first
A *the_a;

struct A {
  A() : foo(1) { // assume that all ctors are actually out-of-line
    the_a = this;
  }
  size_t foo;
};

struct B {
  B() : bar(2) {}
  size_t bar;
};

struct C : A, B {
  C() : baz(3) {}
  size_t baz;
};

Our goal is to eliminate the memset in the following code:

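C c;                          // pseudocode: allocate storage for c only
memset(&c, 0xaa, sizeof(C));  // pattern initialize the storage
c.C();                        // pseudocode: then run the constructor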

The problem that the optimizer has is that it needs to prove that none of
the constructors read from c before it is fully initialized. Note that a
pointer to c escapes from the A constructor, so the written attribute that
JF proposed upthread is not sufficient for expressing that the
initialization is never read by the B constructor, since at the IR level
there's nothing stopping the B constructor from, for example, reading a
pointer from the_a, static_casting it to type C and reading the pattern
from baz. Such a static_cast results in undefined behaviour according to
the C++ standard, but only until the object is fully initialized. So we
could conceivably introduce an attribute that means that, if the pointer
does escape, it may not result in an out-of-bounds read before the object
is fully initialized, and relying on the language guarantee, we could
attach it to the this pointer of every constructor. But then we'd need a
way of associating the attribute with the point at which the object is
fully initialized (with nested or inlined initialization there could be
many). LLVM attributes are not well suited to expressing this, and it would
mean introducing an attribute with very subtle (and language-specific)
semantics.
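
To show what the optimizer has to assume at the IR level, here is a sketch
that restates the example with plain functions instead of constructors, so
that the early read is well defined instead of undefined behaviour. Whole,
init_foo, init_bar, init_baz and the other names are hypothetical stand-ins
for C and its constructors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for C with its A, B and own parts flattened into one struct.
struct Whole {
  std::size_t foo, bar, baz;
};

Whole *escaped = nullptr;      // plays the role of the_a
std::size_t observed_baz = 0;  // what the "B" step saw in baz

void init_foo(Whole *w) { w->foo = 1; escaped = w; }  // pointer escapes
void init_bar(Whole *w) {
  w->bar = 2;
  observed_baz = escaped->baz;  // reads baz before init_baz has run
}
void init_baz(Whole *w) { w->baz = 3; }

// A size_t whose bytes are all 0xaa, for comparison.
std::size_t pattern_word() {
  std::size_t v;
  std::memset(&v, 0xaa, sizeof v);
  return v;
}

// Pattern fill, then run the three initialization steps in order.
std::size_t run_with_memset() {
  Whole w;
  std::memset(&w, 0xaa, sizeof w);
  init_foo(&w);
  init_bar(&w);
  init_baz(&w);
  escaped = nullptr;  // do not leave a dangling pointer behind
  return observed_baz;
}
```

Here run_with_memset() returns the same value as pattern_word(), so the
memset is observable and cannot be removed unless the optimizer can prove
that no such read happens.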

Alternatively, we could relax this to an attribute that means that the
pointer does not escape at all, but this could not be applied to every
constructor
(which means that we'd need an analysis to propagate it), and we'd be
leaving some optimization potential on the table because we aren't taking
into account the semantics of construction lifetimes. It seems fine to do
this as a local analysis for dealing with C-like initialization functions
that are emitted into the same translation unit but not inlined (e.g.
because of -Oz), but the scary part is trying to propagate it through
something like a ThinLTO summary. I think that the way that this would need
to work is that you'd need some way of expressing "if argument A does not
escape in this function, argument B does not escape from its callers", and
this seems very tricky to get right. We would need a similar approach for
the written attribute if we did not have pattern constructor variants.

In the end, we know what IR we want to end up with for a constructor call,
so it makes far more sense to me to simply have the frontend generate the
IR that we want using its knowledge of the language semantics, instead of
relying on a complex analysis to produce it for us. And given that we are
trying to ensure a security property, the analysis should be as simple as
reasonably possible.

Thanks,
-- 
Peter