[cfe-dev] making -ftrivial-auto-var-init=zero a first-class option

Tue Apr 21 14:20:44 PDT 2020

Hi,

tl;dr: I'd like to revisit making -ftrivial-auto-var-init=zero an expressly
supported option. To do this, I think we need to either entirely remove
"-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang"
or rename it to something more directly reflecting the issue, like
"-enable-trivial-auto-var-init-zero-knowing-it-forks-the-language".

This is currently open as https://bugs.llvm.org/show_bug.cgi?id=45497
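For anyone less familiar with the two modes, here is a rough C sketch of what each one conceptually does to a trivial automatic variable that the code never initializes. pattern-init fills it with a repeated byte; zero-init fills it with zeros. This is only a run-time emulation with memset(). The auto_init() helper and its enum are hypothetical. Using 0xAA as the fill byte is an assumption about the integer pattern, and other types (e.g. floating point) may get different patterns.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names mirroring the two flag values:
 *   -ftrivial-auto-var-init=pattern
 *   -ftrivial-auto-var-init=zero
 * (zero currently also needs
 *  -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang) */
enum auto_init_mode { AUTO_INIT_ZERO, AUTO_INIT_PATTERN };

/* Emulate, at run time, the store the compiler would insert at the
 * declaration of an otherwise-uninitialized local. 0xAA is an assumed
 * pattern byte and is used here for every type. */
static void auto_init(void *obj, size_t size, enum auto_init_mode mode)
{
    memset(obj, mode == AUTO_INIT_ZERO ? 0x00 : 0xAA, size);
}
```

For example, after auto_init(&n, sizeof n, AUTO_INIT_PATTERN), a 32-bit unsigned n holds 0xAAAAAAAA, which stands out when debugging, while AUTO_INIT_ZERO leaves it at 0.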

Here is the situation: -ftrivial-auto-var-init=pattern is great for
debugging, but -ftrivial-auto-var-init=zero is needed for production
systems for two main reasons, and I will try to give some context for
each:

1) performance and size

As measured by various Google folks across several projects, using
pattern-init has a fairly significant performance cost compared to
zero-init. I can let other folks chime in with their exact numbers, but
I can at least share some measurements Alexander Potapenko made with
the Linux kernel (see "Performance costs"):
https://clangbuiltlinux.github.io/CBL-meetup-2020-slides/glider/Fighting_uninitialized_memory_%40_CBL_Meetup_2020.pdf
tl;dr: zero-init tended to cost about half as much as pattern-init,
though this varied by workload, and the binary size impact fell by over
95% when going from pattern-init to zero-init.

2) security

Another driving factor (see the vendor/project positions below) is the
security stance. Putting non-zero values into most variable types
arguably makes them more dangerous than if they were zero-filled.
Most notably, sizes and indexes are less likely to be used out of bounds
if they are zero-initialized. The same holds for bool values: a non-zero
fill tends to indicate success instead of failing safe with a false
value. While pattern-filled pointers landing in the non-canonical range
are nice, zero tends to be just as good. There are certainly exceptions
here, but the bulk of the historical record on how "uninitialized"
variables have been used in real-world exploitation involves their being
non-zero, and analysis of those bugs supports that conclusion.
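To make the size and success-flag argument concrete, here is a small C sketch of a request whose fields a buggy code path forgot to set. The struct, its fields, and both helpers are hypothetical, and each mode is emulated with memset(), again assuming 0xAA as the pattern byte. An int flag stands in for bool, since a bool holding 0xAA is not a valid value in C.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical request whose fields a buggy code path forgot to set. */
struct request {
    size_t len; /* number of bytes the consumer will copy */
    int ok;     /* success flag: non-zero means "data is valid" */
};

/* Emulate automatic initialization with a single fill byte:
 * 0x00 for zero-init, 0xAA as an assumed stand-in for pattern-init. */
static struct request make_uninit_request(unsigned char fill)
{
    struct request r;
    memset(&r, fill, sizeof r);
    return r;
}

/* A naive consumer: trusts the flag, then trusts the length. */
static size_t bytes_consumed(const struct request *r)
{
    return r->ok ? r->len : 0;
}
```

With a 0x00 fill, ok is 0 and bytes_consumed() returns 0, so the path fails safe. With a 0xAA fill, ok reads as success and len is 0xAAAA...AA, far larger than any real buffer.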

Various positions from vendors and projects:

Google (Android, Chrome OS)

Both Android and Chrome OS initially used pattern-init, but due to the
performance characteristics, the binary size changes, and the less
robust security stance, both projects have recently committed to
switching to zero-init.

Microsoft (Windows)

I'm repeating what Joe Bialek has told me, so he can clarify if I'm not
representing this correctly. Although Microsoft doesn't use Clang/LLVM,
it is part of the larger C/C++ ecosystem and has implemented both
zero-init (for production builds) and pattern-init (for debug builds) in
its own compiler. It also chose zero-init for production expressly
because of the security benefits.

Some details of their work:
https://github.com/microsoft/MSRC-Security-Research/blob/master/presentations/2019_09_CppCon/CppCon2019%20-%20Killing%20Uninitialized%20Memory.pdf

Upstream Linux kernel

Linus Torvalds has directly stated that he wants zero-init:
"So I'd like the zeroing of local variables to be a native compiler
option..."
"This, btw, is why I also think that the "initialize with poison" is
pointless and wrong."
https://lore.kernel.org/lkml/CAHk-=wgTM+cN7zyUZacGQDv3DuuoA4LORNPWgb1Y_Z1p4iedNQ@mail.gmail.com/
Unsurprisingly, I strongly agree. ;)

GrapheneOS

GrapheneOS now uses zero-init (rather than patching Clang to get the
same result, as it used to):
https://twitter.com/DanielMicay/status/1248384468181643272

GCC

There has been mostly silence on the entire topic of automatic variable
initialization, though patches for zero-init have been proposed in the
past:
https://gcc.gnu.org/legacy-ml/gcc-patches/2014-06/msg00615.html

Apple

I can't speak meaningfully here, but I've heard rumors that they are
depending on zero-init as well. Perhaps someone there can clarify how
they are using these features?

So, while I understand the earlier "language fork" objections to
zero-init, I don't think that position can really stand up to the
reality of how many projects are using the feature (even via non-Clang
compilers). Given that so much code is going to be built using
zero-init, what's the best way for Clang to adapt here? I would prefer
to just drop the -enable... option entirely, but I think renaming it
would be fine too.

Thoughts/flames? ;)

-- 
Kees Cook