[cfe-dev] [RFC] automatic variable initialization

Kostya Serebryany via cfe-dev cfe-dev at lists.llvm.org
Thu Nov 15 17:26:11 PST 2018

Very exciting, and long overdue. Thanks for doing this!
Countless security bugs would have been mitigated by this, see below.

Agree with the rationale: UUMs remain bugs, and we need to try hard to not
let developers rely on auto-initialization.
(e.g. in future patches we may decide to change the patterns, or to make
them different between the runs, etc)
All the old goodness (msan, -Wuninitialized, static analyses) is still

I am separately excited about this work because it is essentially a
precursor to efficient support for ARM's memory tagging extension (MTE).
If we can make enough compiler optimizations to auto-initialize locals with
low overhead, then MTE stack instrumentation will come for ~free.

Does -Wuninitialized still work with -ftrivial-auto-var-init=pattern|zero?

In later patches we may need to have flags to separately control auto-init
of scalars, PODs, arrays of data, arrays of pointers, etc.
because in some cases we could achieve 90% of benefit at 10% of cost.

I think that zero-init is going to be substantially cheaper than
pattern-init, but happy to be wrong.

Here are some links to bugs, vulnerabilities and full exploits based on
uses of uninitialized memory.
The list is not exhaustive by any means, and we keep finding them every
The problem is, of course, that we don't find all of them.


   Linux kernel: KMSAN trophies
   <https://github.com/google/kmsan/wiki/KMSAN-Trophies>, more trophies

   Chrome: 700+ UMRs in Chromium found by fuzzing

   Android: userspace: CVE-2018-9345/CVE-2018-9346, CVE-2017-13252
   kernel: CVE-2017-9075
   12% of all bugs (as of 2016)


   OSS: 700+ bugs in various OSS projects found by fuzzing

   Project Zero (P0) findings: ~139 total


   Mozilla: 100+ bugs

   "Detecting Kernel Memory Disclosure with x86 Emulation and Taint
   Tracking" <https://j00ru.vexillium.org/papers/2018/bochspwn_reloaded.pdf>
   (Sections 3.5 and 6.1.2)


   Leaks of sensitive information

      Linux kernel:

         disclosure of large chunks of kernel memory.

         https://alephsecurity.com/vulns/aleph-2016005: Android, uninitialized
         kernel memory leak over USB

      Windows kernel:


         (CVE-2017-8685) - a continuous leak of 1kB from the Windows kernel
         stack, discovered by diffing win32k.sys between Windows 7 and
         Windows 10. It enabled an attacker to e.g. perform system-wide
         keyboard sniffing to some extent. Mentioned in P0 blog post about
         bindiffing.

         (CVE-2017-11817) - a leak of ~4kB of uninitialized Windows kernel pool
         memory to NTFS metadata upon mounting the file system, without
         requiring user interaction. Made it possible to "exfiltrate" kernel
         memory from a powered-on but locked Windows machine through the USB
         port.

         (CVE-2018-1037) - 3 kB of uninitialized user-mode heap memory leaking
         from Microsoft build servers into a small percentage of .pdb symbol
         files publicly available through the Microsoft Symbol Server.

         (CVE-2017-8680) - disclosure of a controlled number of uninitialized
         bytes from the Windows kernel pool.

         #248 <https://bugs.chromium.org/p/project-zero/issues/detail?id=>,
         (CVE-2015-0089, many other CVEs) - a disclosure of uninitialized
         user/kernel-mode heap memory in the OpenType glyph outline VM program,
         which affected the Windows kernel, user-mode DirectWrite and WPF
         components, Adobe Reader, and Oracle Java. Discussed in detail in a P0
         blog post

      User space:

         *bleed continues

   Leaks of pointers (allows further attacks)

      Windows kernel:

         (CVE-2016-3262) - rendering of uninitialized heap bytes as pixels in
         EMF files parsed by user-mode Microsoft GDI+. Considered a WontFix by
         Microsoft until it turned out that Office Online was vulnerable and
         could leak memory from Microsoft servers, at which point they fixed
         the bug.

         (CVE-2015-2433) - a 0-day Windows kernel memory disclosure that was
         discovered in the Hacking Team dump in July 2015, and was
         found by ex-P0 member Matt Tait. It was used in an exploit chain to
         defeat KASLR and reveal the base address of win32k.sys.

         <https://bugs.chromium.org/p/project-zero/issues/detail?id=1311> -
         various examples of relatively long (~100+ bytes) continuous
         disclosure of Windows kernel memory, which could be easily used to
         de-aslr the kernel, leak stack cookies etc.

      MacOS kernel:

         CVE-2017-2357 <https://nvd.nist.gov/vuln/detail/CVE-2017-2357>,
         CVE-2017-{13836, 13840, 13841, 13842},


         (more of such

      User space:

         Android, uninitialized heap memory which could help break ASLR in a
         privileged process

   Privilege escalation / code execution

      Linux kernel: unauthorized access to IPC objects

      Windows kernel:


         (CVE-2015-0090) - off-by-one in the OpenType glyph outline VM in the
         Windows kernel, which led to arbitrary read/write thanks to accessing
         uninitialized pointers. Successfully exploited for privilege
         escalation on Windows 8.1 64-bit, as shown here

      MacOS kernel:

         CVE-2017-2358 <https://nvd.nist.gov/vuln/detail/CVE-2017-2358>

         <https://bugs.chromium.org/p/project-zero/issues/detail?id=618> (
         CVE-2016-1721 <https://support.apple.com/en-us/HT205732>): a local
         user may be able to execute arbitrary code with kernel privileges


On Thu, Nov 15, 2018 at 2:53 PM JF Bastien via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hello security fans!
> I’ve just uploaded a patch proposing opt-in automatic variable
> initialization. I’d appreciate comments on the overall approach, as well as
> on the specific implementation.
> Here’s the patch:
> https://reviews.llvm.org/D54604
> And here’s the description:
> Automatic variable initialization
> Add an option to initialize automatic variables with either a pattern or
> with zeroes. The default is still that automatic variables are
> uninitialized. Also add attributes to request pattern / zero /
> uninitialized on a per-variable basis, mainly to disable initialization of
> large stack arrays when deemed too expensive.
>
> This isn't meant to change the semantics of C and C++. Rather, it's meant
> to be a last resort when programmers inadvertently have some undefined
> behavior in their code. This patch aims to make undefined behavior hurt
> less, which security-minded people will be very happy about. Notably, this
> means that there's no inadvertent information leak when:
>   - The compiler re-uses stack slots, and a value is used uninitialized.
>   - The compiler re-uses a register, and a value is used uninitialized.
>   - Stack structs / arrays / unions with padding are copied.
>
> This patch only addresses stack and register information leaks. There are
> many more infoleaks that we could address, and much more undefined behavior
> that could be tamed. Let's keep this patch focused, and I'm happy to
> address related issues elsewhere.
>
> To keep the patch simple, only some `undef` is removed for now, see
> `replaceUndef`. The padding-related infoleaks are therefore not all gone
> yet. This will be addressed in a follow-up, mainly because addressing
> padding-related leaks should be a stand-alone option which is implied by
> variable initialization.
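
To make the padding case concrete, here is a minimal C sketch. It is an
illustration only, not code from the patch: `struct reply`,
`reply_padding_bytes` and `fill_reply` are hypothetical names, and the amount
of padding depends on the target ABI. It shows the manual idiom (clear the
whole object, then set the members) that padding-aware initialization would
make unnecessary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical message type. On most ABIs the compiler inserts padding
 * after `tag` so that `value` is suitably aligned. */
struct reply {
    char tag;
    long value;
};

/* Number of bytes in struct reply that belong to no named member. */
size_t reply_padding_bytes(void) {
    return sizeof(struct reply) - sizeof(char) - sizeof(long);
}

/* Manual idiom: clear every byte (padding included), then set the
 * members, so that copying the struct out does not carry stale stack
 * contents along in the padding. */
void fill_reply(struct reply *out, char tag, long value) {
    memset(out, 0, sizeof *out);
    out->tag = tag;
    out->value = value;
}
```

Strictly speaking, C lets padding bytes take unspecified values when a member
is stored, so the memset idiom is a widely used convention rather than a
language guarantee. That is part of why a compiler-level option is attractive.
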
> There are three options when it comes to automatic variable initialization:
>   0. Uninitialized
>     This is C and C++'s default. It's not changing. Depending on code
>     generation, a programmer who runs into undefined behavior by using an
>     uninitialized automatic variable may observe any previous value
>     (including program secrets), or any value which the compiler saw fit to
>     materialize on the stack or in a register (this could be to synthesize
>     an immediate, to refer to code or data locations, to generate cookies,
>     etc).
>   1. Pattern initialization
>     This is the recommended initialization approach. Pattern
>     initialization's goal is to initialize automatic variables with values
>     which will likely transform logic bugs into crashes down the line, are
>     easily recognizable in a crash dump, without being values which
>     programmers can rely on for useful program semantics. At the same time,
>     pattern initialization tries to generate code which will optimize well.
>     You'll find the following details in `patternFor`:
>     - Integers are initialized with repeated 0xAA bytes (infinite scream).
>     - Vectors of integers are also initialized with infinite scream.
>     - Pointers are initialized with infinite scream on 64-bit platforms
>       because it's an unmappable pointer value on architectures I'm aware
>       of. Pointers are initialized to 0x000000AA (small scream) on 32-bit
>       platforms because 32-bit platforms don't consistently offer
>       unmappable pages. When they do it's usually the zero page. As people
>       try this out, I expect that we'll want to allow different platforms
>       to customize this, let's do so later.
>     - Vectors of pointers are initialized the same way pointers are.
>     - Floating point values and vectors are initialized with a vanilla
>       quiet NaN (e.g. 0x7ff00000 and 0x7ffe000000000000). We could use
>       other NaNs, say 0xfffaaaaa (negative NaN, with infinite scream
>       payload). NaNs are nice (here, anyways) because they propagate on
>       arithmetic, making it more likely that entire computations become NaN
>       when a single uninitialized value sneaks in.
>     - Arrays are initialized to their homogeneous elements' initialization
>       value, repeated. Stack-based Variable-Length Arrays (VLAs) are
>       runtime-initialized to the allocated size (no effort is made for
>       negative size, but zero-sized VLAs are untouched even if technically
>       undefined).
>     - Structs are initialized to their heterogeneous elements'
>       initialization values. Zero-size structs are initialized as 0xAA
>       since they're allocated a single byte.
>     - Unions are initialized using the initialization for the largest
>       member of the union.
>     Expect the values used for pattern initialization to change over time,
>     as we refine heuristics (both for performance and security). The goal is
>     truly to avoid injecting semantics into undefined behavior, and we
>     should be comfortable changing these values when there's a worthwhile
>     point in doing so.
>     Why so much infinite scream? Repeated byte patterns tend to be easy to
>     synthesize on most architectures, and otherwise memset is usually very
>     efficient. For values which aren't entirely repeated byte patterns, the
>     compiler will often generate code which does memset + a few stores.
>   2. Zero initialization
>     Zero initialize all values. This has the unfortunate side-effect of
>     providing semantics to otherwise undefined behavior; programs therefore
>     might start to rely on this behavior, and that's sad. However, some
>     programmers believe that pattern initialization is too expensive for
>     them, and data might show that they're right. The only way to make
>     these programmers wrong is to offer zero-initialization as an option,
>     figure out where they are right, and optimize the compiler into
>     submission. Until the compiler provides acceptable performance for all
>     security-minded code, zero initialization is a useful (if blunt) tool.
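
For readers who want to see what the quoted patterns look like as values,
here is a rough C sketch. It does not reproduce `patternFor`; the helper
names (`scream_fill`, `pointer_pattern`, `float_from_bits`, `is_nan`) are
made up for illustration, and it assumes `float` is a 32-bit IEEE-754 type.
The RFC also says the patterns may change, so nothing should depend on them.

```c
#include <stdint.h>
#include <string.h>

/* Fill any object with the repeated 0xAA byte the RFC calls
 * "infinite scream". */
void scream_fill(void *obj, size_t size) {
    memset(obj, 0xAA, size);
}

/* Pointer pattern as described in the RFC: all-0xAA bytes on 64-bit
 * targets, 0x000000AA ("small scream") on 32-bit targets. */
uintptr_t pointer_pattern(void) {
    if (sizeof(void *) >= 8) {
        uintptr_t p;
        scream_fill(&p, sizeof p);
        return p;
    }
    return (uintptr_t)0xAA;
}

/* Reinterpret a 32-bit pattern as a float via memcpy, which avoids
 * aliasing problems. Assumes a 32-bit IEEE-754 float. */
float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* A NaN is the only floating-point value that compares unequal to itself. */
int is_nan(float f) {
    return f != f;
}
```

With these helpers, a 32-bit integer filled by `scream_fill` reads back as
0xAAAAAAAA (negative when signed), and both 0x7ff00000 and 0xfffaaaaa decode
to NaNs that stay NaN through arithmetic.
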
> I've been asked for a fourth initialization option: user-provided byte
> value. This might be useful, and can easily be added later.
>
> Why is an out-of-band initialization mechanism desired? We could instead use
> -Wuninitialized! Indeed we could, but then we're forcing the programmer to
> provide semantics for something which doesn't actually have any (it's
> uninitialized!). It's then unclear whether `int derp = 0;` lends meaning to
> `0`, or whether it's just there to shut that warning up. It's also way
> easier to use a compiler flag than it is to manually and intelligently
> initialize all values in a program.
>
> Why not just rely on static analysis? Because it cannot reason about all
> dynamic code paths effectively, and it has false positives. It's a great
> tool, could get even better, but it's simply incapable of catching all uses
> of uninitialized values.
>
> Why not just rely on memory sanitizer? Because it's not universally
> available, has a 3x performance cost, and shouldn't be deployed in
> production. Again, it's a great tool, it'll find the dynamic uses of
> uninitialized variables that your test coverage hits, but it won't find the
> ones that you encounter in production.
>
> What's the performance like? Not too bad! Previous publications [0] have
> cited 2.7 to 4.5% averages. We've committed a few patches over the last few
> months to address specific regressions, both in code size and performance.
> In all cases, the optimizations are generally useful, but variable
> initialization benefits from them a lot more than regular code does. We've
> got a handful of other optimizations in mind, but the code is in good
> enough shape and has found enough latent issues that it's a good time to
> get the change reviewed, checked in, and have others kick the tires. We'll
> continue reducing overheads as we try this out on diverse codebases.
>
> Is it a good idea? Security-minded folks think so, and apparently so does
> the Microsoft Visual Studio team [1] who say "Between 2017 and mid 2018,
> this feature would have killed 49 MSRC cases that involved uninitialized
> struct data leaking across a trust boundary. It would have also mitigated a
> number of bugs involving uninitialized struct data being used directly.".
> They seem to use pure zero initialization, and claim to have taken the
> overheads down to within noise. Don't just trust Microsoft though, here's
> another relevant person asking for this [2]. It's been proposed for GCC [3]
> and LLVM [4] before.
>
> What are the caveats? A few!
>   - Variables declared in unreachable code, and used later, aren't
>     initialized. This includes goto, Duff's device, and other objectionable
>     uses of switch. This should instead be a hard-error in any serious
>     codebase.
>   - Volatile stack variables are still weird. That's pre-existing, it's
>     really the language's fault and this patch keeps it weird. We should
>     deprecate volatile [5].
>   - As noted above, padding isn't fully handled yet.
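
As I read the first caveat, the shape of code in question looks roughly like
the C sketch below. `classify` and `scratch` are hypothetical names chosen
for illustration. The declaration sits between the `switch` and its first
`case` label, so control always jumps over it, and an initialization emitted
at the point of declaration would never execute. The sketch assigns before
every read, so it has no undefined behavior itself.

```c
/* The declaration of `scratch` is in scope for every case below, but
 * control never passes through it: the switch jumps straight to a label. */
int classify(int x) {
    switch (x) {
        int scratch;          /* declared in code that is never executed */
    case 0:
        scratch = 10;         /* assigned before it is read */
        return scratch;
    default:
        scratch = 20;         /* assigned before it is read */
        return scratch + x;
    }
}
```

If one of the assignments were missing, the read of `scratch` would be an
uninitialized use that, per the caveat, the proposed flag would not cover.
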
> I don't think these caveats make the patch untenable because they can be
> addressed separately.
>
> Should this be on by default? Maybe, in some circumstances. It's a
> conversation we can have when we've tried it out sufficiently, and we're
> confident that we've eliminated enough of the overheads that most codebases
> would want to opt-in.
> Let's keep our precious undefined behavior until that point in time.
> How do I use it:
>   1. On the command-line:
>     -ftrivial-auto-var-init=uninitialized (the default)
>     -ftrivial-auto-var-init=pattern
>     -ftrivial-auto-var-init=zero
>   2. Using an attribute:
>     int dont_initialize_me __attribute((trivial_auto_init("uninitialized")));
>     int zero_me __attribute((trivial_auto_init("zero")));
>     int pattern_me __attribute((trivial_auto_init("pattern")));
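
A small C sketch of how the per-variable opt-out might be used on a large
stack array follows. The attribute spelling is the one proposed in this RFC
and may differ from what eventually ships, so the sketch hides it behind a
hypothetical `AUTO_INIT` macro that expands to nothing when the compiler does
not report support via `__has_attribute`. `sum_squares` is likewise only an
illustrative name.

```c
#include <stddef.h>

/* Use the RFC's proposed attribute only if the compiler reports it;
 * otherwise AUTO_INIT(kind) expands to nothing and the code still builds. */
#if defined(__has_attribute)
#  if __has_attribute(trivial_auto_init)
#    define AUTO_INIT(kind) __attribute__((trivial_auto_init(kind)))
#  endif
#endif
#ifndef AUTO_INIT
#  define AUTO_INIT(kind)
#endif

/* Sums i*i for i in [0, n), with n capped at the table size. The table is
 * large and every element is written before it is read, which is the kind
 * of variable where initialization may be deemed too expensive. */
size_t sum_squares(size_t n) {
    unsigned table[1024] AUTO_INIT("uninitialized");
    if (n > 1024) {
        n = 1024;
    }
    for (size_t i = 0; i < n; i++) {
        table[i] = (unsigned)(i * i);   /* write first */
    }
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += table[i];              /* read only what was written */
    }
    return total;
}
```

The rest of the translation unit would then be built with
-ftrivial-auto-var-init=pattern (or =zero), with only this array opted out.
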
>   [0]: https://users.elis.ugent.be/~jsartor/researchDocs/OOPSLA2011Zero-submit.pdf
>   [1]: https://twitter.com/JosephBialek/status/1062774315098112001
>   [2]: https://outflux.net/slides/2018/lss/danger.pdf
>   [3]: https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00615.html
>   [4]: https://github.com/AndroidHardeningArchive/platform_external_clang/commit/776a0955ef6686d23a82d2e6a3cbd4a6a882c31c
>   [5]: http://wg21.link/p1152