[cfe-dev] making -ftrivial-auto-var-init=zero a first-class option

David Blaikie via cfe-dev cfe-dev at lists.llvm.org
Wed Apr 22 09:58:47 PDT 2020

On Wed, Apr 22, 2020 at 9:54 AM Philip Reames via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> On 4/21/20 4:59 PM, Richard Smith wrote:
> On Tue, 21 Apr 2020 at 16:49, Philip Reames via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>> On 4/21/20 3:11 PM, Richard Smith via cfe-dev wrote:
>> What you're proposing is, without question, a language extension. Our
>> policy on language extensions is documented here:
>> http://clang.llvm.org/get_involved.html
>> Right now, this fails at point 4. We do not want to create or encourage
>> the creation of language dialects and non-portable code, so the place to
>> have this discussion is in the C and C++ committees. Both committees have
>> processes for specifying optional features these days, and they might be
>> amenable to using those processes to standardize the behavior you're asking
>> for. (I mean, maybe not, but our policy requires that you at least try.)
>> However, there is a variant on what you're proposing that might fare
>> better: instead of guaranteeing zero-initialization, we could guarantee
>> that any observation of an uninitialized variable *either* produces
>> zero or results in a trap. That is: it's still undefined to read from
>> uninitialized variables -- we still do not guarantee what will happen if
>> you do, and will warn on uninitialized uses and so on -- but we would bound
>> the damage that can result from such accesses. You would get the security
>> hardening benefits with the modest binary size impact. That approach would
>> not introduce the risk of creating a language dialect (at least, not to the
>> same extent), so our policy on avoiding language extensions would not apply.
>> Richard, just to check here, it sounds to me like you're raising more a
>> point of specification than of implementation, right?  That is, you're not
>> stating that the actual implementation must sometimes trap (when producing
>> a zero wouldn't), but that the specification of the flags and docs must
>> leave that possibility open?
> Well, I think it's not sufficient to merely say that we might do something
> like trap, if our intent is that we never will. We would need to reasonably
> agree that (for example) if someone came forward with a patch that actually
> implemented said trapping behavior and didn't introduce any significant
> code size or performance impact, that we would consider such a change to be
> a quality of implementation improvement. But I don't think we need anyone
> to have actually committed themselves to producing such a patch, or any
> timeline or expectation of when (or indeed whether) it would be done. Sorry
> if this is splitting a hair, but I think it's an important hair to split.
> Hair successfully split.  I agree it is a key distinction.

Personally, I find that a bit too fine (but wouldn't stand in the way of
the decision) & would prefer at least a rough/most basic trapping behavior
for that, so really obvious intentional use of zero init would fail -
making it hard for anyone to develop a coding convention/style around it,
etc. It wouldn't need to be fancy at all, that being the point - make it as
easy/unintrusive to implement as possible, and catch the most blatant uses
of zero init, so that if someone tried to write code against a zero-init
language fork, their most obvious/common code would fail and they'd need
enough contortions that it'd be hard to argue it was an intentional/consistent
way to write code. "Well, we explicitly zero init /these/ simple cases, but
rely on compiler-derived zero init in the more complicated cases where we
can (currently) get away with it... "
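
To make that split concrete, here is a minimal C sketch. It is not taken from any real codebase: `struct request`, `count_set_bits` and `fill_request` are hypothetical names invented for illustration. The simple case is initialized explicitly, and the aggregate case shows the portable spelling next to the (commented-out) declaration that would only be well-defined in a dialect that guarantees zero init.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical aggregate, standing in for a "more complicated case". */
struct request {
    int id;
    size_t len;
    char buf[16];
};

/* Simple case: the local is explicitly zero-initialized. */
static int count_set_bits(unsigned v) {
    int n = 0;
    while (v != 0) {
        n += (int)(v & 1u);
        v >>= 1;
    }
    return n;
}

/* Complicated case: copy a payload into a request, truncating if needed. */
static size_t fill_request(struct request *out, const char *payload) {
    /* struct request r;      <- would rely on compiler-derived zero init:
     *                           id and the tail of buf are never assigned. */
    struct request r = {0};  /* portable: explicit zero init of every member */
    size_t n = strlen(payload);
    if (n >= sizeof r.buf)
        n = sizeof r.buf - 1;  /* leave room for the terminating NUL */
    memcpy(r.buf, payload, n);
    r.len = n;
    *out = r;
    return n;
}
```

With the explicit `= {0}`, `r.id` and the unused tail of `r.buf` are zero on any conforming compiler. With the commented-out declaration they would be zero only when the build happens to use zero init, which is exactly the inconsistent convention described above.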

>> If I'm completely misinterpreting, please just say so.  I don't want to
>> start a tangent discussion here, I just spotted what sounds like it could be
>> a "quick fix" which lets the OP achieve their objective and wanted to call
>> it out if in fact I'd read correctly.
>> Philip
>> On Tue, 21 Apr 2020 at 14:21, Kees Cook via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>> Hi,
>>> tl;dr: I'd like to revisit making -ftrivial-auto-var-init=zero an expressly
>>> supported option. To do this, I think we need to either entirely remove
>>> "-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang"
>>> or rename it to something more directly reflecting the issue, like
>>> "-enable-trivial-auto-var-init-zero-knowing-it-forks-the-language".
>>> This is currently open as https://bugs.llvm.org/show_bug.cgi?id=45497
>>> Here is the situation: -ftrivial-auto-var-init=pattern is great for
>>> debugging, but -ftrivial-auto-var-init=zero is needed for production
>>> systems for mainly two reasons, each of which I will try to express
>>> context for:
>>> 1) performance and size
>>> As measured by various Google folks across a few projects and in
>>> various places, there's a fairly significant performance impact of
>>> using pattern-init over zero-init. I can let other folks chime in
>>> with their exact numbers, but I can at least share some measurements
>>> Alexander Potapenko made with the Linux kernel (see "Performance costs"):
>>> https://clangbuiltlinux.github.io/CBL-meetup-2020-slides/glider/Fighting_uninitialized_memory_%40_CBL_Meetup_2020.pdf
>>> tl;dr: zero-init tended to be half the cost of pattern-init, though it
>>> varied based on workload, and binary size impact fell over 95% going
>>> from pattern-init to zero-init.
>>> 2) security
>>> Another driving factor (see below from various vendors/projects), is the
>>> security stance. Putting non-zero values into most variable types ends
>>> up making them arguably more dangerous than if they were zero-filled.
>>> Most notably, sizes and indexes are less likely to be used out of bounds
>>> if they are zero-initialized. The same holds for bool values that tend
>>> to indicate success instead of failing safe with a false value. While
>>> pointers in the non-canonical range are nice, zero tends to be just
>>> as good. There are certainly exceptions here, but the bulk of the
>>> historical record on how "uninitialized" variables have been used in
>>> real world exploitation involves their being non-zero, and analysis of
>>> those bugs supports that conclusion.
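
The fail-safe argument for sizes, indexes and bools can be illustrated with a small C sketch. Everything here is an assumption made for illustration: `TABLE_LEN` and the helper names are hypothetical, and the byte 0xAA is used only as an example of a repeated-byte fill, not as a statement of what any particular compiler emits. The code never reads an uninitialized variable; it models the fill value explicitly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TABLE_LEN 8u  /* hypothetical table size */

/* Build a size_t whose every byte is `fill`, standing in for what an
 * auto-initialized local would hold before the program assigns to it. */
static size_t filled_size(unsigned char fill) {
    size_t v;
    memset(&v, fill, sizeof v);
    return v;
}

/* True if using `idx` to index a TABLE_LEN-element table would overrun it. */
static bool index_overruns_table(size_t idx) {
    return idx >= TABLE_LEN;
}

/* Model a one-byte status flag filled with `fill`: any non-zero byte
 * reads as "success", zero reads as "failure". */
static bool flag_reads_as_success(unsigned char fill) {
    return fill != 0;
}
```

Under these assumptions a zero-filled index stays inside the table and a zero-filled flag reads as failure, while the 0xAA-filled index lands far out of bounds and the 0xAA-filled flag reads as success.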
>>> Various positions from vendors and projects:
>>> Google (Android, Chrome OS)
>>> Both Android and Chrome OS initially started using pattern-init, but due
>>> to each of: the performance characteristics, the binary size changes, and
>>> the less robust security stance, both projects have recently committed
>>> to switching to zero-init.
>>> Microsoft (Windows)
>>> I'm repeating what Joe Bialek has told me, so he can clarify if I'm not
>>> representing this correctly... While not using Clang/LLVM, Microsoft is
>>> part of the larger C/C++ ecosystem and has implemented both zero-init
>>> (for production builds) and pattern-init (for debug builds) in their
>>> compiler too. They also chose zero-init for production expressly due
>>> to the security benefits.
>>> Some details of their work:
>>> https://github.com/microsoft/MSRC-Security-Research/blob/master/presentations/2019_09_CppCon/CppCon2019%20-%20Killing%20Uninitialized%20Memory.pdf
>>> Upstream Linux kernel
>>> Linus Torvalds has directly stated that he wants zero-init:
>>> "So I'd like the zeroing of local variables to be a native compiler
>>> option..."
>>> "This, btw, is why I also think that the "initialize with poison" is
>>> pointless and wrong."
>>> https://lore.kernel.org/lkml/CAHk-=wgTM+cN7zyUZacGQDv3DuuoA4LORNPWgb1Y_Z1p4iedNQ@mail.gmail.com/
>>> Unsurprisingly, I strongly agree. ;)
>>> GrapheneOS is using zero-init (rather than patching Clang as it used to,
>>> to get the same result):
>>> https://twitter.com/DanielMicay/status/1248384468181643272
>>> GCC
>>> There's been mostly silence on the entire topic of automatic variable
>>> initialization, though there have been patches proposed in the past for
>>> zero-init:
>>> https://gcc.gnu.org/legacy-ml/gcc-patches/2014-06/msg00615.html
>>> Apple
>>> I can't speak meaningfully here, but I've heard rumors that they are
>>> depending on zero-init as well. Perhaps someone there can clarify how
>>> they are using these features?
>>> So, while I understand the earlier objections to zero-init from a
>>> "language fork" concern, I think this isn't a position that can really
>>> stand up to the reality of how many projects are using the feature (even
>>> via non-Clang compilers). Given that so much code is going to be built
>>> using zero-init, what's the best way for Clang to adapt here? I would
>>> prefer to just drop the -enable... option entirely, but I think just
>>> renaming it would be fine too.
>>> Thoughts/flames? ;)
>>> --
>>> Kees Cook
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev