[cfe-dev] RFC: Nullability qualifiers

Tue Mar 3 19:55:29 PST 2015

----- Original Message -----

> From: "Richard Smith" <richard at metafoo.co.uk>
> To: "Douglas Gregor" <dgregor at apple.com>
> Cc: "cfe-dev Developers" <cfe-dev at cs.uiuc.edu>
> Sent: Monday, March 2, 2015 5:34:18 PM
> Subject: Re: [cfe-dev] RFC: Nullability qualifiers

> On Mon, Mar 2, 2015 at 1:22 PM, Douglas Gregor < dgregor at apple.com >
> wrote:

> > Hello all,
> 

> > Null pointers are a significant source of problems in applications.
> > Whether it’s SIGSEGV taking down a process or a foolhardy attempt
> > to
> > recover from NullPointerException breaking invariants everywhere,
> > it’s a problem that’s bad enough for Tony Hoare to call the
> > invention of the null reference his billion dollar mistake [1].
> > It’s
> > not the ability to create a null pointer that is a problem—having a
> > common sentinel value meaning “no value” is extremely useful—but
> > that it’s very hard to determine whether, for a particular pointer,
> > one is expected to be able to use null. C doesn’t distinguish
> > between “nullable” and “nonnull” pointers, so we turn to
> > documentation and experimentation. Consider strchr from the C
> > standard library:
> 

> > char *strchr(const char *s, int c);
> 

> > It is “obvious” to a programmer who knows the semantics of strchr
> > that it’s important to check for a returned null, because null is
> > used as the sentinel for “not found”. Of course, your tools don’t
> > know that, so they cannot help when you completely forget to check
> > for the null case. Bugs ensue.
> 
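
As a minimal sketch of the check that is easy to forget, the helper below treats the null return of strchr as the "not found" case. The name `prefix_length` and its behavior are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the text before the first occurrence of
   sep in s, or the full length of s when sep does not occur. */
size_t prefix_length(const char *s, char sep) {
    assert(s != NULL);               /* precondition: s must not be null */
    const char *hit = strchr(s, sep);
    if (hit == NULL) {               /* null is the "not found" sentinel */
        return strlen(s);
    }
    return (size_t)(hit - s);
}
```

Dropping the `hit == NULL` branch would still compile without complaint, which is the gap the annotations are meant to let tools close.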

> > Can I pass a null string to strchr? The standard is unclear [2],
> > and
> > my platform’s implementation happily accepts a null parameter and
> > returns null, so obviously I shouldn’t worry about it… until I port
> > my code, or the underlying implementation changes because my
> > expectations and the library implementor’s expectations differ.
> > Given the age of strchr, I suspect that every implementation out
> > there has an explicit, defensive check for a null string, because
> > it’s easier to add yet more defensive (and generally useless) null
> > checks than it is to ask your clients to fix their code. Scale this
> > up, and code bloat ensues, as well as wasted programmer effort that
> > obscures the places where checking for null really does matter.
> 

> > In a recent version of Xcode, Apple introduced an extension to
> > C/C++/Objective-C that expresses the nullability of pointers in the
> > type system via new nullability qualifiers. Nullability qualifiers
> > express nullability as part of the declaration of strchr [2]:
> 

> > __nullable char *strchr(__nonnull const char *s, int c);
> 

> > With this, programmers and tools alike can better reason about the
> > use of strchr with null pointers.
> 
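
A rough sketch of an annotated strchr-like function that still builds with compilers lacking the extension. It assumes the `_Nullable`/`_Nonnull` spellings accepted by released Clang versions (this thread proposes `__nullable`/`__nonnull`) and writes the qualifier after the `*`. The macros `SKETCH_NULLABLE` and `SKETCH_NONNULL` and the function `find_char` are hypothetical names, not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Compilers without __has_feature treat every feature as absent. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(nullability)
#define SKETCH_NULLABLE _Nullable
#define SKETCH_NONNULL  _Nonnull
#else
#define SKETCH_NULLABLE   /* expands to nothing without the extension */
#define SKETCH_NONNULL
#endif

/* The result may be null ("not found"); the string argument must not be. */
const char * SKETCH_NULLABLE find_char(const char * SKETCH_NONNULL s, int c) {
    assert(s != NULL);  /* contract check only; compiled out under NDEBUG */
    for (;; ++s) {
        if (*s == (char)c) {
            return s;       /* also matches the terminator when c == 0 */
        }
        if (*s == '\0') {
            return NULL;
        }
    }
}
```

Because the macros expand to nothing elsewhere, the annotations carry the contract for tools without changing what the function does.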

> > We’d like to contribute the implementation (and there is a patch
> > attached at the end [3]), but since this is a nontrivial extension
> > to all of the C family of languages that Clang supports, we believe
> > that it needs to be discussed here first.
> 

> > Goals
> 
> > We have several specific goals that informed the design of this
> > feature.
> 

> > * Allow the intended nullability to be expressed on all pointers:
> > Pointers are used throughout library interfaces, and the
> > nullability
> > of those pointers is an important part of the API contract with
> > users. It’s too simplistic to only allow function parameters to
> > have
> > nullability, for example, because it’s also important information
> > for data members, pointers-to-pointers (e.g., "a nonnull pointer to
> > a nullable pointer to an integer”), arrays of pointers, etc.
> 
> > * Enable better tools support for detecting nullability problems:
> > The
> > nullability annotations should be useful for tools (especially the
> > static analyzer) that can reason about the use of null, to give
> > warnings about both missed null checks (the result of strchr could
> > be null…) as well as for unnecessarily-defensive code.
> 
> > * Support workflows where all interfaces provide nullability
> > annotations: In moving from a world where there are no nullability
> > annotations to one where we hope to see many such annotations,
> > we’ve
> > found it helpful to move header-by-header, auditing a complete
> > header to give it nullability qualifiers. Once one has done that,
> > additions to the header need to be held to the same standard, so we
> > need a design that allows us to warn about pointers that don’t
> > provide nullability annotations for some declarations in a header
> > that already has some nullability annotations.
> 

> > * Zero effect on ABI or code generation: There are a huge number of
> > interfaces that could benefit from the use of nullability
> > qualifiers, but we won’t get widespread adoption if introducing the
> > nullability qualifiers means breaking existing code, either in the
> > ABI (say, because nullability qualifiers are mangled into the type)
> > or at execution time (e.g., because a non-null pointer ends up
> > being
> > null along some error path and causes undefined behavior).
> 

> A sanitizer for this feature would seem very useful, but this bullet
> point suggests that such a sanitizer would violate the model.
> Likewise, I don't see why we should rule out the option of
> optimizing on the basis of these qualifiers (under a
> -fstrict-nonnull flag or similar).

> > Why not __attribute__((nonnull))?
> 
> > Clang already has an attribute to express nullability, “nonnull”,
> > which we inherited from GCC [4]. The “nonnull” attribute can be
> > placed on functions to indicate which parameters cannot be null:
> > one
> > either specifies the indices of the arguments that cannot be null,
> > e.g.,
> 
> > extern void *my_memcpy (void *dest, const void *src, size_t len)
> > __attribute__((nonnull (1, 2)));
> 
> > or omits the list of indices to state that all pointer arguments
> > cannot be null, e.g.,
> 
> > extern void *my_memcpy (void *dest, const void *src, size_t len)
> > __attribute__((nonnull));
> 
> > More recently, “nonnull” has grown the ability to be applied to
> > parameters, and one can use the companion attribute returns_nonnull
> > to state that a function returns a non-null pointer:
> 
> > extern void *my_memcpy (__attribute__((nonnull)) void *dest,
> >  __attribute__((nonnull)) const void *src, size_t len)
> > __attribute__((returns_nonnull));
> 
> > There are a number of problems here. First, there are different
> > attributes to express the same idea at different places in the
> > grammar, and the fact that the “nonnull” attribute on the function
> > actually affects the function parameters can get very, very
> > confusing. Quick, which pointers are nullable vs. non-null in this
> > example?
> 

> > __attribute__((nonnull)) void *my_realloc (void *ptr, size_t size);
> 

> > According to that declaration, ptr is nonnull and the function
> > returns a nullable pointer… but that’s the opposite of how it reads
> > (and behaves, if this is anything like a realloc that cannot fail).
> > Moreover, because these two attributes are declaration attributes,
> > not type attributes, you cannot express the nullability of the
> > inner pointer in a multi-level pointer or an array of pointers,
> > which makes these attributes verbose, confusing, and not
> > sufficiently general. These attributes fail the first of our
> > goals.
> 

> > These attributes aren’t as useful as they could be for tools
> > support
> > (the second and third goals), because they only express the nonnull
> > case, leaving no way to distinguish between the unannotated case
> > (nobody has documented the nullability of some parameter) and the
> > nullable case (we know the pointer can be null). From a tooling
> > perspective, this is a killer: the static analyzer absolutely
> > cannot
> > warn that one has forgotten to check for null for every unannotated
> > pointer, because the false-positive rate would be astronomical.
> 

> > Finally, we’ve recently started considering violations of the
> > __attribute__((nonnull)) contract to be undefined behavior, which
> > fails the last of our goals. This is something we could debate
> > further if it were the only problem, but these declaration
> > attributes fail all of our criteria, so it’s not worth discussing.
> 
On this last point, how do you want to define the interaction between these? Should we not consider the violation to be undefined behavior if these new qualifiers are present? 

-Hal 

> > Nullability Qualifiers
> 
> > We propose the addition of a new set of type qualifiers, spelled
> > __nullable, __nonnull, and __null_unspecified, to Clang. These
> > are collectively known as nullability qualifiers and may be written
> > anywhere any other type qualifier may be written (such as const)
> > on any type subject to the following restrictions:
> 

> > * Two nullability qualifiers shall not appear in the same set of
> > qualifiers.
> 
> > * A nullability qualifier shall qualify a pointer type. Any kind of
> > pointer type is allowed, including pointers to objects, pointers to
> > functions, C++ pointers to members, block pointers, and Objective-C
> > object pointers.
> 
> > * A nullability qualifier in the declaration-specifiers applies to
> > the innermost pointer type of each declarator (e.g., __nonnull int *
> > is equivalent to int * __nonnull).
> 
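
To make the multi-level case concrete, here is a small sketch of "a nonnull pointer to a nullable pointer to an integer", with each qualifier written after the `*` it describes. It assumes the `_Nullable`/`_Nonnull` spellings of released Clang versions. The `SKETCH_*` macros and `read_or_default` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(nullability)
#define SKETCH_NULLABLE _Nullable
#define SKETCH_NONNULL  _Nonnull
#else
#define SKETCH_NULLABLE   /* expands to nothing without the extension */
#define SKETCH_NONNULL
#endif

/* pp itself must not be null; the pointer it refers to may be null. */
int read_or_default(int * SKETCH_NULLABLE * SKETCH_NONNULL pp, int fallback) {
    assert(pp != NULL);
    int *inner = *pp;
    return inner != NULL ? *inner : fallback;
}
```

Reading right to left, the qualifier nearest the parameter name describes the outer pointer and the one before it describes the inner pointer.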

> What happens if there's a mixture of different kinds of declarator?
> (Can I have '__nonnull int (*p)[3]'? Can I have '__nonnull int
> *p[3];'?)

> I think you're saying that this decision is made based on the syntax
> of the declarator and not based on the underlying type, right? (So
> in

> __nonnull T *

> the __nonnull appertains to the *, even if T names a pointer type.)
> Given that...
> > * A nullability qualifier applied to a typedef of a
> > nullability-qualified pointer type shall specify the same
> > nullability as the underlying type of the typedef.
> 

> ... I don't really see what this rule is for. I would expect
> "__nonnull T" to be ill-formed because the innermost component of
> the declarator is not a pointer, irrespective of whether T is a
> pointer type and whether it's nullable. And I'd expect "__nonnull T
> *" to be valid whether or not T is a typedef for a __nonnull
> pointer.

> On the whole, I find it a little strange to allow a nullability
> qualifier in the decl-specifier-seq / specifiers-and-qualifiers that
> applies to some later pointer declarator; I would have expected this
> to be permitted in the cv-qualifier-seq / type-qualifier-list after
> the pointer operator, and nowhere else (or perhaps permitted in a
> decl-specifier-seq that also contains a type-specifier for a pointer
> type). This kind of flexibility has proven a disaster for the
> comprehensibility of GCC's type attributes.

> > The meanings of the three nullability qualifiers are as follows:
> 

> > __nullable: the pointer may store a null value at runtime (as part
> > of the API contract)
> 
> > __nonnull: the pointer should not store a null value at runtime
> > (as
> > part of the API contract). It is possible that the value can be
> > null, e.g., in erroneous historic uses of an API, and it is up to
> > the library implementor to decide to what degree she will
> > accommodate such clients.
> 
> > __null_unspecified: it is unclear whether the pointer can be null
> > or not. Use of this type qualifier is extremely rare in practice, but
> > it fills a small but important niche when auditing a particular
> > header to add nullability qualifiers: sometimes, for historical
> > reasons, the nullability contract for a few APIs in the header is
> > unclear even when looking at the implementation, and establishing the
> > contract requires more extensive study. In such cases, it’s often
> > best to mark that pointer as __null_unspecified (which will help
> > silence the warning about unannotated pointers in a header) and
> > move
> > on, coming back to __null_unspecified pointers when the appropriate
> > graybeard has been summoned out of retirement [5].
> 
> Have you considered adding C++11 attributes as synonyms for these?

> > Assumes-nonnull Regions
> 
> > We’ve found that it's fairly common for the majority of pointers
> > within a particular header to be __nonnull. Therefore, we’ve
> > introduced assumes-nonnull regions that assume that certain
> > unannotated pointers implicitly get the __nonnull nullability
> > qualifiers. Assumes-nonnull regions are marked by pragmas:
> 

> > #pragma clang assume_nonnull begin
> 
> > // s is inferred to be __nonnull
> > __nullable char *strchr(const char *s, int c);
> 
> > // my_realloc is inferred to return __nonnull
> > void *my_realloc (__nullable void *ptr, size_t size);
> 
> > #pragma clang assume_nonnull end
> 
> These pragmas seem easy to miss when moving declarations around while
> refactoring. Do you have enough experience with the feature to know
> if that's an issue in practice?

> > We infer __nonnull within an assumes_nonnull region when:
> 

> > * The pointer is a non-typedef declaration, such as a function
> > parameter, variable, or data member, or the result type of a
> > function. It’s very rare for one to want typedefs to specify
> > nullability information; rather, it’s usually the user of the
> > typedef that needs to specify nullability.
> 

> How can they do this, given the earlier rules?

> > * The pointer is a single-level pointer, e.g., int* but not int**,
> > because we’ve found that programmers can get confused about the
> > nullability of multi-level pointers (is it a __nullable pointer to
> > __nonnull pointers, or the other way around?) and inferring
> > nullability for any of the pointers in a multi-level pointer
> > compounds the situation.
> 

> > Note that no #include may occur within an assumes_nonnull region,
> > and
> > assumes_nonnull regions cannot cross header boundaries.
> 
> That sounds like it would make the lives of library maintainers using
> this feature painful -- they would need to textually duplicate these
> pragmas and the surrounding #ifdefs in every system header that
> needs them, rather than factoring them out into #includable
> begin/end files. But I suppose we can encourage the use of a macro
> expanding to _Pragma for those cases.
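
A sketch of what such a macro pair might look like, so that a header can open and close the region without repeating the pragma and its guards. The macro names (`SKETCH_ASSUME_NONNULL_BEGIN`, `SKETCH_ASSUME_NONNULL_END`, `SKETCH_NULLABLE`) and `label_length` are hypothetical. The `_Nullable` spelling and the `assume_nonnull` and `nullability` feature checks are assumptions based on released Clang versions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(assume_nonnull)
#define SKETCH_ASSUME_NONNULL_BEGIN _Pragma("clang assume_nonnull begin")
#define SKETCH_ASSUME_NONNULL_END   _Pragma("clang assume_nonnull end")
#else
#define SKETCH_ASSUME_NONNULL_BEGIN   /* no-op without the extension */
#define SKETCH_ASSUME_NONNULL_END
#endif

#if __has_feature(nullability)
#define SKETCH_NULLABLE _Nullable
#else
#define SKETCH_NULLABLE
#endif

/* All #include lines sit above the region, as the rule above requires. */
SKETCH_ASSUME_NONNULL_BEGIN

/* name is unannotated, so inside the region it is treated as nonnull;
   fallback is explicitly marked nullable. */
size_t label_length(const char *name, const char * SKETCH_NULLABLE fallback) {
    assert(name != NULL);
    if (name[0] != '\0') {
        return strlen(name);
    }
    return fallback != NULL ? strlen(fallback) : 0;
}

SKETCH_ASSUME_NONNULL_END
```

Defining the macros once in a shared header keeps the compiler checks in one place, while each header still brackets its own declarations.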

> > Type System Impact
> 
> > Nullability qualifiers are mapped to type attributes within the
> > Clang
> > type system, but a nullability-qualified pointer type is not
> > semantically distinct from its unqualified pointer type. Therefore,
> > one may freely convert between nullability-qualified and
> > non-nullability-qualified pointers, or between
> > nullability-qualified
> > pointers with different nullability qualifiers. One cannot overload
> > on nullability qualifiers, write C++ class template partial
> > specializations that identify nullability qualifiers, or inspect
> > nullability via type traits in any way.
> 
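
The free conversions described above can be sketched as follows, assuming the `_Nullable`/`_Nonnull` spellings of released Clang versions and default warning settings. The `SKETCH_*` macros, `deref_or_zero`, `plain_reader`, `get_reader`, and `roundtrip` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(nullability)
#define SKETCH_NULLABLE _Nullable
#define SKETCH_NONNULL  _Nonnull
#else
#define SKETCH_NULLABLE   /* expands to nothing without the extension */
#define SKETCH_NONNULL
#endif

/* Returns *p, or 0 when p is null. */
int deref_or_zero(const int * SKETCH_NULLABLE p) {
    return p != NULL ? *p : 0;
}

/* Function-pointer type written without any nullability qualifier. */
typedef int (*plain_reader)(const int *);

/* The annotated function converts to the unannotated pointer type,
   since the qualifier does not create a distinct type. */
plain_reader get_reader(void) {
    return deref_or_zero;
}

/* Moves one pointer value through all three flavors of the same type. */
int roundtrip(int * SKETCH_NONNULL p) {
    int * SKETCH_NULLABLE maybe = p;      /* nonnull to nullable */
    int * SKETCH_NONNULL  again = maybe;  /* nullable to nonnull */
    int *plain = again;                   /* to unannotated */
    assert(plain == p);
    return *plain;
}
```

Any diagnostics about such conversions would come from optional warnings or the static analyzer rather than from the type system itself.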

> > Said more strongly, removing nullability qualifiers from a
> > well-formed program will not change its behavior in any way, nor
> > will the semantics of a program change when any set of
> > (well-formed)
> > nullability qualifiers are added to it. Operationally, this means
> > that nullability qualifiers are not part of the canonical type in
> > Clang’s type system, and that any warnings we produce based on
> > nullability information will necessarily be dependent on Clang’s
> > ability to retain type sugar during semantic analysis.
> 

> > While it’s somewhat exceptional for us to introduce new type
> > qualifiers that don’t produce semantically distinct types, we feel
> > that this is the only plausible design and implementation strategy
> > for this feature: pushing nullability qualifiers into the type
> > system semantically would cause significant changes to the language
> > (e.g., overloading, partial specialization) and break ABI (due to
> > name mangling) that would drastically reduce the number of
> > potential
> > users, and we feel that Clang’s support for maintaining type sugar
> > throughout semantic analysis is generally good enough [6] to get
> > the
> > benefits of nullability annotations in our tools.
> 
> This seems reasonable to me, given the constraints. (I've had some
> offline discussions with various people about template type sugar
> reconstruction, which would help to diagnose issues here.)

> > Looking forward to our discussion.
> 

> > - Doug (with Jordan Rose and Anna Zaks)
> 

> > [1]
> > http://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions
> 
> > [2] The standard description of strchr seems to imply that the
> > parameter cannot be null
> 
> > [3] The patch is complete, but should be reviewed on cfe-commits
> > rather than here. There are also several logical parts to this
> > monolithic patch:
> 
> > (a) __nonnull/__nullable/__null_unspecified type specifiers
> 

> > (b) nonnull/nullable/null_unspecified syntactic sugar for
> > Objective-C
> 
> > (c) Warning about inconsistent application of nullability
> > specifiers
> > within a given header
> 
> > (d) assume_nonnull begin/end pragmas
> 
> > (e) Objective-C null_resettable property attribute
> 
> > [4] https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
> > (search for “nonnull”)
> 
> > [5] No graybeards were harmed in the making of this feature.
> 
> > [6] Template instantiation is the notable exception here, because
> > it
> > always canonicalizes types.
> 

> > _______________________________________________
> 
> > cfe-dev mailing list
> 
> > cfe-dev at cs.uiuc.edu
> 
> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> 


-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 