[cfe-dev] [RFC] Adding lifetime analysis to clang

Fri Apr 12 07:04:06 PDT 2019

Gabor and I chatted privately, and I wanted to summarize the discussion.

=== Concerns about inference of type categories ===

I agree that having type categories is a good idea.  I think this is
one of the most important user-facing features in the proposal.

I don't agree with inferring type categories though.  The rules are
too complex to allow the inference to be done implicitly without
hurting readability.  I think users should always be required to
annotate their Owners and Pointers, but not Aggregates and not Values.
The implementation could still check the rules, if they are crucial.

If the rules were simpler (e.g., "Pointer is any type that satisfies
the iterator concept"), I would consider inference more acceptable,
but still borderline.  Iterator is a complex concept by itself: there
are many broken iterators out there that don't satisfy the concept,
yet their simpler usages compile in practice anyway.  However, the
Pointer rules alone are way more complex than that -- they are about
half a page long.

I don't think we can simplify the rules though -- they need to cover
the necessary C++ concepts.

My primary concern with inference is not false positives, but
understandability and debuggability of the system.  When I get a
warning, how do I, as a human, determine if the types I use are being
correctly classified?  As an API vendor, how can I be sure that my
types are correctly classified so that users will get correct
warnings?  The answer is "add an annotation".  So wouldn't every API
provider want to add an annotation regardless then?

My worry is that a fully automatic system, where the user is not
required to understand the rules, will not match users' expectations.
If a provider of a type thinks that it is a Pointer, but it actually
isn't, the failure mode is false negatives, not false positives. Users
will just think the analysis is too weak and can't detect the problem.
So the API providers will have to test that their types are indeed
recognized by the compiler with the correct type category.  At that
point they could as well add an annotation.

This is basically the same reason why C++ added the `override`
keyword; it is optional only because C++ has backwards compatibility
constraints.  The inference rules for `override` are much simpler
than those for type categories, and yet style guides often require
writing `override` where possible.
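
To make the `override` analogy concrete, here is a minimal sketch.  The
class and function names are hypothetical, chosen only for
illustration.  Without `override`, a signature mismatch silently
declares a new function instead of overriding, which is the same kind
of quiet failure as a type that is not inferred as a Pointer.  With
`override`, the same mismatch would be a compile error.

```cpp
#include <string>

struct Shape {
  virtual ~Shape() = default;
  virtual std::string name() const { return "shape"; }
};

// Inferred intent: the author meant to override name(), but forgot
// `const`.  This compiles and silently declares an unrelated function.
struct Circle : Shape {
  std::string name() { return "circle"; }
};

// Stated intent: `override` makes the compiler verify the signature.
// Dropping `const` here would be rejected at compile time.
struct Square : Shape {
  std::string name() const override { return "square"; }
};

// Calls through the base interface, as a user of the API would.
std::string describe(const Shape& s) { return s.name(); }
```

Calling `describe` on a `Circle` goes to `Shape::name`, and nothing in
the build output tells the author of `Circle` that this happened.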

Some types don't conform to the Pointer category formally, even though
in spirit they are pointers.  For example, if the style guide used in
the project prohibits operator overloading: https://godbolt.org/z/IYJwHw .
However, users who "don't fully understand the rules" could reasonably
think that the compiler should be able to figure out that this type is
a Pointer.  If the annotation were required, the problem would be
discovered immediately, because the compiler would rightfully complain
that the type does not conform to Pointer.
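
A minimal sketch of such a type follows.  It is hypothetical and is
not the code behind the godbolt link.  `IntHandle` and the
`EXAMPLE_POINTER` macro are made-up names; `[[gsl::Pointer(int)]]` is,
as far as I know, the spelling Clang accepts for this attribute, and
the macro expands to nothing on other compilers.  The sketch only
shows where the annotation would go; it does not show any conformance
check.

```cpp
// Hypothetical helper macro: apply the attribute only where it is known.
#if defined(__clang__)
#define EXAMPLE_POINTER(T) [[gsl::Pointer(T)]]
#else
#define EXAMPLE_POINTER(T)
#endif

// A non-owning handle written under a style guide that prohibits
// operator overloading: it has get() instead of operator* and
// operator->, so rules based on those operators would not match it.
class EXAMPLE_POINTER(int) IntHandle {
 public:
  explicit IntHandle(int* target) : target_(target) {}

  // Named accessor that plays the role of operator*.
  int& get() const { return *target_; }

  bool isNull() const { return target_ == nullptr; }

 private:
  int* target_;  // not owned; must outlive the handle
};

// Reads the pointee through the handle.
int readThrough(IntHandle handle) { return handle.get(); }
```

With the annotation written down, the author's intent is visible in
the source, whether or not inference would have reached the same
conclusion.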

I understand there are two primary reasons why inference is desired:
reduction of annotation burden, and third-party libraries.

To evaluate the annotation burden in practice, I think LLVM is not a
great example.  LLVM has an above-average number of Owners and
Pointers because it tries to implement standard library bits from
future C++ versions.  Also, those Owners and Pointers are in "ADT" and
"Support" libraries, not in regular application code.

Third-party libraries can be annotated through the "API Notes"
approach that Swift's Objective-C interop uses.  API notes allow the
user to pass an annotations file that injects the necessary attributes
into the library headers.

=== Future work idea: "Optional" type category ===

I also think it would be great to have more type categories, in
particular "optional" -- which will include pointers (that can contain
null), `std::optional` (that can contain nullopt), and whatever other
types with an empty state people might have in their projects.  Then
we could use the same dataflow-sensitive rules as the proposal uses
for tracking null pointers to track checks of empty optionals etc.
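
As a rough illustration of the parallel, here is a sketch with
hypothetical function names.  It assumes C++17 for `std::optional`.
Both functions follow the same shape, "check for the empty state, then
access", which is what a shared "optional" category could track with
one set of dataflow rules.

```cpp
#include <optional>

// Raw pointer: the empty state is nullptr.
int derefOrDefault(const int* p, int fallback) {
  if (p == nullptr) return fallback;  // check of the empty state
  return *p;                          // access is safe on this path
}

// std::optional: the empty state is std::nullopt.
int valueOrDefault(const std::optional<int>& o, int fallback) {
  if (!o.has_value()) return fallback;  // check of the empty state
  return *o;                            // access is safe on this path
}
```

Removing either check leaves an access that is only valid when the
object is non-empty, and that is the pattern such an analysis would
want to flag.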

However, the rules for assuming nullability seem to be loose (return
values and members are assumed non-null), and I'm not very happy about
that: https://godbolt.org/z/CMxEVv . I understand this was probably
done to lower the annotation burden.  I don't think it makes for a
very understandable model though.  I also couldn't find a single place
in the proposal that summarized these assumptions; I had to piece them
together from different sections.
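
To show where these assumptions matter, here is a hypothetical sketch.
It is not the code behind the godbolt link, and all names are made up.
Both a return value and a member can legitimately be null here.  The
code checks both explicitly.  If return values and members are assumed
non-null, as described above, then presumably removing these checks
would not be diagnosed.

```cpp
struct Node {
  int value = 0;
  Node* next = nullptr;  // a member that can legitimately be null
};

// Returns nullptr when no node holds `wanted`.
Node* find(Node* head, int wanted) {
  for (Node* n = head; n != nullptr; n = n->next) {
    if (n->value == wanted) return n;
  }
  return nullptr;
}

// Returns the value stored after the node holding `wanted`, or
// `fallback` if there is no such node or it has no successor.
int valueAfter(Node* head, int wanted, int fallback) {
  Node* found = find(head, wanted);
  // Both checks guard pointers that the loose rules would assume
  // non-null: a function return value and a data member.
  if (found == nullptr || found->next == nullptr) return fallback;
  return found->next->value;
}
```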

Dmitri