[cfe-dev] [RFC] Upstreaming Lifetime Function Annotations

Thu Dec 12 10:34:17 PST 2019

Gábor wrote:

> I would love to see a simpler syntax. I think implementing this the most general
> way possible and adding some syntactic sugar later on after having some data
> about the most common patterns might make sense.

Yes. In particular, the nicest “nice syntax” of all is intended by design to be whitespace for the large majority of functions, via defaults that are equivalent to default annotations. The Lifetime design paper <https://github.com/isocpp/CppCoreGuidelines/blob/master/docs/Lifetime.pdf> proposes such defaults (in sections 2.5.3 and 2.5.5), and the aim is that very few functions ever need explicit annotation.

So far in our experience that has been working very well; in particular, the large majority of std:: standard library functions just work unchanged without annotation:

*	To see basic examples, please look at https://godbolt.org/z/1C4t8m (which involves std::min) and https://godbolt.org/z/4G-8H- (which diagnoses a StackOverflow question involving unique_ptr). No annotation of the standard library was needed in either example; they use the stock unmodified std:: implementation.*
*	For an (IMO pretty slick) example of zero-annotation of more complex code, see https://godbolt.org/z/eqCRLx – the code it diagnoses is actually quite complex (vector push_back but also ranges with filtering views) and all unannotated, and it gives nice and accurate diagnostics (pretty much “hey, your ranges::view::filter is dangling here on line B, because your vector<int>.push_back() invalidated it here on line A”).
*	See also section 2.6.2 in the design paper which shows that the string_view dangling problem examples given in WG21 paper P0936 are (I think all) diagnosed without any explicit annotation, because the proposed defaults do the right thing.

Those are examples, many of them drawn from real-world code, that needed no annotation at all with the proposed default annotations.
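To make the flavor of such cases concrete, here is a minimal sketch of three dangling patterns of the kinds mentioned above: a reference returned by std::min to a temporary, a reference invalidated by vector::push_back, and a string_view bound to a temporary string. This is not the code behind the godbolt links, which is not reproduced here. The function names smallerOf, firstAfterAppend, and greetingLength are hypothetical, each dangling form appears only as a comment next to a safe equivalent, and a C++17 compiler is assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Pattern 1: std::min returns a reference to one of its arguments.
// (smallerOf is a hypothetical name used only for this illustration.)
int smallerOf(int a, int b) {
    // const int& r = std::min(a, b + 1);  // would dangle: b + 1 is a temporary
    int r = std::min(a, b + 1);            // copying the value is safe
    return r;
}

// Pattern 2: push_back may reallocate and invalidate earlier references.
int firstAfterAppend(std::vector<int> v, int extra) {
    // int& first = v.front(); v.push_back(extra); return first;  // may dangle
    v.push_back(extra);
    return v.front();                      // re-acquire after the mutation
}

// Pattern 3: a string_view must not outlive the string it views.
std::size_t greetingLength(const std::string& name) {
    // std::string_view sv = "hello, " + name;  // would dangle: temporary string
    std::string owned = "hello, " + name;       // keep the owner alive
    std::string_view sv = owned;
    return sv.size();
}
```

In each case the commented-out line contains no annotation; the hope expressed above is that the default rules alone are enough for a checker to flag it.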

So I would propose this:

1.	Implement the general form. We know that occasionally we’ll need that, and then we can express anything in #2 and #3 as equivalents/sugars for something expressible in #1.
2.	Implement the proposed default rules (including tweaking them as we gain experience in larger codebases) so that whitespace zero-annotation Just Works for (we hope) a very large number of functions.
3.	Then see if we actually need any in-between syntactic sugars at all. If #2 covers a sufficiently high % of functions, we may not even be interested in other sugars. And if we discover there are patterns that #2 doesn’t cover well, then we can add sugars for those patterns.

How does that sound?

Herb

* As a temporary implementation detail, the prototype does currently hardwire knowledge that unique_ptr and vector are owners, but there are also proposed default zero-annotation rules for automatically recognizing which types are Owners (see section 2.1), which recognize those types without annotation (e.g., they recognize containers and smart pointers as implicitly Owners).

From: Gábor Horváth <xazax at google.com> 
Sent: Thursday, December 12, 2019 8:43 AM
To: Dmitri Gribenko <gribozavr at gmail.com>
Cc: cfe-dev <cfe-dev at lists.llvm.org>; gehre.matthias at gmail.com; Dmitri Gribenko <dmitrig at google.com>; Herb Sutter <hsutter at microsoft.com>; Kyle Reed <kylereed at microsoft.com>; Aaron Ballman <aaron.ballman at gmail.com>; Artem Dergachev <adergachev at apple.com>; xazax.hun <xazax.hun at gmail.com>; Petr Hosek <phosek at google.com>; Haowei Wu <haowei at google.com>; larsklein53 at gmail.com; Richard Smith <richard at metafoo.co.uk>
Subject: Re: [cfe-dev] [RFC] Upstreaming Lifetime Function Annotations

Hi Dmitri,

On Thu, Dec 12, 2019 at 5:44 AM Dmitri Gribenko <gribozavr at gmail.com> wrote:

Hi Gábor,

I'm very excited about lifetime annotations! I have two comments/requests.

The first one is about the implementation. There are quite a few warnings, ClangTidy checkers and other analyses in Clang that are dataflow-based or at least sort of dataflow-based. They all have to implement the interpretation of Clang's AST semantics, which is unfortunate, because it is very complicated logic that is refined and polished over time as we get to use those warnings and checkers, and yet, we can't reuse this logic for any new checker.

Therefore, my request is to try to structure the implementation in such a way that it is at least plausible to factor out the "dataflow engine" parts of the static analysis in future, and keep the abstract domain and lifetime specifics more or less separate.

Thanks, I totally agree. I think it should be relatively easy to separate the CFG traversal/fixed point iteration part. Anything bigger is likely to be more involved, but we definitely will strive for some reusability. 
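As a rough illustration of the separation discussed here, the sketch below keeps a generic worklist fixed-point loop apart from the abstract domain it iterates over. It is not Clang code: ToyCfg, solveForward, and MaybeDanglingDomain are hypothetical names, the CFG is just a successor table, and the domain is a toy set of "possibly dangling" names.

```cpp
#include <cstddef>
#include <deque>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy CFG: block i lists the indices of its successors.
struct ToyCfg {
    std::vector<std::vector<std::size_t>> successors;
};

// Generic forward fixed-point engine; it knows nothing about lifetimes.
// Domain must provide: State (equality-comparable), bottom(), join(), transfer().
// Returns the state at the entry of every block; block 0 is the CFG entry.
template <class Domain>
std::vector<typename Domain::State>
solveForward(const ToyCfg& cfg, const Domain& domain,
             const typename Domain::State& entry) {
    using State = typename Domain::State;
    std::vector<State> in(cfg.successors.size(), domain.bottom());
    if (in.empty()) return in;
    in[0] = entry;
    std::deque<std::size_t> worklist;
    for (std::size_t b = 0; b < in.size(); ++b) worklist.push_back(b);
    while (!worklist.empty()) {
        std::size_t b = worklist.front();
        worklist.pop_front();
        State out = domain.transfer(b, in[b]);
        for (std::size_t s : cfg.successors[b]) {
            State merged = domain.join(in[s], out);
            if (!(merged == in[s])) {       // state grew: revisit the successor
                in[s] = std::move(merged);
                worklist.push_back(s);
            }
        }
    }
    return in;
}

// Toy abstract domain: the set of names that may dangle at a program point.
// Both vectors must have one entry per CFG block.
struct MaybeDanglingDomain {
    using State = std::set<std::string>;
    std::vector<State> invalidated;  // names whose pointee a block invalidates
    std::vector<State> rebound;      // names a block re-points at something valid
    State bottom() const { return {}; }
    State join(const State& a, const State& b) const {
        State r = a;
        r.insert(b.begin(), b.end());
        return r;
    }
    // Rebinding is applied first, then invalidation.
    State transfer(std::size_t block, const State& in) const {
        State out = in;
        for (const auto& n : rebound[block]) out.erase(n);
        for (const auto& n : invalidated[block]) out.insert(n);
        return out;
    }
};
```

With this shape, a different checker would supply its own Domain type and reuse solveForward unchanged; interpreting real AST semantics inside transfer is the part that stays hard.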

On Thu, Dec 5, 2019 at 12:02 AM Gábor Horváth via cfe-dev <cfe-dev at lists.llvm.org> wrote:

  const char *find(const string &haystack, const string &needle) 
      [[gsl::post(lifetime(find, {haystack, null}))]];

I have a concern about the bulkiness of the syntax. I understand why it ended up this way (use standard attribute syntax, use the contracts syntax, ensure that names are referenced syntactically after they are declared, and we get the proposed syntax) -- that helps with rationalization, but that does not help me justify it.
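For readers unfamiliar with the notation, here is a minimal sketch of what the postcondition on the two-argument find is meant to express: the returned pointer either points into haystack or is null. The attribute is kept as a comment because stock compilers do not implement it; the function body and the pointsInto helper are assumptions made for illustration, not part of the proposal.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Intended contract, shown as a comment only:
//   [[gsl::post(lifetime(find, {haystack, null}))]]
const char* find(const std::string& haystack, const std::string& needle) {
    std::size_t pos = haystack.find(needle);
    if (pos == std::string::npos) return nullptr;  // the "null" member of the set
    return haystack.data() + pos;                  // points into haystack's buffer
}

// Hypothetical helper: true if p lies within s's character buffer
// (one past the end is accepted).
bool pointsInto(const char* p, const std::string& s) {
    std::less_equal<const char*> le;  // total order even for unrelated pointers
    return le(s.data(), p) && le(p, s.data() + s.size());
}
```

A caller that keeps the result after haystack is destroyed or modified holds a dangling pointer, which is the situation the annotation lets a checker diagnose.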

  struct Match { const char *pos; /* ... */ };
  bool find(const string &hs, const string &n, Match *m)
    [[gsl::post(lifetime(deref(m).pos, {hs}))]];

I understand why the lifetime specification has to go at the end of the declaration in the general case -- to handle cases like this, where we want to specify a lifetime for some part of the data structure, but I'm not convinced that users should always use the most general syntax. I feel like it is going to be an adoption barrier and a readability issue.
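The out-parameter case can be sketched the same way. In the hedged example below the contract is again only a comment, written with this sketch's parameter names (hs, n, m); the function body is an assumption, and so is the choice to leave m->pos untouched when nothing is found, since the thread does not say what should happen in that case.

```cpp
#include <cstddef>
#include <string>

struct Match { const char* pos = nullptr; /* ... */ };

// Intended contract, shown as a comment only:
//   [[gsl::post(lifetime(deref(m).pos, {hs}))]]
// On success, m->pos points into hs, so it must not outlive hs.
bool find(const std::string& hs, const std::string& n, Match* m) {
    std::size_t at = hs.find(n);
    if (at == std::string::npos) return false;  // assumption: *m left unchanged
    m->pos = hs.data() + at;
    return true;
}
```

The annotation has to name a member reached through a parameter, which is why it cannot be attached to the return type and ends up at the end of the declaration.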

I would love to see a simpler syntax. I think implementing this the most general way possible and adding some syntactic sugar later on after having some data about the most common patterns might make sense. Is it problematic to evolve the syntax upstream? I know this would be bad for early adopters but we could make it clear what they are opting into. 

Dmitri

-- 

main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
