[cfe-dev] [RFC] Adding lifetime analysis to clang

Gábor Horváth via cfe-dev cfe-dev at lists.llvm.org
Fri Aug 23 08:25:29 PDT 2019


Hi Nico!

Currently, the statement-local warnings have a very limited scope. If you
create a Pointer from an Owner using a conversion operator or a constructor
it will assume that the Pointer points 'into' the owner. To make these
warnings more useful we hardcoded the semantics of some methods in the STL.
So in case you have another type `Bar` that can be created from `Foo` and
`Bar` will point to something owned by `Foo` it make a lot of sense to
annotate both classes. Otherwise, it has relatively little use right now.

Improving the situation, however, is not out of scope. We plan to introduce
function annotations as well. With those function annotations you can
annotate which methods will return a pointer to an owned entity. This way
having multiple pointees will not be a problem.

One thing, however, that is out of the scope, is more complex ownership
models. For example, when a function can return both owned and non-owned
pointers depending on their arguments. We do not scan to support that
scenario, and simply not annotating such functions should avoid the false
positives.

So to summarize, currently the most value of these warnings are coming from
using STL types and user defined conversions. But we do plan to extend
these to be able to catch additional problems but that will require the
users to annotate functions/methods. We also plan to suggest/infer those
annotations automatically, but his feature will certainly be off by default
and might be provided by a separate tool, like a clang-tidy check (or just
a separate warning flag).

Cheers,
Gabor



On Fri, 23 Aug 2019 at 07:58, Nico Weber <thakis at chromium.org> wrote:

> Thanks for the update!
>
> I looked at this a bit now that it exists, and I have a question about the
> Pointer/Owner attributes: It looks like they go on a class to mark the
> class as an owner or a pointer – but the examples I've seen so far are all
> for classes that have a single member variable.
>
> If I have a class with multiple member variables, some of them weak,
> others strong, can I use these attributes? Say I have something like
>
> class Foo {
>   std::unique_ptr<Bar> b;
>   C* c;  // Owned by b, or something else entirely
>   D* d; // Also weak
>   E* e; // Happens to be owned by Foo, but not yet a unique_ptr
> };
>
> Since unique_ptr is Owner, `b` should be fine, but Foo is a Pointer to c
> and d and an owner of E. Can I express this with the current attributes? Or
> is this intentionally out of scope?
>
> Thanks,
> Nico
>
> On Wed, Aug 21, 2019 at 12:35 PM Gábor Horváth via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> Hi!
>>
>> I wanted to give you a small update on the process of upstreaming these
>> changes. The type category annotations along with most of the statement
>> local warnings are already upstreamed. Clang will not guess the type
>> categories automatically, types need to be annotated explicitly. For some
>> STL types, Clang will automatically append the annotations even if they are
>> not present in the source code. This way the new warnings will trigger even
>> if the STL implementation does not have the annotations in place yet.
>>
>> Furthermore, it looks like these warnings really do catch bugs! :) See
>> the two examples we found in the LLVM repository by annotationg
>> llvm::StringRef [1]. We also found bugs in other popular open source
>> projects most of which was fixed very soon after reporting them. We did not
>> see any false positives when running current Clang top of tree on other
>> projects, but as these warnings are relatively new it is not impossible to
>> have some. The Chromium and LLVM builds are clean though, so there should
>> not be any obvious problem. If you notice any spurious warnings please let
>> us know!
>>
>> We plan to share a more detailed evaluation/report after the CppCon talk
>> on the subject [2].
>>
>> Cheers,
>> Gabor
>>
>> [1]: https://reviews.llvm.org/D66443 , https://reviews.llvm.org/D66442 ,
>> https://reviews.llvm.org/D66440
>> [2]: https://sched.co/Sfrc
>>
>>
>> On Thu, 29 Nov 2018 at 08:02, Gábor Horváth <xazax.hun at gmail.com> wrote:
>>
>>> Hi!
>>>
>>> This is a proposal to implement Lifetime Analysis [1] defined by Herb
>>> Sutter in Clang.
>>>
>>> Summary from the paper:
>>>
>>> “This analysis shows how to efficiently diagnose many common cases of
>>> dangling (use-after-free) in C++ code, using only local analysis to report
>>> them as deterministic readable errors at compile time. The approach is to
>>> identify variables that are of generalized “Owner” types (e.g., smart
>>> pointers, containers, string) and “Pointer” types (e.g., int*, string_view,
>>> span, iterators, and ranges), and then use a local simple acyclic control
>>> flow graph (ACFG) analysis to track what each Pointer points to and
>>> identify when modifying an Owner invalidates a Pointer. The analysis
>>> leverages C++’s existing strong notions of scopes, object lifetimes, and
>>> const that carry rich information already available in reasonably modern
>>> C++ source code. Interestingly, it appears that with minor extension this
>>> analysis can also detect uses of local moved-from variables
>>> (use-after-move), which are a form of dangling.”
>>>
>>> More details can be found in the paper [1] or in the CppCon keynote [3].
>>>
>>> Matthias Gehre and myself had been working on a prototype in Clang [2].
>>> The changes are rather large, so we are planning to take an incremental
>>> approach to upstreaming the features should the community want to see this
>>> upstream.
>>>
>>> *Plans for upstreaming*
>>>
>>> 1. Upstream Type Categorization
>>>
>>> Clang already performs statement-local lifetime analyses that would
>>> benefit from type categorization even before adding any other analysis.
>>>
>>> This includes annotating types as Owners and Pointers, and automatically
>>> inferring Owner or Point without annotation to minimize annotation burden.
>>>
>>> Consider the following code example:
>>>
>>> std::reference_wrapper<const int> get_data() {
>>>
>>>    const int i = 3;
>>>
>>>    return {i};
>>>
>>> }
>>>
>>> Unfortunately, today compilers do not warn on this case of returning a
>>> dangling reference. They do warn if we return a raw pointer or reference,
>>> but the compiler does not know that std::reference_wrapper also is a
>>> non-owning indirection. In the Lifetime analysis, this is diagnosed because
>>> std::reference_wrapper is recognized as a Pointer type.
>>>
>>> As a first step we would upstream the type categorization part of the
>>> analysis and make some clang warnings optionally use it. We would also
>>> upstream a set of annotations to give the users a way to fix potential
>>> false positives due to miscategorization. (This should be very rare
>>> according to our experience so far). By default, we could constrain the
>>> categorization for std types, whose semantics are known.
>>>
>>> 2. Extensions of existing CFG-less analyses
>>>
>>> 2a. Initialization from temporaries
>>>
>>> The goal is to detect Pointers that dangle on initialization, such as
>>>
>>> std::string_view sv = “test”s;
>>>
>>> By restricting the analysis to single statements, it has a low
>>> false-positive rate and can be done without building a CFG (i.e. faster).
>>>
>>> 2b. Return of locals
>>>
>>> The goal is to detect returning Pointers to local variables, e.g.
>>>
>>> std::reference_wrapper<const int> get_data() {
>>>
>>>    const int i = 3;
>>>
>>>    return {i};
>>>
>>> }
>>>
>>> Similar to 2a also restricted to single statement.
>>>
>>> 2c. Member pointer that dangles once construction is complete
>>>
>>> struct X {
>>>
>>>    std::string_view sv;
>>>
>>>    X() : sv("test"s) {} // warning: string_view member bound to string
>>> temporary whose lifetime ends within the constructor
>>>
>>> };
>>>
>>> 2d. New of a Pointer that dangles after the end of the full-expression
>>>
>>> new string_view("test"s) // warning: dynamically-allocated string_view
>>> refers to string whose lifetime ends at the end of the full-expression
>>>
>>> 3. Intra-function analysis across basic blocks, excluding function call
>>> expressions
>>>
>>> Propagate point-to sets of Pointers across branches/loops
>>> intra-function, e.g. analysing
>>>
>>> int* p = &i;
>>>
>>> if(condition)
>>>
>>>  p = nullptr;
>>>
>>> *p; // ERROR: p is possibly null
>>>
>>> We have some CFG patches and some code traversing the CFG and
>>> propagating the analysis state. With the type categories already in place,
>>> this patch should be smaller. We could split these patches further by
>>> implementing null tracking in a separate patch.
>>>
>>> 4. Function calls
>>>
>>> auto find(const string& needle, const string& haystack) -> string_view
>>> [[gsl::lifetime(haystack)]];
>>>
>>> string_view sv = find(“needle”, haystack);
>>>
>>> sv[0]; // OK
>>>
>>> string_view sv = find(needle, “temporaryhaystack”);
>>>
>>> sv[0]; // ERROR: sv is dangling
>>>
>>> This includes the following subparts.
>>>
>>> 4a. Precondition checks
>>>
>>> Check that the psets of the arguments are valid at call site according
>>> to the lifetime annotations of the callee.
>>>
>>> 4b. Postcondition checks
>>>
>>> Check that the psets returned from a function adhere to its advertised
>>> return/output psets.
>>>
>>> Rigorous checking of not just the function arguments but also the
>>> returned values is crucial part of the analysis.
>>>
>>> 4c. Lifetimes annotations
>>>
>>> The analysis gets pretty usable at this point. Most of the time the user
>>> does not need any annotations, but it is crucial to have them before a
>>> project can adapt it. For example, the user will occasionally want to
>>> explicitly state that a member function is “const as far as Lifetime is
>>> concerned” even though the function itself is not actually declared const
>>> (e.g., vector::operator[] does not invalidate any Pointers, such as
>>> iterators or raw pointers).
>>>
>>> 5. Implementing use after move analysis and exception support
>>>
>>> These parts are not implemented yet in our prototype, but they will be
>>> useful additions for the analysis.
>>>
>>> *Questions*
>>>
>>> Does that make sense? What is the criteria for this work to be
>>> upstreamed? Who is willing to participate in reviewing the patches?
>>>
>>> Thanks in advance,
>>>
>>> Gabor, Matthias, and Herb
>>>
>>> [1]
>>> https://github.com/isocpp/CppCoreGuidelines/blob/master/docs/Lifetime.pdf
>>>
>>> [2] https://github.com/mgehre/clang
>>>
>>> [3] https://www.youtube.com/watch?v=80BZxujhY38
>>>
>>> [4] https://godbolt.org/z/90puuu
>>>
>>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190823/79658e71/attachment.html>


More information about the cfe-dev mailing list