[cfe-dev] [RFC] Adding lifetime analysis to clang

Gábor Horváth via cfe-dev cfe-dev at lists.llvm.org
Mon Dec 3 07:30:48 PST 2018

On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <klimek at google.com> wrote:

> On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <xazax.hun at gmail.com>
> wrote:
>> Unfortunately, we do not have real world experience on a large codebase
>> for the following reasons:
>> * The implementation we have does not support annotations yet, and large
>> projects are likely to contain exceptions to the default rules (even if
>> only a few)
>> * The analysis has an expected coding style, e.g. it does not reason
>> about arithmetic on raw pointers
>> We did run it on some projects like LLVM and there were a lot of noise
>> mainly due to the unsupported language features like pointer arithmetic.
>> The rest of the false positives were mostly due to the basic assumption
>> of the analysis that every path is feasible.
>> We did find some true positives with a similar analysis looking for use
>> after moves in LLVM. Those true positives were redundant std::forward calls
>> that are not going to cause any runtime error but still removing them would
>> make the code cleaner and less likely to misbehave in the future. See:
>> https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894
>> (The same arguments forwarded multiple times in a loop. Since the callee
>> never actually moves the arguments we will not end up using a moved from
>> object. But if we never move the argument, why would we use std::forward in
>> the first place?)
>> But even if the cfg-based lifetime analysis end up being a coding style
>> specific check, the rest of the (cfg-less) clang warnings would become much
>> more powerful after type categories are upstreamed. They could catch a lot
>> of errors without having any false positives by default. Thus, I do see the
>> value going forward, even if we do not have much real world experience yet.
> Could you run the restricted checks on llvm?

This makes perfect sense. We will report back with the results.

>> Regards,
>> Gabor
>> On Thu, 29 Nov 2018 at 20:58, George Karpenkov <ekarpenkov at apple.com>
>> wrote:
>>> Thanks, the idea looks great!
>>> The heuristic described in a talk on matching returned and accepted
>>> references looked fragile to me initially,
>>> but after a bit of time I could not find obvious and common
>>> counterexamples which would confuse it.
>>> Has the evaluation been done on any large project? E.g. LLVM itself,
>>> Chrome, Webkit, or any other large C++ codebase?
>>> What is the false positive rate? What is the annotation burden to
>>> suppress those?
>>> How many useful bugs were found?
>>> I’m also very happy that more dataflow analyses are being implemented in
>>> Clang,
>>> and I could help with reviews (though obviously I’m not the code owner
>>> for most of the components).
>>> On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>> Hi!
>>> This is a proposal to implement Lifetime Analysis [1] defined by Herb
>>> Sutter in Clang.
>>> Summary from the paper:
>>> “This analysis shows how to efficiently diagnose many common cases of
>>> dangling (use-after-free) in C++ code, using only local analysis to report
>>> them as deterministic readable errors at compile time. The approach is to
>>> identify variables that are of generalized “Owner” types (e.g., smart
>>> pointers, containers, string) and “Pointer” types (e.g., int*, string_view,
>>> span, iterators, and ranges), and then use a local simple acyclic control
>>> flow graph (ACFG) analysis to track what each Pointer points to and
>>> identify when modifying an Owner invalidates a Pointer. The analysis
>>> leverages C++’s existing strong notions of scopes, object lifetimes, and
>>> const that carry rich information already available in reasonably modern
>>> C++ source code. Interestingly, it appears that with minor extension this
>>> analysis can also detect uses of local moved-from variables
>>> (use-after-move), which are a form of dangling.”
>>> More details can be found in the paper [1] or in the CppCon keynote [3].
>>> Matthias Gehre and myself had been working on a prototype in Clang [2].
>>> The changes are rather large, so we are planning to take an incremental
>>> approach to upstreaming the features should the community want to see this
>>> upstream.
>>> *Plans for upstreaming*
>>> 1. Upstream Type Categorization
>>> Clang already performs statement-local lifetime analyses that would
>>> benefit from type categorization even before adding any other analysis.
>>> This includes annotating types as Owners and Pointers, and automatically
>>> inferring Owner or Point without annotation to minimize annotation burden.
>>> Consider the following code example:
>>> std::reference_wrapper<const int> get_data() {
>>>    const int i = 3;
>>>    return {i};
>>> }
>>> Unfortunately, today compilers do not warn on this case of returning a
>>> dangling reference. They do warn if we return a raw pointer or reference,
>>> but the compiler does not know that std::reference_wrapper also is a
>>> non-owning indirection. In the Lifetime analysis, this is diagnosed because
>>> std::reference_wrapper is recognized as a Pointer type.
>>> As a first step we would upstream the type categorization part of the
>>> analysis and make some clang warnings optionally use it. We would also
>>> upstream a set of annotations to give the users a way to fix potential
>>> false positives due to miscategorization. (This should be very rare
>>> according to our experience so far). By default, we could constrain the
>>> categorization for std types, whose semantics are known.
>>> 2. Extensions of existing CFG-less analyses
>>> 2a. Initialization from temporaries
>>> The goal is to detect Pointers that dangle on initialization, such as
>>> std::string_view sv = “test”s;
>>> By restricting the analysis to single statements, it has a low
>>> false-positive rate and can be done without building a CFG (i.e. faster).
>>> 2b. Return of locals
>>> The goal is to detect returning Pointers to local variables, e.g.
>>> std::reference_wrapper<const int> get_data() {
>>>    const int i = 3;
>>>    return {i};
>>> }
>>> Similar to 2a also restricted to single statement.
>>> 2c. Member pointer that dangles once construction is complete
>>> struct X {
>>>    std::string_view sv;
>>>    X() : sv("test"s) {} // warning: string_view member bound to string
>>> temporary whose lifetime ends within the constructor
>>> };
>>> 2d. New of a Pointer that dangles after the end of the full-expression
>>> new string_view("test"s) // warning: dynamically-allocated string_view
>>> refers to string whose lifetime ends at the end of the full-expression
>>> 3. Intra-function analysis across basic blocks, excluding function call
>>> expressions
>>> Propagate point-to sets of Pointers across branches/loops
>>> intra-function, e.g. analysing
>>> int* p = &i;
>>> if(condition)
>>>  p = nullptr;
>>> *p; // ERROR: p is possibly null
>>> We have some CFG patches and some code traversing the CFG and
>>> propagating the analysis state. With the type categories already in place,
>>> this patch should be smaller. We could split these patches further by
>>> implementing null tracking in a separate patch.
>>> 4. Function calls
>>> auto find(const string& needle, const string& haystack) -> string_view
>>> [[gsl::lifetime(haystack)]];
>>> string_view sv = find(“needle”, haystack);
>>> sv[0]; // OK
>>> string_view sv = find(needle, “temporaryhaystack”);
>>> sv[0]; // ERROR: sv is dangling
>>> This includes the following subparts.
>>> 4a. Precondition checks
>>> Check that the psets of the arguments are valid at call site according
>>> to the lifetime annotations of the callee.
>>> 4b. Postcondition checks
>>> Check that the psets returned from a function adhere to its advertised
>>> return/output psets.
>>> Rigorous checking of not just the function arguments but also the
>>> returned values is crucial part of the analysis.
>>> 4c. Lifetimes annotations
>>> The analysis gets pretty usable at this point. Most of the time the user
>>> does not need any annotations, but it is crucial to have them before a
>>> project can adapt it. For example, the user will occasionally want to
>>> explicitly state that a member function is “const as far as Lifetime is
>>> concerned” even though the function itself is not actually declared const
>>> (e.g., vector::operator[] does not invalidate any Pointers, such as
>>> iterators or raw pointers).
>>> 5. Implementing use after move analysis and exception support
>>> These parts are not implemented yet in our prototype, but they will be
>>> useful additions for the analysis.
>>> *Questions*
>>> Does that make sense? What is the criteria for this work to be
>>> upstreamed? Who is willing to participate in reviewing the patches?
>>> Thanks in advance,
>>> Gabor, Matthias, and Herb
>>> [1]
>>> https://github.com/isocpp/CppCoreGuidelines/blob/master/docs/Lifetime.pdf
>>> [2] https://github.com/mgehre/clang
>>> [3] https://www.youtube.com/watch?v=80BZxujhY38
>>> [4] https://godbolt.org/z/90puuu
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20181203/0b129387/attachment.html>

More information about the cfe-dev mailing list