[cfe-dev] [RFC] Adding lifetime analysis to clang

Manuel Klimek via cfe-dev cfe-dev at lists.llvm.org
Fri Nov 30 03:04:43 PST 2018


On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <xazax.hun at gmail.com> wrote:

> Unfortunately, we do not have real world experience on a large codebase
> for the following reasons:
> * The implementation we have does not support annotations yet, and large
> projects are likely to contain exceptions to the default rules (even if
> only a few)
> * The analysis has an expected coding style, e.g. it does not reason about
> arithmetic on raw pointers
>
> We did run it on some projects like LLVM and there were a lot of noise
> mainly due to the unsupported language features like pointer arithmetic.
> The rest of the false positives were mostly due to the basic assumption of
> the analysis that every path is feasible.
> We did find some true positives with a similar analysis looking for use
> after moves in LLVM. Those true positives were redundant std::forward calls
> that are not going to cause any runtime error but still removing them would
> make the code cleaner and less likely to misbehave in the future. See:
> https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894
> (The same arguments forwarded multiple times in a loop. Since the callee
> never actually moves the arguments we will not end up using a moved from
> object. But if we never move the argument, why would we use std::forward in
> the first place?)
>
> But even if the cfg-based lifetime analysis end up being a coding style
> specific check, the rest of the (cfg-less) clang warnings would become much
> more powerful after type categories are upstreamed. They could catch a lot
> of errors without having any false positives by default. Thus, I do see the
> value going forward, even if we do not have much real world experience yet.
>

Could you run the restricted checks on llvm?


>
> Regards,
> Gabor
>
> On Thu, 29 Nov 2018 at 20:58, George Karpenkov <ekarpenkov at apple.com>
> wrote:
>
>> Thanks, the idea looks great!
>>
>> The heuristic described in a talk on matching returned and accepted
>> references looked fragile to me initially,
>> but after a bit of time I could not find obvious and common
>> counterexamples which would confuse it.
>>
>> Has the evaluation been done on any large project? E.g. LLVM itself,
>> Chrome, Webkit, or any other large C++ codebase?
>> What is the false positive rate? What is the annotation burden to
>> suppress those?
>> How many useful bugs were found?
>>
>> I’m also very happy that more dataflow analyses are being implemented in
>> Clang,
>> and I could help with reviews (though obviously I’m not the code owner
>> for most of the components).
>>
>> On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>> Hi!
>>
>> This is a proposal to implement Lifetime Analysis [1] defined by Herb
>> Sutter in Clang.
>> Summary from the paper:
>> “This analysis shows how to efficiently diagnose many common cases of
>> dangling (use-after-free) in C++ code, using only local analysis to report
>> them as deterministic readable errors at compile time. The approach is to
>> identify variables that are of generalized “Owner” types (e.g., smart
>> pointers, containers, string) and “Pointer” types (e.g., int*, string_view,
>> span, iterators, and ranges), and then use a local simple acyclic control
>> flow graph (ACFG) analysis to track what each Pointer points to and
>> identify when modifying an Owner invalidates a Pointer. The analysis
>> leverages C++’s existing strong notions of scopes, object lifetimes, and
>> const that carry rich information already available in reasonably modern
>> C++ source code. Interestingly, it appears that with minor extension this
>> analysis can also detect uses of local moved-from variables
>> (use-after-move), which are a form of dangling.”
>> More details can be found in the paper [1] or in the CppCon keynote [3].
>>
>> Matthias Gehre and myself had been working on a prototype in Clang [2].
>> The changes are rather large, so we are planning to take an incremental
>> approach to upstreaming the features should the community want to see this
>> upstream.
>>
>> *Plans for upstreaming*
>>
>> 1. Upstream Type Categorization
>>
>> Clang already performs statement-local lifetime analyses that would
>> benefit from type categorization even before adding any other analysis.
>>
>> This includes annotating types as Owners and Pointers, and automatically
>> inferring Owner or Point without annotation to minimize annotation burden.
>>
>> Consider the following code example:
>>
>> std::reference_wrapper<const int> get_data() {
>>    const int i = 3;
>>    return {i};
>> }
>>
>> Unfortunately, today compilers do not warn on this case of returning a
>> dangling reference. They do warn if we return a raw pointer or reference,
>> but the compiler does not know that std::reference_wrapper also is a
>> non-owning indirection. In the Lifetime analysis, this is diagnosed because
>> std::reference_wrapper is recognized as a Pointer type.
>>
>> As a first step we would upstream the type categorization part of the
>> analysis and make some clang warnings optionally use it. We would also
>> upstream a set of annotations to give the users a way to fix potential
>> false positives due to miscategorization. (This should be very rare
>> according to our experience so far). By default, we could constrain the
>> categorization for std types, whose semantics are known.
>>
>> 2. Extensions of existing CFG-less analyses
>>
>> 2a. Initialization from temporaries
>> The goal is to detect Pointers that dangle on initialization, such as
>> std::string_view sv = “test”s;
>> By restricting the analysis to single statements, it has a low
>> false-positive rate and can be done without building a CFG (i.e. faster).
>>
>> 2b. Return of locals
>> The goal is to detect returning Pointers to local variables, e.g.
>> std::reference_wrapper<const int> get_data() {
>>    const int i = 3;
>>    return {i};
>> }
>> Similar to 2a also restricted to single statement.
>>
>> 2c. Member pointer that dangles once construction is complete
>> struct X {
>>    std::string_view sv;
>>    X() : sv("test"s) {} // warning: string_view member bound to string
>> temporary whose lifetime ends within the constructor
>> };
>>
>> 2d. New of a Pointer that dangles after the end of the full-expression
>> new string_view("test"s) // warning: dynamically-allocated string_view
>> refers to string whose lifetime ends at the end of the full-expression
>>
>> 3. Intra-function analysis across basic blocks, excluding function call
>> expressions
>> Propagate point-to sets of Pointers across branches/loops intra-function,
>> e.g. analysing
>>
>> int* p = &i;
>> if(condition)
>>  p = nullptr;
>> *p; // ERROR: p is possibly null
>>
>> We have some CFG patches and some code traversing the CFG and propagating
>> the analysis state. With the type categories already in place, this patch
>> should be smaller. We could split these patches further by implementing
>> null tracking in a separate patch.
>>
>> 4. Function calls
>>
>> auto find(const string& needle, const string& haystack) -> string_view
>> [[gsl::lifetime(haystack)]];
>>
>> string_view sv = find(“needle”, haystack);
>> sv[0]; // OK
>> string_view sv = find(needle, “temporaryhaystack”);
>> sv[0]; // ERROR: sv is dangling
>>
>> This includes the following subparts.
>>
>> 4a. Precondition checks
>> Check that the psets of the arguments are valid at call site according to
>> the lifetime annotations of the callee.
>>
>> 4b. Postcondition checks
>> Check that the psets returned from a function adhere to its advertised
>> return/output psets.
>> Rigorous checking of not just the function arguments but also the
>> returned values is crucial part of the analysis.
>>
>> 4c. Lifetimes annotations
>>
>> The analysis gets pretty usable at this point. Most of the time the user
>> does not need any annotations, but it is crucial to have them before a
>> project can adapt it. For example, the user will occasionally want to
>> explicitly state that a member function is “const as far as Lifetime is
>> concerned” even though the function itself is not actually declared const
>> (e.g., vector::operator[] does not invalidate any Pointers, such as
>> iterators or raw pointers).
>>
>> 5. Implementing use after move analysis and exception support
>>
>> These parts are not implemented yet in our prototype, but they will be
>> useful additions for the analysis.
>>
>> *Questions*
>>
>> Does that make sense? What is the criteria for this work to be
>> upstreamed? Who is willing to participate in reviewing the patches?
>>
>> Thanks in advance,
>> Gabor, Matthias, and Herb
>>
>> [1]
>> https://github.com/isocpp/CppCoreGuidelines/blob/master/docs/Lifetime.pdf
>> [2] https://github.com/mgehre/clang
>> [3] https://www.youtube.com/watch?v=80BZxujhY38
>> [4] https://godbolt.org/z/90puuu
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20181130/7588d546/attachment.html>


More information about the cfe-dev mailing list