[cfe-dev] [RFC] Adding lifetime analysis to clang

Manuel Klimek via cfe-dev cfe-dev at lists.llvm.org
Fri Mar 1 03:13:02 PST 2019


On Fri, Mar 1, 2019 at 12:10 PM Manuel Klimek <klimek at google.com> wrote:

> -apple George + google George to prevent bounces :)
>
> On Fri, Mar 1, 2019 at 12:06 PM Manuel Klimek <klimek at google.com> wrote:
>
>> On Fri, Mar 1, 2019 at 11:51 AM Gábor Horváth <xazax.hun at gmail.com>
>> wrote:
>>
>>> Hi Manuel,
>>>
>>> On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <klimek at google.com> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <xazax.hun at gmail.com>
>>>> wrote:
>>>>
>>>>> Unfortunately, we do not have real world experience on a large
>>>>> codebase for the following reasons:
>>>>> * The implementation we have does not support annotations yet, and
>>>>> large projects are likely to contain exceptions to the default rules (even
>>>>> if only a few)
>>>>> * The analysis has an expected coding style, e.g. it does not reason
>>>>> about arithmetic on raw pointers
>>>>>
>>>>> We did run it on some projects like LLVM and there were a lot of noise
>>>>> mainly due to the unsupported language features like pointer arithmetic.
>>>>> The rest of the false positives were mostly due to the basic
>>>>> assumption of the analysis that every path is feasible.
>>>>> We did find some true positives with a similar analysis looking for
>>>>> use after moves in LLVM. Those true positives were redundant std::forward
>>>>> calls that are not going to cause any runtime error but still removing them
>>>>> would make the code cleaner and less likely to misbehave in the future.
>>>>> See:
>>>>> https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894
>>>>> (The same arguments forwarded multiple times in a loop. Since the callee
>>>>> never actually moves the arguments we will not end up using a moved from
>>>>> object. But if we never move the argument, why would we use std::forward in
>>>>> the first place?)
>>>>>
>>>>> But even if the cfg-based lifetime analysis end up being a coding
>>>>> style specific check, the rest of the (cfg-less) clang warnings would
>>>>> become much more powerful after type categories are upstreamed. They could
>>>>> catch a lot of errors without having any false positives by default. Thus,
>>>>> I do see the value going forward, even if we do not have much real world
>>>>> experience yet.
>>>>>
>>>>
>>>> Could you run the restricted checks on llvm?
>>>>
>>>
>>> Sorry for the delay, but finally I found some time to actually implement
>>> the CFG-less checks and run them on LLVM. The quick prototype is available
>>> here:
>>> https://github.com/mgehre/clang/commits/lifetime-warning-typecategories
>>> Literally, all of the false positives I encountered after running on
>>> LLVM and Clang were due to llvm::ValueHandleBase being miscategorized as an
>>> owner (because of its user-defined destructor).
>>> So annotating this class would render the LLVM codebase clean of these
>>> new lifetime errors. Do you think this is a strong enough evidence to start
>>> upstreaming the type categories or do you think we should do additional
>>> measurements?
>>>
>>
>> I'd like to get Richard's opinion on this.
>>
>
So just to make sure I understand correctly:
The result is that in LLVM we have:
- 1 false positive root cause (causing X false positives)
- 0 true positives
?


>
>>
>>> I also plan to look up some lifetime fixes/reverts in the commit history
>>> to check if some of those problems would have been caught by these
>>> warnings. I will share the results in a EuroLLVM technical talk.
>>>
>>> P.S.:
>>> Does anybody have a good idea how to get a list of buildbot breaking
>>> commits where the reason was lifetime related (other than grepping git
>>> history)?
>>>
>>> Regards,
>>> Gábor
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> Regards,
>>>>> Gabor
>>>>>
>>>>> On Thu, 29 Nov 2018 at 20:58, George Karpenkov <ekarpenkov at apple.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks, the idea looks great!
>>>>>>
>>>>>> The heuristic described in a talk on matching returned and accepted
>>>>>> references looked fragile to me initially,
>>>>>> but after a bit of time I could not find obvious and common
>>>>>> counterexamples which would confuse it.
>>>>>>
>>>>>> Has the evaluation been done on any large project? E.g. LLVM itself,
>>>>>> Chrome, Webkit, or any other large C++ codebase?
>>>>>> What is the false positive rate? What is the annotation burden to
>>>>>> suppress those?
>>>>>> How many useful bugs were found?
>>>>>>
>>>>>> I’m also very happy that more dataflow analyses are being implemented
>>>>>> in Clang,
>>>>>> and I could help with reviews (though obviously I’m not the code
>>>>>> owner for most of the components).
>>>>>>
>>>>>> On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <
>>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> This is a proposal to implement Lifetime Analysis [1] defined by Herb
>>>>>> Sutter in Clang.
>>>>>> Summary from the paper:
>>>>>> “This analysis shows how to efficiently diagnose many common cases of
>>>>>> dangling (use-after-free) in C++ code, using only local analysis to report
>>>>>> them as deterministic readable errors at compile time. The approach is to
>>>>>> identify variables that are of generalized “Owner” types (e.g., smart
>>>>>> pointers, containers, string) and “Pointer” types (e.g., int*, string_view,
>>>>>> span, iterators, and ranges), and then use a local simple acyclic control
>>>>>> flow graph (ACFG) analysis to track what each Pointer points to and
>>>>>> identify when modifying an Owner invalidates a Pointer. The analysis
>>>>>> leverages C++’s existing strong notions of scopes, object lifetimes, and
>>>>>> const that carry rich information already available in reasonably modern
>>>>>> C++ source code. Interestingly, it appears that with minor extension this
>>>>>> analysis can also detect uses of local moved-from variables
>>>>>> (use-after-move), which are a form of dangling.”
>>>>>> More details can be found in the paper [1] or in the CppCon keynote
>>>>>> [3].
>>>>>>
>>>>>> Matthias Gehre and myself had been working on a prototype in Clang
>>>>>> [2]. The changes are rather large, so we are planning to take an
>>>>>> incremental approach to upstreaming the features should the community want
>>>>>> to see this upstream.
>>>>>>
>>>>>> *Plans for upstreaming*
>>>>>>
>>>>>> 1. Upstream Type Categorization
>>>>>>
>>>>>> Clang already performs statement-local lifetime analyses that would
>>>>>> benefit from type categorization even before adding any other analysis.
>>>>>>
>>>>>> This includes annotating types as Owners and Pointers, and
>>>>>> automatically inferring Owner or Point without annotation to minimize
>>>>>> annotation burden.
>>>>>>
>>>>>> Consider the following code example:
>>>>>>
>>>>>> std::reference_wrapper<const int> get_data() {
>>>>>>    const int i = 3;
>>>>>>    return {i};
>>>>>> }
>>>>>>
>>>>>> Unfortunately, today compilers do not warn on this case of returning
>>>>>> a dangling reference. They do warn if we return a raw pointer or reference,
>>>>>> but the compiler does not know that std::reference_wrapper also is a
>>>>>> non-owning indirection. In the Lifetime analysis, this is diagnosed because
>>>>>> std::reference_wrapper is recognized as a Pointer type.
>>>>>>
>>>>>> As a first step we would upstream the type categorization part of the
>>>>>> analysis and make some clang warnings optionally use it. We would also
>>>>>> upstream a set of annotations to give the users a way to fix potential
>>>>>> false positives due to miscategorization. (This should be very rare
>>>>>> according to our experience so far). By default, we could constrain the
>>>>>> categorization for std types, whose semantics are known.
>>>>>>
>>>>>> 2. Extensions of existing CFG-less analyses
>>>>>>
>>>>>> 2a. Initialization from temporaries
>>>>>> The goal is to detect Pointers that dangle on initialization, such as
>>>>>> std::string_view sv = “test”s;
>>>>>> By restricting the analysis to single statements, it has a low
>>>>>> false-positive rate and can be done without building a CFG (i.e. faster).
>>>>>>
>>>>>> 2b. Return of locals
>>>>>> The goal is to detect returning Pointers to local variables, e.g.
>>>>>> std::reference_wrapper<const int> get_data() {
>>>>>>    const int i = 3;
>>>>>>    return {i};
>>>>>> }
>>>>>> Similar to 2a also restricted to single statement.
>>>>>>
>>>>>> 2c. Member pointer that dangles once construction is complete
>>>>>> struct X {
>>>>>>    std::string_view sv;
>>>>>>    X() : sv("test"s) {} // warning: string_view member bound to
>>>>>> string temporary whose lifetime ends within the constructor
>>>>>> };
>>>>>>
>>>>>> 2d. New of a Pointer that dangles after the end of the full-expression
>>>>>> new string_view("test"s) // warning: dynamically-allocated
>>>>>> string_view refers to string whose lifetime ends at the end of the
>>>>>> full-expression
>>>>>>
>>>>>> 3. Intra-function analysis across basic blocks, excluding function
>>>>>> call expressions
>>>>>> Propagate point-to sets of Pointers across branches/loops
>>>>>> intra-function, e.g. analysing
>>>>>>
>>>>>> int* p = &i;
>>>>>> if(condition)
>>>>>>  p = nullptr;
>>>>>> *p; // ERROR: p is possibly null
>>>>>>
>>>>>> We have some CFG patches and some code traversing the CFG and
>>>>>> propagating the analysis state. With the type categories already in place,
>>>>>> this patch should be smaller. We could split these patches further by
>>>>>> implementing null tracking in a separate patch.
>>>>>>
>>>>>> 4. Function calls
>>>>>>
>>>>>> auto find(const string& needle, const string& haystack) ->
>>>>>> string_view [[gsl::lifetime(haystack)]];
>>>>>>
>>>>>> string_view sv = find(“needle”, haystack);
>>>>>> sv[0]; // OK
>>>>>> string_view sv = find(needle, “temporaryhaystack”);
>>>>>> sv[0]; // ERROR: sv is dangling
>>>>>>
>>>>>> This includes the following subparts.
>>>>>>
>>>>>> 4a. Precondition checks
>>>>>> Check that the psets of the arguments are valid at call site
>>>>>> according to the lifetime annotations of the callee.
>>>>>>
>>>>>> 4b. Postcondition checks
>>>>>> Check that the psets returned from a function adhere to its
>>>>>> advertised return/output psets.
>>>>>> Rigorous checking of not just the function arguments but also the
>>>>>> returned values is crucial part of the analysis.
>>>>>>
>>>>>> 4c. Lifetimes annotations
>>>>>>
>>>>>> The analysis gets pretty usable at this point. Most of the time the
>>>>>> user does not need any annotations, but it is crucial to have them before a
>>>>>> project can adapt it. For example, the user will occasionally want to
>>>>>> explicitly state that a member function is “const as far as Lifetime is
>>>>>> concerned” even though the function itself is not actually declared const
>>>>>> (e.g., vector::operator[] does not invalidate any Pointers, such as
>>>>>> iterators or raw pointers).
>>>>>>
>>>>>> 5. Implementing use after move analysis and exception support
>>>>>>
>>>>>> These parts are not implemented yet in our prototype, but they will
>>>>>> be useful additions for the analysis.
>>>>>>
>>>>>> *Questions*
>>>>>>
>>>>>> Does that make sense? What is the criteria for this work to be
>>>>>> upstreamed? Who is willing to participate in reviewing the patches?
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Gabor, Matthias, and Herb
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/isocpp/CppCoreGuidelines/blob/master/docs/Lifetime.pdf
>>>>>> [2] https://github.com/mgehre/clang
>>>>>> [3] https://www.youtube.com/watch?v=80BZxujhY38
>>>>>> [4] https://godbolt.org/z/90puuu
>>>>>>
>>>>>> _______________________________________________
>>>>>> cfe-dev mailing list
>>>>>> cfe-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>
>>>>>>
>>>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190301/b766c09c/attachment.html>


More information about the cfe-dev mailing list