[cfe-dev] [RFC] Adding lifetime analysis to clang

Fri Mar 1 03:54:44 PST 2019

(correcting dmitri's email address)

On Fri, Mar 1, 2019 at 12:24 PM Gábor Horváth <xazax.hun at gmail.com> wrote:

>
>
> On Fri, 1 Mar 2019 at 12:13, Manuel Klimek <klimek at google.com> wrote:
>
>> On Fri, Mar 1, 2019 at 12:10 PM Manuel Klimek <klimek at google.com> wrote:
>>
>>> -apple George + google George to prevent bounces :)
>>>
>>> On Fri, Mar 1, 2019 at 12:06 PM Manuel Klimek <klimek at google.com> wrote:
>>>
>>>> On Fri, Mar 1, 2019 at 11:51 AM Gábor Horváth <xazax.hun at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Manuel,
>>>>>
>>>>> On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <klimek at google.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <xazax.hun at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Unfortunately, we do not have real world experience on a large
>>>>>>> codebase for the following reasons:
>>>>>>> * The implementation we have does not support annotations yet, and
>>>>>>> large projects are likely to contain exceptions to the default rules (even
>>>>>>> if only a few)
>>>>>>> * The analysis has an expected coding style, e.g. it does not reason
>>>>>>> about arithmetic on raw pointers
>>>>>>>
>>>>>>> We did run it on some projects like LLVM and there were a lot of
>>>>>>> noise mainly due to the unsupported language features like pointer
>>>>>>> arithmetic.
>>>>>>> The rest of the false positives were mostly due to the basic
>>>>>>> assumption of the analysis that every path is feasible.
>>>>>>> We did find some true positives with a similar analysis looking for
>>>>>>> use after moves in LLVM. Those true positives were redundant std::forward
>>>>>>> calls that are not going to cause any runtime error but still removing them
>>>>>>> would make the code cleaner and less likely to misbehave in the future.
>>>>>>> See:
>>>>>>> https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894
>>>>>>> (The same arguments forwarded multiple times in a loop. Since the callee
>>>>>>> never actually moves the arguments we will not end up using a moved from
>>>>>>> object. But if we never move the argument, why would we use std::forward in
>>>>>>> the first place?)
>>>>>>>
>>>>>>> But even if the cfg-based lifetime analysis end up being a coding
>>>>>>> style specific check, the rest of the (cfg-less) clang warnings would
>>>>>>> become much more powerful after type categories are upstreamed. They could
>>>>>>> catch a lot of errors without having any false positives by default. Thus,
>>>>>>> I do see the value going forward, even if we do not have much real world
>>>>>>> experience yet.
>>>>>>>
>>>>>>
>>>>>> Could you run the restricted checks on llvm?
>>>>>>
>>>>>
>>>>> Sorry for the delay, but finally I found some time to actually
>>>>> implement the CFG-less checks and run them on LLVM. The quick prototype is
>>>>> available here:
>>>>> https://github.com/mgehre/clang/commits/lifetime-warning-typecategories
>>>>> Literally, all of the false positives I encountered after running on
>>>>> LLVM and Clang were due to llvm::ValueHandleBase being miscategorized as an
>>>>> owner (because of its user-defined destructor).
>>>>> So annotating this class would render the LLVM codebase clean of these
>>>>> new lifetime errors. Do you think this is a strong enough evidence to start
>>>>> upstreaming the type categories or do you think we should do additional
>>>>> measurements?
>>>>>
>>>>
>>>> I'd like to get Richard's opinion on this.
>>>>
>>>
>> So just to make sure I understand correctly:
>> The result is that in LLVM we have:
>> - 1 false positive root cause (causing X false positives)
>> - 0 true positives
>> ?
>>
>>
>
> Correct.
>

Unfortunately that doesn't sound like a really strong argument in favor -
if you have something that works cleanly against clang trunk I can try to
find somebody willing to try it on a bit of our internal codebase, but I
can't promise anything.

Basically what I'd be looking for is some evidence that the false positive
ratio is not too high, and for that we'd need at least a codebase
containing true positives.

>
>
>>
>>>>
>>>>> I also plan to look up some lifetime fixes/reverts in the commit
>>>>> history to check if some of those problems would have been caught by these
>>>>> warnings. I will share the results in a EuroLLVM technical talk.
>>>>>
>>>>> P.S.:
>>>>> Does anybody have a good idea how to get a list of buildbot breaking
>>>>> commits where the reason was lifetime related (other than grepping git
>>>>> history)?
>>>>>
>>>>> Regards,
>>>>> Gábor
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Gabor
>>>>>>>
>>>>>>> On Thu, 29 Nov 2018 at 20:58, George Karpenkov <ekarpenkov at apple.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks, the idea looks great!
>>>>>>>>
>>>>>>>> The heuristic described in a talk on matching returned and accepted
>>>>>>>> references looked fragile to me initially,
>>>>>>>> but after a bit of time I could not find obvious and common
>>>>>>>> counterexamples which would confuse it.
>>>>>>>>
>>>>>>>> Has the evaluation been done on any large project? E.g. LLVM
>>>>>>>> itself, Chrome, Webkit, or any other large C++ codebase?
>>>>>>>> What is the false positive rate? What is the annotation burden to
>>>>>>>> suppress those?
>>>>>>>> How many useful bugs were found?
>>>>>>>>
>>>>>>>> I’m also very happy that more dataflow analyses are being
>>>>>>>> implemented in Clang,
>>>>>>>> and I could help with reviews (though obviously I’m not the code
>>>>>>>> owner for most of the components).
>>>>>>>>
>>>>>>>> On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <
>>>>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> This is a proposal to implement Lifetime Analysis [1] defined by
>>>>>>>> Herb Sutter in Clang.
>>>>>>>> Summary from the paper:
>>>>>>>> “This analysis shows how to efficiently diagnose many common cases
>>>>>>>> of dangling (use-after-free) in C++ code, using only local analysis to
>>>>>>>> report them as deterministic readable errors at compile time. The approach
>>>>>>>> is to identify variables that are of generalized “Owner” types (e.g., smart
>>>>>>>> pointers, containers, string) and “Pointer” types (e.g., int*, string_view,
>>>>>>>> span, iterators, and ranges), and then use a local simple acyclic control
>>>>>>>> flow graph (ACFG) analysis to track what each Pointer points to and
>>>>>>>> identify when modifying an Owner invalidates a Pointer. The analysis
>>>>>>>> leverages C++’s existing strong notions of scopes, object lifetimes, and
>>>>>>>> const that carry rich information already available in reasonably modern
>>>>>>>> C++ source code. Interestingly, it appears that with minor extension this
>>>>>>>> analysis can also detect uses of local moved-from variables
>>>>>>>> (use-after-move), which are a form of dangling.”
>>>>>>>> More details can be found in the paper [1] or in the CppCon keynote
>>>>>>>> [3].
>>>>>>>>
>>>>>>>> Matthias Gehre and myself had been working on a prototype in Clang
>>>>>>>> [2]. The changes are rather large, so we are planning to take an
>>>>>>>> incremental approach to upstreaming the features should the community want
>>>>>>>> to see this upstream.
>>>>>>>>
>>>>>>>> *Plans for upstreaming*
>>>>>>>>
>>>>>>>> 1. Upstream Type Categorization
>>>>>>>>
>>>>>>>> Clang already performs statement-local lifetime analyses that would
>>>>>>>> benefit from type categorization even before adding any other analysis.
>>>>>>>>
>>>>>>>> This includes annotating types as Owners and Pointers, and
>>>>>>>> automatically inferring Owner or Point without annotation to minimize
>>>>>>>> annotation burden.
>>>>>>>>
>>>>>>>> Consider the following code example:
>>>>>>>>
>>>>>>>> std::reference_wrapper<const int> get_data() {
>>>>>>>>    const int i = 3;
>>>>>>>>    return {i};
>>>>>>>> }
>>>>>>>>
>>>>>>>> Unfortunately, today compilers do not warn on this case of
>>>>>>>> returning a dangling reference. They do warn if we return a raw pointer or
>>>>>>>> reference, but the compiler does not know that std::reference_wrapper also
>>>>>>>> is a non-owning indirection. In the Lifetime analysis, this is diagnosed
>>>>>>>> because std::reference_wrapper is recognized as a Pointer type.
>>>>>>>>
>>>>>>>> As a first step we would upstream the type categorization part of
>>>>>>>> the analysis and make some clang warnings optionally use it. We would also
>>>>>>>> upstream a set of annotations to give the users a way to fix potential
>>>>>>>> false positives due to miscategorization. (This should be very rare
>>>>>>>> according to our experience so far). By default, we could constrain the
>>>>>>>> categorization for std types, whose semantics are known.
>>>>>>>>
>>>>>>>> 2. Extensions of existing CFG-less analyses
>>>>>>>>
>>>>>>>> 2a. Initialization from temporaries
>>>>>>>> The goal is to detect Pointers that dangle on initialization, such
>>>>>>>> as
>>>>>>>> std::string_view sv = “test”s;
>>>>>>>> By restricting the analysis to single statements, it has a low
>>>>>>>> false-positive rate and can be done without building a CFG (i.e. faster).
>>>>>>>>
>>>>>>>> 2b. Return of locals
>>>>>>>> The goal is to detect returning Pointers to local variables, e.g.
>>>>>>>> std::reference_wrapper<const int> get_data() {
>>>>>>>>    const int i = 3;
>>>>>>>>    return {i};
>>>>>>>> }
>>>>>>>> Similar to 2a also restricted to single statement.
>>>>>>>>
>>>>>>>> 2c. Member pointer that dangles once construction is complete
>>>>>>>> struct X {
>>>>>>>>    std::string_view sv;
>>>>>>>>    X() : sv("test"s) {} // warning: string_view member bound to
>>>>>>>> string temporary whose lifetime ends within the constructor
>>>>>>>> };
>>>>>>>>
>>>>>>>> 2d. New of a Pointer that dangles after the end of the
>>>>>>>> full-expression
>>>>>>>> new string_view("test"s) // warning: dynamically-allocated
>>>>>>>> string_view refers to string whose lifetime ends at the end of the
>>>>>>>> full-expression
>>>>>>>>
>>>>>>>> 3. Intra-function analysis across basic blocks, excluding function
>>>>>>>> call expressions
>>>>>>>> Propagate point-to sets of Pointers across branches/loops
>>>>>>>> intra-function, e.g. analysing
>>>>>>>>
>>>>>>>> int* p = &i;
>>>>>>>> if(condition)
>>>>>>>>  p = nullptr;
>>>>>>>> *p; // ERROR: p is possibly null
>>>>>>>>
>>>>>>>> We have some CFG patches and some code traversing the CFG and
>>>>>>>> propagating the analysis state. With the type categories already in place,
>>>>>>>> this patch should be smaller. We could split these patches further by
>>>>>>>> implementing null tracking in a separate patch.
>>>>>>>>
>>>>>>>> 4. Function calls
>>>>>>>>
>>>>>>>> auto find(const string& needle, const string& haystack) ->
>>>>>>>> string_view [[gsl::lifetime(haystack)]];
>>>>>>>>
>>>>>>>> string_view sv = find(“needle”, haystack);
>>>>>>>> sv[0]; // OK
>>>>>>>> string_view sv = find(needle, “temporaryhaystack”);
>>>>>>>> sv[0]; // ERROR: sv is dangling
>>>>>>>>
>>>>>>>> This includes the following subparts.
>>>>>>>>
>>>>>>>> 4a. Precondition checks
>>>>>>>> Check that the psets of the arguments are valid at call site
>>>>>>>> according to the lifetime annotations of the callee.
>>>>>>>>
>>>>>>>> 4b. Postcondition checks
>>>>>>>> Check that the psets returned from a function adhere to its
>>>>>>>> advertised return/output psets.
>>>>>>>> Rigorous checking of not just the function arguments but also the
>>>>>>>> returned values is crucial part of the analysis.
>>>>>>>>
>>>>>>>> 4c. Lifetimes annotations
>>>>>>>>
>>>>>>>> The analysis gets pretty usable at this point. Most of the time the
>>>>>>>> user does not need any annotations, but it is crucial to have them before a
>>>>>>>> project can adapt it. For example, the user will occasionally want to
>>>>>>>> explicitly state that a member function is “const as far as Lifetime is
>>>>>>>> concerned” even though the function itself is not actually declared const
>>>>>>>> (e.g., vector::operator[] does not invalidate any Pointers, such as
>>>>>>>> iterators or raw pointers).
>>>>>>>>
>>>>>>>> 5. Implementing use after move analysis and exception support
>>>>>>>>
>>>>>>>> These parts are not implemented yet in our prototype, but they will
>>>>>>>> be useful additions for the analysis.
>>>>>>>>
>>>>>>>> *Questions*
>>>>>>>>
>>>>>>>> Does that make sense? What is the criteria for this work to be
>>>>>>>> upstreamed? Who is willing to participate in reviewing the patches?
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>> Gabor, Matthias, and Herb
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://github.com/isocpp/CppCoreGuidelines/blob/master/docs/Lifetime.pdf
>>>>>>>> [2] https://github.com/mgehre/clang
>>>>>>>> [3] https://www.youtube.com/watch?v=80BZxujhY38
>>>>>>>> [4] https://godbolt.org/z/90puuu
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> cfe-dev mailing list
>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>
>>>>>>>>
>>>>>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190301/79588a0b/attachment.html>