[cfe-dev] [RFC] Adding lifetime analysis to clang
Manuel Klimek via cfe-dev
cfe-dev at lists.llvm.org
Fri Mar 1 03:06:00 PST 2019
On Fri, Mar 1, 2019 at 11:51 AM Gábor Horváth <xazax.hun at gmail.com> wrote:
> Hi Manuel,
>
> On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <klimek at google.com> wrote:
>
>>
>>
>> On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <xazax.hun at gmail.com>
>> wrote:
>>
>>> Unfortunately, we do not have real world experience on a large codebase
>>> for the following reasons:
>>> * The implementation we have does not support annotations yet, and large
>>> projects are likely to contain exceptions to the default rules (even if
>>> only a few)
>>> * The analysis has an expected coding style, e.g. it does not reason
>>> about arithmetic on raw pointers
>>>
>>> We did run it on some projects like LLVM and there were a lot of noise
>>> mainly due to the unsupported language features like pointer arithmetic.
>>> The rest of the false positives were mostly due to the basic assumption
>>> of the analysis that every path is feasible.
>>> We did find some true positives with a similar analysis looking for use
>>> after moves in LLVM. Those true positives were redundant std::forward calls
>>> that are not going to cause any runtime error but still removing them would
>>> make the code cleaner and less likely to misbehave in the future. See:
>>> https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894
>>> (The same arguments forwarded multiple times in a loop. Since the callee
>>> never actually moves the arguments we will not end up using a moved from
>>> object. But if we never move the argument, why would we use std::forward in
>>> the first place?)
>>>
>>> But even if the cfg-based lifetime analysis end up being a coding style
>>> specific check, the rest of the (cfg-less) clang warnings would become much
>>> more powerful after type categories are upstreamed. They could catch a lot
>>> of errors without having any false positives by default. Thus, I do see the
>>> value going forward, even if we do not have much real world experience yet.
>>>
>>
>> Could you run the restricted checks on llvm?
>>
>
> Sorry for the delay, but finally I found some time to actually implement
> the CFG-less checks and run them on LLVM. The quick prototype is available
> here:
> https://github.com/mgehre/clang/commits/lifetime-warning-typecategories
> Literally, all of the false positives I encountered after running on LLVM
> and Clang were due to llvm::ValueHandleBase being miscategorized as an
> owner (because of its user-defined destructor).
> So annotating this class would render the LLVM codebase clean of these new
> lifetime errors. Do you think this is a strong enough evidence to start
> upstreaming the type categories or do you think we should do additional
> measurements?
>
I'd like to get Richard's opinion on this.
> I also plan to look up some lifetime fixes/reverts in the commit history
> to check if some of those problems would have been caught by these
> warnings. I will share the results in a EuroLLVM technical talk.
>
> P.S.:
> Does anybody have a good idea how to get a list of buildbot breaking
> commits where the reason was lifetime related (other than grepping git
> history)?
>
> Regards,
> Gábor
>
>
>>
>>
>>>
>>> Regards,
>>> Gabor
>>>
>>> On Thu, 29 Nov 2018 at 20:58, George Karpenkov <ekarpenkov at apple.com>
>>> wrote:
>>>
>>>> Thanks, the idea looks great!
>>>>
>>>> The heuristic described in a talk on matching returned and accepted
>>>> references looked fragile to me initially,
>>>> but after a bit of time I could not find obvious and common
>>>> counterexamples which would confuse it.
>>>>
>>>> Has the evaluation been done on any large project? E.g. LLVM itself,
>>>> Chrome, Webkit, or any other large C++ codebase?
>>>> What is the false positive rate? What is the annotation burden to
>>>> suppress those?
>>>> How many useful bugs were found?
>>>>
>>>> I’m also very happy that more dataflow analyses are being implemented
>>>> in Clang,
>>>> and I could help with reviews (though obviously I’m not the code owner
>>>> for most of the components).
>>>>
>>>> On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <
>>>> cfe-dev at lists.llvm.org> wrote:
>>>>
>>>> Hi!
>>>>
>>>> This is a proposal to implement Lifetime Analysis [1] defined by Herb
>>>> Sutter in Clang.
>>>> Summary from the paper:
>>>> “This analysis shows how to efficiently diagnose many common cases of
>>>> dangling (use-after-free) in C++ code, using only local analysis to report
>>>> them as deterministic readable errors at compile time. The approach is to
>>>> identify variables that are of generalized “Owner” types (e.g., smart
>>>> pointers, containers, string) and “Pointer” types (e.g., int*, string_view,
>>>> span, iterators, and ranges), and then use a local simple acyclic control
>>>> flow graph (ACFG) analysis to track what each Pointer points to and
>>>> identify when modifying an Owner invalidates a Pointer. The analysis
>>>> leverages C++’s existing strong notions of scopes, object lifetimes, and
>>>> const that carry rich information already available in reasonably modern
>>>> C++ source code. Interestingly, it appears that with minor extension this
>>>> analysis can also detect uses of local moved-from variables
>>>> (use-after-move), which are a form of dangling.”
>>>> More details can be found in the paper [1] or in the CppCon keynote [3].
>>>>
>>>> Matthias Gehre and myself had been working on a prototype in Clang [2].
>>>> The changes are rather large, so we are planning to take an incremental
>>>> approach to upstreaming the features should the community want to see this
>>>> upstream.
>>>>
>>>> *Plans for upstreaming*
>>>>
>>>> 1. Upstream Type Categorization
>>>>
>>>> Clang already performs statement-local lifetime analyses that would
>>>> benefit from type categorization even before adding any other analysis.
>>>>
>>>> This includes annotating types as Owners and Pointers, and
>>>> automatically inferring Owner or Point without annotation to minimize
>>>> annotation burden.
>>>>
>>>> Consider the following code example:
>>>>
>>>> std::reference_wrapper<const int> get_data() {
>>>> const int i = 3;
>>>> return {i};
>>>> }
>>>>
>>>> Unfortunately, today compilers do not warn on this case of returning a
>>>> dangling reference. They do warn if we return a raw pointer or reference,
>>>> but the compiler does not know that std::reference_wrapper also is a
>>>> non-owning indirection. In the Lifetime analysis, this is diagnosed because
>>>> std::reference_wrapper is recognized as a Pointer type.
>>>>
>>>> As a first step we would upstream the type categorization part of the
>>>> analysis and make some clang warnings optionally use it. We would also
>>>> upstream a set of annotations to give the users a way to fix potential
>>>> false positives due to miscategorization. (This should be very rare
>>>> according to our experience so far). By default, we could constrain the
>>>> categorization for std types, whose semantics are known.
>>>>
>>>> 2. Extensions of existing CFG-less analyses
>>>>
>>>> 2a. Initialization from temporaries
>>>> The goal is to detect Pointers that dangle on initialization, such as
>>>> std::string_view sv = “test”s;
>>>> By restricting the analysis to single statements, it has a low
>>>> false-positive rate and can be done without building a CFG (i.e. faster).
>>>>
>>>> 2b. Return of locals
>>>> The goal is to detect returning Pointers to local variables, e.g.
>>>> std::reference_wrapper<const int> get_data() {
>>>> const int i = 3;
>>>> return {i};
>>>> }
>>>> Similar to 2a also restricted to single statement.
>>>>
>>>> 2c. Member pointer that dangles once construction is complete
>>>> struct X {
>>>> std::string_view sv;
>>>> X() : sv("test"s) {} // warning: string_view member bound to string
>>>> temporary whose lifetime ends within the constructor
>>>> };
>>>>
>>>> 2d. New of a Pointer that dangles after the end of the full-expression
>>>> new string_view("test"s) // warning: dynamically-allocated string_view
>>>> refers to string whose lifetime ends at the end of the full-expression
>>>>
>>>> 3. Intra-function analysis across basic blocks, excluding function call
>>>> expressions
>>>> Propagate point-to sets of Pointers across branches/loops
>>>> intra-function, e.g. analysing
>>>>
>>>> int* p = &i;
>>>> if(condition)
>>>> p = nullptr;
>>>> *p; // ERROR: p is possibly null
>>>>
>>>> We have some CFG patches and some code traversing the CFG and
>>>> propagating the analysis state. With the type categories already in place,
>>>> this patch should be smaller. We could split these patches further by
>>>> implementing null tracking in a separate patch.
>>>>
>>>> 4. Function calls
>>>>
>>>> auto find(const string& needle, const string& haystack) -> string_view
>>>> [[gsl::lifetime(haystack)]];
>>>>
>>>> string_view sv = find(“needle”, haystack);
>>>> sv[0]; // OK
>>>> string_view sv = find(needle, “temporaryhaystack”);
>>>> sv[0]; // ERROR: sv is dangling
>>>>
>>>> This includes the following subparts.
>>>>
>>>> 4a. Precondition checks
>>>> Check that the psets of the arguments are valid at call site according
>>>> to the lifetime annotations of the callee.
>>>>
>>>> 4b. Postcondition checks
>>>> Check that the psets returned from a function adhere to its advertised
>>>> return/output psets.
>>>> Rigorous checking of not just the function arguments but also the
>>>> returned values is crucial part of the analysis.
>>>>
>>>> 4c. Lifetimes annotations
>>>>
>>>> The analysis gets pretty usable at this point. Most of the time the
>>>> user does not need any annotations, but it is crucial to have them before a
>>>> project can adapt it. For example, the user will occasionally want to
>>>> explicitly state that a member function is “const as far as Lifetime is
>>>> concerned” even though the function itself is not actually declared const
>>>> (e.g., vector::operator[] does not invalidate any Pointers, such as
>>>> iterators or raw pointers).
>>>>
>>>> 5. Implementing use after move analysis and exception support
>>>>
>>>> These parts are not implemented yet in our prototype, but they will be
>>>> useful additions for the analysis.
>>>>
>>>> *Questions*
>>>>
>>>> Does that make sense? What is the criteria for this work to be
>>>> upstreamed? Who is willing to participate in reviewing the patches?
>>>>
>>>> Thanks in advance,
>>>> Gabor, Matthias, and Herb
>>>>
>>>> [1]
>>>> https://github.com/isocpp/CppCoreGuidelines/blob/master/docs/Lifetime.pdf
>>>> [2] https://github.com/mgehre/clang
>>>> [3] https://www.youtube.com/watch?v=80BZxujhY38
>>>> [4] https://godbolt.org/z/90puuu
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>
>>>>
>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190301/ac7e904f/attachment.html>
More information about the cfe-dev
mailing list