[cfe-dev] [RFC] Adding lifetime analysis to clang

Gábor Horváth via cfe-dev cfe-dev at lists.llvm.org
Mon Dec 3 13:02:48 PST 2018


Maybe this needs a bit of clarification.

Clang already has some very useful statement local analysis for lifetime
issues that could benefit from the type categories we developed for the CFG
based analysis. The did not test how these warnings perform after being
enhanced with type categories because we did not implement the enhanced
versions yet (we were focusing on the CFG based analysis, which is also not
fully implemented). But extending those warnings should be trivial, so we
can report back how well the type categories perform in that context
without too much effort to help the decision making process.

As far as I understand, the biggest obstacle with InnerPointer checker at
the moment is how to model the state of complex C++ objects (e.g.:
string_views) in an implementation independent way. Using type categories
it should be simple to pinpoint types that have similar semantics to
string_views and strings, so it will be easier to generalize the check to
arbitrary user defined types. Unfortunately, it will not help with the
modelling part.

BTW thanks for the support, we will definitely add you as a reviewer for
the patches :)

On Mon, 3 Dec 2018 at 19:57, George Karpenkov <ekarpenkov at apple.com> wrote:

> Thanks! In our experience with the static analyzer it is always better to
> make these analyses benchmark-driven,
> so TBH I’m a bit puzzled that the implementation was fully developed
> without continuous evaluation.
>
> The overall approach (dataflow + heuristics + annotations to override
> heuristics) makes sense to me,
> but then the heuristics might need further tuning.
> As a somewhat orthogonal question, do you think the symbolic execution
> checker developed by Réka could also benefit from the heuristic/annotation
> combination you are proposing?
>
> On Dec 3, 2018, at 7:30 AM, Gábor Horváth <xazax.hun at gmail.com> wrote:
>
>
>
> On Fri, 30 Nov 2018 at 12:04, Manuel Klimek <klimek at google.com> wrote:
>
>>
>>
>> On Fri, Nov 30, 2018 at 12:00 PM Gábor Horváth <xazax.hun at gmail.com>
>> wrote:
>>
>>> Unfortunately, we do not have real world experience on a large codebase
>>> for the following reasons:
>>> * The implementation we have does not support annotations yet, and large
>>> projects are likely to contain exceptions to the default rules (even if
>>> only a few)
>>> * The analysis has an expected coding style, e.g. it does not reason
>>> about arithmetic on raw pointers
>>>
>>> We did run it on some projects like LLVM and there were a lot of noise
>>> mainly due to the unsupported language features like pointer arithmetic.
>>> The rest of the false positives were mostly due to the basic assumption
>>> of the analysis that every path is feasible.
>>> We did find some true positives with a similar analysis looking for use
>>> after moves in LLVM. Those true positives were redundant std::forward calls
>>> that are not going to cause any runtime error but still removing them would
>>> make the code cleaner and less likely to misbehave in the future. See:
>>> https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/Error.h#L894
>>> (The same arguments forwarded multiple times in a loop. Since the callee
>>> never actually moves the arguments we will not end up using a moved from
>>> object. But if we never move the argument, why would we use std::forward in
>>> the first place?)
>>>
>>> But even if the cfg-based lifetime analysis end up being a coding style
>>> specific check, the rest of the (cfg-less) clang warnings would become much
>>> more powerful after type categories are upstreamed. They could catch a lot
>>> of errors without having any false positives by default. Thus, I do see the
>>> value going forward, even if we do not have much real world experience yet.
>>>
>>
>> Could you run the restricted checks on llvm?
>>
>
> This makes perfect sense. We will report back with the results.
>
>
>>
>>
>>>
>>> Regards,
>>> Gabor
>>>
>>> On Thu, 29 Nov 2018 at 20:58, George Karpenkov <ekarpenkov at apple.com>
>>> wrote:
>>>
>>>> Thanks, the idea looks great!
>>>>
>>>> The heuristic described in a talk on matching returned and accepted
>>>> references looked fragile to me initially,
>>>> but after a bit of time I could not find obvious and common
>>>> counterexamples which would confuse it.
>>>>
>>>> Has the evaluation been done on any large project? E.g. LLVM itself,
>>>> Chrome, Webkit, or any other large C++ codebase?
>>>> What is the false positive rate? What is the annotation burden to
>>>> suppress those?
>>>> How many useful bugs were found?
>>>>
>>>> I’m also very happy that more dataflow analyses are being implemented
>>>> in Clang,
>>>> and I could help with reviews (though obviously I’m not the code owner
>>>> for most of the components).
>>>>
>>>> On Nov 29, 2018, at 8:02 AM, Gábor Horváth via cfe-dev <
>>>> cfe-dev at lists.llvm.org> wrote:
>>>>
>>>> Hi!
>>>>
>>>> This is a proposal to implement Lifetime Analysis [1] defined by Herb
>>>> Sutter in Clang.
>>>> Summary from the paper:
>>>> “This analysis shows how to efficiently diagnose many common cases of
>>>> dangling (use-after-free) in C++ code, using only local analysis to report
>>>> them as deterministic readable errors at compile time. The approach is to
>>>> identify variables that are of generalized “Owner” types (e.g., smart
>>>> pointers, containers, string) and “Pointer” types (e.g., int*, string_view,
>>>> span, iterators, and ranges), and then use a local simple acyclic control
>>>> flow graph (ACFG) analysis to track what each Pointer points to and
>>>> identify when modifying an Owner invalidates a Pointer. The analysis
>>>> leverages C++’s existing strong notions of scopes, object lifetimes, and
>>>> const that carry rich information already available in reasonably modern
>>>> C++ source code. Interestingly, it appears that with minor extension this
>>>> analysis can also detect uses of local moved-from variables
>>>> (use-after-move), which are a form of dangling.”
>>>> More details can be found in the paper [1] or in the CppCon keynote [3].
>>>>
>>>> Matthias Gehre and myself had been working on a prototype in Clang [2].
>>>> The changes are rather large, so we are planning to take an incremental
>>>> approach to upstreaming the features should the community want to see this
>>>> upstream.
>>>>
>>>> *Plans for upstreaming*
>>>>
>>>> 1. Upstream Type Categorization
>>>>
>>>> Clang already performs statement-local lifetime analyses that would
>>>> benefit from type categorization even before adding any other analysis.
>>>>
>>>> This includes annotating types as Owners and Pointers, and
>>>> automatically inferring Owner or Point without annotation to minimize
>>>> annotation burden.
>>>>
>>>> Consider the following code example:
>>>>
>>>> std::reference_wrapper<const int> get_data() {
>>>>    const int i = 3;
>>>>    return {i};
>>>> }
>>>>
>>>> Unfortunately, today compilers do not warn on this case of returning a
>>>> dangling reference. They do warn if we return a raw pointer or reference,
>>>> but the compiler does not know that std::reference_wrapper also is a
>>>> non-owning indirection. In the Lifetime analysis, this is diagnosed because
>>>> std::reference_wrapper is recognized as a Pointer type.
>>>>
>>>> As a first step we would upstream the type categorization part of the
>>>> analysis and make some clang warnings optionally use it. We would also
>>>> upstream a set of annotations to give the users a way to fix potential
>>>> false positives due to miscategorization. (This should be very rare
>>>> according to our experience so far). By default, we could constrain the
>>>> categorization for std types, whose semantics are known.
>>>>
>>>> 2. Extensions of existing CFG-less analyses
>>>>
>>>> 2a. Initialization from temporaries
>>>> The goal is to detect Pointers that dangle on initialization, such as
>>>> std::string_view sv = “test”s;
>>>> By restricting the analysis to single statements, it has a low
>>>> false-positive rate and can be done without building a CFG (i.e. faster).
>>>>
>>>> 2b. Return of locals
>>>> The goal is to detect returning Pointers to local variables, e.g.
>>>> std::reference_wrapper<const int> get_data() {
>>>>    const int i = 3;
>>>>    return {i};
>>>> }
>>>> Similar to 2a also restricted to single statement.
>>>>
>>>> 2c. Member pointer that dangles once construction is complete
>>>> struct X {
>>>>    std::string_view sv;
>>>>    X() : sv("test"s) {} // warning: string_view member bound to string
>>>> temporary whose lifetime ends within the constructor
>>>> };
>>>>
>>>> 2d. New of a Pointer that dangles after the end of the full-expression
>>>> new string_view("test"s) // warning: dynamically-allocated string_view
>>>> refers to string whose lifetime ends at the end of the full-expression
>>>>
>>>> 3. Intra-function analysis across basic blocks, excluding function call
>>>> expressions
>>>> Propagate point-to sets of Pointers across branches/loops
>>>> intra-function, e.g. analysing
>>>>
>>>> int* p = &i;
>>>> if(condition)
>>>>  p = nullptr;
>>>> *p; // ERROR: p is possibly null
>>>>
>>>> We have some CFG patches and some code traversing the CFG and
>>>> propagating the analysis state. With the type categories already in place,
>>>> this patch should be smaller. We could split these patches further by
>>>> implementing null tracking in a separate patch.
>>>>
>>>> 4. Function calls
>>>>
>>>> auto find(const string& needle, const string& haystack) -> string_view
>>>> [[gsl::lifetime(haystack)]];
>>>>
>>>> string_view sv = find(“needle”, haystack);
>>>> sv[0]; // OK
>>>> string_view sv = find(needle, “temporaryhaystack”);
>>>> sv[0]; // ERROR: sv is dangling
>>>>
>>>> This includes the following subparts.
>>>>
>>>> 4a. Precondition checks
>>>> Check that the psets of the arguments are valid at call site according
>>>> to the lifetime annotations of the callee.
>>>>
>>>> 4b. Postcondition checks
>>>> Check that the psets returned from a function adhere to its advertised
>>>> return/output psets.
>>>> Rigorous checking of not just the function arguments but also the
>>>> returned values is crucial part of the analysis.
>>>>
>>>> 4c. Lifetimes annotations
>>>>
>>>> The analysis gets pretty usable at this point. Most of the time the
>>>> user does not need any annotations, but it is crucial to have them before a
>>>> project can adapt it. For example, the user will occasionally want to
>>>> explicitly state that a member function is “const as far as Lifetime is
>>>> concerned” even though the function itself is not actually declared const
>>>> (e.g., vector::operator[] does not invalidate any Pointers, such as
>>>> iterators or raw pointers).
>>>>
>>>> 5. Implementing use after move analysis and exception support
>>>>
>>>> These parts are not implemented yet in our prototype, but they will be
>>>> useful additions for the analysis.
>>>>
>>>> *Questions*
>>>>
>>>> Does that make sense? What is the criteria for this work to be
>>>> upstreamed? Who is willing to participate in reviewing the patches?
>>>>
>>>> Thanks in advance,
>>>> Gabor, Matthias, and Herb
>>>>
>>>> [1]
>>>> https://github.com/isocpp/CppCoreGuidelines/blob/master/docs/Lifetime.pdf
>>>> [2] https://github.com/mgehre/clang
>>>> [3] https://www.youtube.com/watch?v=80BZxujhY38
>>>> [4] https://godbolt.org/z/90puuu
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>
>>>>
>>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20181203/09fc02fa/attachment.html>


More information about the cfe-dev mailing list