[cfe-dev] [RFC] Adding lifetime analysis to clang
Gábor Horváth via cfe-dev
cfe-dev at lists.llvm.org
Thu Nov 29 08:02:47 PST 2018
Hi!
This is a proposal to implement Lifetime Analysis [1] defined by Herb
Sutter in Clang.
Summary from the paper:
“This analysis shows how to efficiently diagnose many common cases of
dangling (use-after-free) in C++ code, using only local analysis to report
them as deterministic readable errors at compile time. The approach is to
identify variables that are of generalized “Owner” types (e.g., smart
pointers, containers, string) and “Pointer” types (e.g., int*, string_view,
span, iterators, and ranges), and then use a local simple acyclic control
flow graph (ACFG) analysis to track what each Pointer points to and
identify when modifying an Owner invalidates a Pointer. The analysis
leverages C++’s existing strong notions of scopes, object lifetimes, and
const that carry rich information already available in reasonably modern
C++ source code. Interestingly, it appears that with minor extension this
analysis can also detect uses of local moved-from variables
(use-after-move), which are a form of dangling.”
More details can be found in the paper [1] or in the CppCon keynote [3].
Matthias Gehre and I have been working on a prototype in Clang [2]. The
changes are rather large, so we are planning to take an incremental
approach to upstreaming the features should the community want to see this
upstream.
*Plans for upstreaming*
1. Upstream Type Categorization
Clang already performs statement-local lifetime analyses that would benefit
from type categorization even before adding any other analysis.
This includes annotating types as Owners and Pointers, and automatically
inferring Owner or Pointer without annotation to minimize annotation burden.
Consider the following code example:
std::reference_wrapper<const int> get_data() {
  const int i = 3;
  return {i};
}
Unfortunately, compilers today do not warn about this case of returning a
dangling reference. They do warn if we return a raw pointer or reference,
but the compiler does not know that std::reference_wrapper is also a
non-owning indirection. In the Lifetime analysis, this is diagnosed because
std::reference_wrapper is recognized as a Pointer type.
As a first step we would upstream the type categorization part of the
analysis and make some clang warnings optionally use it. We would also
upstream a set of annotations to give the users a way to fix potential
false positives due to miscategorization. (This should be very rare
according to our experience so far.) By default, we could restrict the
categorization to std types, whose semantics are known.
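To make the idea of categorization concrete, here is a minimal sketch. It is
an illustration only and not how the prototype works: the real analysis
categorizes types inside the compiler, not through templates. The names
Category, categorize and category_of are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// Hypothetical names for illustration; not part of Clang or the paper.
enum class Category { Owner, Pointer, Value };

// Default: a type that is not recognized is treated as a plain value.
template <class T>
struct categorize : std::integral_constant<Category, Category::Value> {};

// Non-owning indirections are Pointers.
template <class T>
struct categorize<T*> : std::integral_constant<Category, Category::Pointer> {};
template <class T>
struct categorize<std::reference_wrapper<T>>
    : std::integral_constant<Category, Category::Pointer> {};
template <class C, class Tr>
struct categorize<std::basic_string_view<C, Tr>>
    : std::integral_constant<Category, Category::Pointer> {};

// Types that own the data they refer to are Owners.
template <class T, class D>
struct categorize<std::unique_ptr<T, D>>
    : std::integral_constant<Category, Category::Owner> {};
template <class T, class A>
struct categorize<std::vector<T, A>>
    : std::integral_constant<Category, Category::Owner> {};
template <class C, class Tr, class A>
struct categorize<std::basic_string<C, Tr, A>>
    : std::integral_constant<Category, Category::Owner> {};

// Convenience accessor that ignores top-level const/volatile.
template <class T>
inline constexpr Category category_of =
    categorize<std::remove_cv_t<T>>::value;
```

With such a table, std::reference_wrapper<const int> is classified as a
Pointer just like const int*, which is what lets a warning about returned
raw pointers also cover the get_data() example above.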
2. Extensions of existing CFG-less analyses
2a. Initialization from temporaries
The goal is to detect Pointers that dangle on initialization, such as
std::string_view sv = "test"s;
By restricting the analysis to single statements, it has a low
false-positive rate and can be done without building a CFG (i.e. faster).
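As an illustration (our own sketch, not code from the prototype), the usual
way to address such a warning is to give the Owner a name so that it outlives
the Pointer. The function name safe_view_length is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

using namespace std::string_literals;

// Dangling form that 2a is meant to catch (left as a comment on purpose):
//   std::string_view sv = "test"s;  // the temporary string dies here
//
// Safe form: the Owner is a named local that outlives the Pointer.
std::size_t safe_view_length() {
    std::string owner = "test"s;  // Owner with a lifetime we control
    std::string_view sv = owner;  // Pointer into a live Owner
    return sv.size();
}
```

Both forms are single statements, so telling them apart does not need a CFG.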
2b. Return of locals
The goal is to detect returning Pointers to local variables, e.g.
std::reference_wrapper<const int> get_data() {
  const int i = 3;
  return {i};
}
Similar to 2a, this is also restricted to a single statement.
2c. Member pointer that dangles once construction is complete
struct X {
  std::string_view sv;
  // warning: string_view member bound to string temporary whose lifetime
  // ends within the constructor
  X() : sv("test"s) {}
};
2d. New of a Pointer that dangles after the end of the full-expression
new string_view("test"s)  // warning: dynamically-allocated string_view
                          // refers to string whose lifetime ends at the
                          // end of the full-expression
3. Intra-function analysis across basic blocks, excluding function call
expressions
Propagate points-to sets (psets) of Pointers across branches and loops
within a function, e.g. analysing
int* p = &i;
if (condition)
  p = nullptr;
*p; // ERROR: p is possibly null
We have some CFG patches and some code traversing the CFG and propagating
the analysis state. With the type categories already in place, this patch
should be smaller. We could split these patches further by implementing
null tracking in a separate patch.
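The propagation step can be pictured with a small toy model. This is a
sketch under our own simplifying assumptions, not the prototype's data
structures: a pset is a set of names, the analysis state maps each Pointer to
its pset, and states are joined by set union where control flow merges. The
names PSet, State, join, deref_is_safe and analyze_example are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model: a pset is the set of things a Pointer may refer to.
using PSet = std::set<std::string>;
// Analysis state at one program point: Pointer variable -> pset.
using State = std::map<std::string, PSet>;

// Join two states at a control-flow merge: union the psets per variable.
State join(const State& a, const State& b) {
    State out = a;
    for (const auto& entry : b)
        out[entry.first].insert(entry.second.begin(), entry.second.end());
    return out;
}

// A dereference is accepted only if the pset cannot contain null or invalid.
bool deref_is_safe(const State& s, const std::string& var) {
    auto it = s.find(var);
    if (it == s.end()) return false;  // unknown Pointer: be conservative
    return it->second.count("null") == 0 &&
           it->second.count("invalid") == 0;
}

// Models the four-line snippet above.
State analyze_example() {
    State entry;
    entry["p"] = PSet{"i"};           // int* p = &i;
    State then_branch = entry;
    then_branch["p"] = PSet{"null"};  // p = nullptr;
    return join(then_branch, entry);  // merge with the fall-through path
}
```

After the merge the pset of p is {i, null}, so the dereference is reported as
possibly null.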
4. Function calls
auto find(const string& needle, const string& haystack) -> string_view
    [[gsl::lifetime(haystack)]];
string_view sv = find("needle", haystack);
sv[0]; // OK
string_view sv2 = find(needle, "temporaryhaystack");
sv2[0]; // ERROR: sv2 is dangling
This includes the following subparts.
4a. Precondition checks
Check that the psets of the arguments are valid at call site according to
the lifetime annotations of the callee.
4b. Postcondition checks
Check that the psets returned from a function adhere to its advertised
return/output psets.
Rigorous checking of not just the function arguments but also the returned
values is a crucial part of the analysis.
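A toy model of how a call site could derive the pset of a returned Pointer
from the callee's lifetime annotation is sketched below. It is an assumption
made for illustration, not the prototype's algorithm: arguments are described
by a hypothetical Arg struct, the annotation is given as a list of parameter
indices, and a temporary argument is modelled as already invalid once the
full-expression has ended.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using PSet = std::set<std::string>;

// Hypothetical description of one call argument.
struct Arg {
    PSet pset;          // what the argument refers to
    bool is_temporary;  // true if it dies at the end of the full-expression
};

// The returned Pointer's pset is the union of the psets of the parameters
// named by the annotation. Temporaries contribute "invalid".
PSet return_pset(const std::vector<Arg>& args,
                 const std::vector<std::size_t>& lifetime_params) {
    PSet out;
    for (std::size_t idx : lifetime_params) {
        const Arg& a = args.at(idx);
        if (a.is_temporary)
            out.insert("invalid");
        else
            out.insert(a.pset.begin(), a.pset.end());
    }
    return out;
}

// A later use of the Pointer is an error if its pset may be invalid.
bool is_dangling(const PSet& p) { return p.count("invalid") != 0; }
```

For the find example, the annotation names parameter 1 (haystack). A
temporary needle is harmless because it is not named, while a temporary
haystack makes the result dangle.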
4c. Lifetime annotations
The analysis gets pretty usable at this point. Most of the time the user
does not need any annotations, but it is crucial to have them before a
project can adopt it. For example, the user will occasionally want to
explicitly state that a member function is “const as far as Lifetime is
concerned” even though the function itself is not actually declared const
(e.g., vector::operator[] does not invalidate any Pointers, such as
iterators or raw pointers).
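The vector::operator[] case can be checked with plain standard C++. The
sketch below is our own illustration (the function name is made up): it takes
a raw pointer into a vector, writes through the non-const operator[], and
confirms that the pointer still refers to the same element.

```cpp
#include <cassert>
#include <vector>

// Non-const operator[] never reallocates, so Pointers into the vector
// stay valid across the call.
bool index_keeps_pointer_valid() {
    std::vector<int> v{1, 2, 3};
    const int* p = &v[0];  // Pointer into the Owner
    v[1] = 42;             // non-const member call, no reallocation
    return p == v.data() && *p == 1 && v[1] == 42;
}
```

By contrast, push_back may reallocate, so treating it as invalidating is
correct; an annotation is what would let the analysis tell the two apart.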
5. Implementing use after move analysis and exception support
These parts are not implemented yet in our prototype, but they will be
useful additions for the analysis.
*Questions*
Does that make sense? What are the criteria for this work to be upstreamed?
Who is willing to participate in reviewing the patches?
Thanks in advance,
Gabor, Matthias, and Herb
[1] https://github.com/isocpp/CppCoreGuidelines/blob/master/docs/Lifetime.pdf
[2] https://github.com/mgehre/clang
[3] https://www.youtube.com/watch?v=80BZxujhY38
[4] https://godbolt.org/z/90puuu