[cfe-dev] [RFC] A dataflow analysis framework for Clang AST
Yitzhak Mandelbaum via cfe-dev
cfe-dev at lists.llvm.org
Tue Oct 12 18:40:27 PDT 2021
(for real this time)
On Tue, Oct 12, 2021 at 9:39 PM Yitzhak Mandelbaum <yitzhakm at google.com>
wrote:
> -vsavchenko at apple.com (which keeps bouncing)
>
> On Tue, Oct 12, 2021 at 8:52 PM Yitzhak Mandelbaum <yitzhakm at google.com>
> wrote:
>
>> Gabor, Artem,
>>
>> Thanks for your support, and, Artem, thank you for quoting as plain text
>> -- you're right, the system shunted it to the moderator because of its size.
>>
>> To answer some of the questions/comments raised:
>> > There is a nice summary of some of the ad-hoc solutions we have in
>> Clang in this survey: [cfe-dev] A survey of dataflow analyses in Clang
>> (llvm.org)
>> <https://lists.llvm.org/pipermail/cfe-dev/2020-October/066937.html>
>> > Do you plan to replace the ad-hoc solutions in the long run?
>>
>> We mention this briefly in the RFC as motivation, but to reiterate here:
>> we'd like the framework to at least be *suitable* to replace these ad-hoc
>> solutions. Whether we would get to this ourselves or encourage others,
>> review patches, etc. I can't say at this point. I suspect we'll refactor
>> at least a few ourselves just to test out the robustness of the framework.
>>
>> Just to reiterate, you want to abstract away both from the dataflow
>>> algorithm and the semantics of C++. E.g., the user might not need to handle
>>> the myriad of ways someone can assign a value to a variable (overloaded
>>> operator=, output argument, primitive assignment, etc), only a single
>>> abstract concept of assignment.
>>
>>
>> Exactly! Our first version already does this to some extent, we just
>> feel it's somewhat tailored to our current needs and could use some
>> generalization. For example, it would be great to know when an lvalue is
>> read for the last time and written to another lvalue, because that would
>> suggest a potential `std::move`, among other things. Yet, that balance of
>> detail and abstraction is not yet available. We only model the *effects* of
>> reads and writes without explicitly abstracting them (say, tagging each CFG
>> node w/ the set of reads and writes). So, we think there's room to explore
>> the design space here, but also think what we have will be valuable to a
>> number of analyses right away.
>>
>> While most of the checks in CSA are using symbolic execution, there are
>>> also some syntactic rules. One dilemma of implementing a new check is where
>>> to put it (CSA? Tidy? Warnings?). While it would be great to get rid of
>>> this dilemma, I think we want to keep all of the options open. I just
>>> wanted to make it explicit what kinds of layering constraints we have. It
>>> should be possible to add CSA checks/features using the new dataflow
>>> framework.
>>
>>
>> Great point! We hadn't thought about this possibility, but there's no
>> good reason to rule out use in CSA. Our key layering constraint is that it
>> be available in clang-tidy. So, if we put it somewhere central, like
>> clang/include/clang/Analysis, would that be suitable for use in CSA? We're
>> certainly open to other suggestions as well.
>>
>> Valeriy's -Wcompletion-handler being
>>> one of the notable latest additions - a backwards analysis over finite
>>> state lattice with a sophisticated system of notes attached to the
>>> warning to explain what's happening (generalizing over the latter may
>>> possibly be a goal for your framework as well).
>>
>>
>> Yes, we're definitely interested in being able to produce warnings that
>> give the user a clear idea of what is wrong and (if possible) how to fix
>> it. This is complicated once we involve the SAT solver, since it tends to
>> give us simple yes/no answers. But, actionable warnings seem crucial for
>> successful user adoption.
>>
>> I'm also very curious about your SAT solver implementation; I have a
>>> recent hobby of plugging various solvers into the static analyzer (cf.
>>> https://reviews.llvm.org/D110125) because it obviously needs some -
>>> arguably as much as your endeavor.
>>
>>
>> Yes, we'd love to tell you more, but I'll have to defer to sgatev@,
>> since he's the author and most familiar with the details.
>>
>> Cheers,
>> Yitzhak
>>
>>
>> On Tue, Oct 12, 2021 at 5:34 PM Artem Dergachev <noqnoqneo at gmail.com>
>> wrote:
>>
>>> Sounds like the mail didn't hit the mailing list due to size
>>> restriction. I'm quoting in plain text now to make sure it definitely
>>> hits the list in some form.
>>>
>>> Like Gabor already said, It's amazing that this is happening. I
>>> completely agree with your goals and I think we've discussed a lot of
>>> this offline earlier. The amount of existing flow-sensitive warnings may
>>> be low but it's steadily growing, Valeriy's -Wcompletion-handler being
>>> one of the notable latest additions - a backwards analysis over finite
>>> state lattice with a sophisticated system of notes attached to the
>>> warning to explain what's happening (generalizing over the latter may
>>> possibly be a goal for your framework as well).
>>>
>>> Aside from the ability to cover *the other 50% of warnings* compared to
>>> what the static analyzer already covers (may-problems vs. must-problems
>>> aka all-paths problems), another strong benefit of flow sensitive
>>> warnings is their performance which allows us to use them as compiler
>>> warnings (as opposed to separate tools like clang-tidy or static
>>> analyzer). This in turn delivers the warning to every user of the
>>> compiler which increases the user coverage dramatically. This
>>> consideration may even apply to may-problems in which the static
>>> analyzer already excels technically.
>>>
>>> I'm also very curious about your SAT solver implementation; I have a
>>> recent hobby of plugging various solvers into the static analyzer (cf.
>>> https://reviews.llvm.org/D110125) because it obviously needs some -
>>> arguably as much as your endeavor.
>>>
>>>
>>> On 10/12/21 8:32 AM, Yitzhak Mandelbaum wrote:
>>> > Summary
>>> >
>>> > We propose a new static analysis framework for implementing Clang
>>> > bug-finding and refactoring tools (including ClangTidy checks) based
>>> > on a dataflow algorithm.
>>> >
>>> >
>>> > Some ClangTidy checks and core compiler warnings are already
>>> > dataflowbased. However, due to a lack of a reusable framework, all
>>> > reimplement common dataflow-analysis and CFG-traversal algorithms. In
>>> > addition to supporting new checks, the framework should allow us to
>>> > clean up those parts of Clang and ClangTidy.
>>> >
>>> >
>>> > We intend that the new framework should have a learning curve similar
>>> > to AST matchers and not require prior expertise in static analysis.
>>> > Our goal is a simple API that isolates the user from the dataflow
>>> > algorithm itself. The initial version of the framework combines
>>> > support for classic dataflow analysesover the Clang CFG with symbolic
>>> > evaluation of the code and a simple model of local variables and the
>>> > heap. A number of useful checks can be expressed solely in terms of
>>> > this model, saving their implementers from having to reimplement
>>> > modeling of the semantics of C++ AST nodes. Ultimately, we want to
>>> > extend this convenience to a wide variety of checks, so that they can
>>> > be written in terms of the abstract semantics of the program (as
>>> > modeled by the framework), while only matching and modeling AST nodes
>>> > for key parts of their domain (e.g., custom data types or functions).
>>> >
>>> >
>>> > Motivation
>>> >
>>> > Clang has a few subsystems that greatly help with implementing
>>> > bug-finding and refactoring tools:
>>> >
>>> > *
>>> >
>>> > AST matchersallow one to easily find program fragments whose AST
>>> > matches a given pattern. This works well for finding program
>>> > fragments whose desired semantics translates to a small number of
>>> > possible syntactic expressions (e.g., "find a call to
>>> > strlen()where the argument is a null pointer constant"). Note that
>>> > patterns expressed by AST matchers often check program properties
>>> > that hold for allexecution paths.
>>> >
>>> > *
>>> >
>>> > Clang Static Analyzercan find program paths that end in a state
>>> > that satisfies a specific user-defined property, regardless of how
>>> > the code was expressed syntactically (e.g., "find a call to
>>> > strlen()where the argument is dynamically equal to null"). CSA
>>> > abstracts from the syntax of the code, letting analyses focus on
>>> > its semantics. While not completely eliding syntax, symbolic
>>> > evaluation abstracts over many syntactic details. Semantically
>>> > equivalent program fragments result in equivalent program states
>>> > computed by the CSA.
>>> >
>>> >
>>> > When implementing certain bug-finding and refactoring tools, it is
>>> > often necessary to find program fragments where a given property holds
>>> > on all execution paths, while placing as few demands as possible on
>>> > how the user writes their code. The proposed dataflow analysis
>>> > framework aims to satisfy both of these needs simultaneously.
>>> >
>>> >
>>> > Example: Ensuring that std::optionalis used safely
>>> >
>>> > Consider implementing a static analysis tool that proves that every
>>> > time a std::optionalis unwrapped, the program has ensured that the
>>> > precondition of the unwrap operation (the optional has a value) is
>>> > satisfied.
>>> >
>>> >
>>> > voidTestOK(std::optional<int>x){
>>> >
>>> > if(x.has_value()){
>>> >
>>> > use(x.value());
>>> >
>>> > }
>>> >
>>> > }
>>> >
>>> >
>>> > voidTestWithWarning(std::optional<int>x){
>>> >
>>> > use(x.value());// warning: unchecked access to optional value
>>> >
>>> > }
>>> >
>>> >
>>> > If we implemented this check in CSA, it would likely find the bug in
>>> > TestWithWarning, and would not report anything in TestOK. However, CSA
>>> > did not prove the desired property (all optional unwraps are checked
>>> > on all paths) for TestOK; CSA just could not find a path through
>>> > TestOKwhere this property is violated.
>>> >
>>> >
>>> > A simple function like TestOKonly has two paths and would be
>>> > completely explored by the CSA. However, when loops are involved, CSA
>>> > uses heuristics to decide when to stop exploring paths in a function.
>>> > As a result, CSA would miss this bug:
>>> >
>>> >
>>> > voidMissedBugByCSA(std::optional<int>x){
>>> >
>>> > for(inti =0;i <10;i++){
>>> >
>>> > if(i <5){
>>> >
>>> > use(i);
>>> >
>>> > }else{
>>> >
>>> > use(x.value());// CSA does not report a warning
>>> >
>>> > }
>>> >
>>> > }
>>> >
>>> > }
>>> >
>>> >
>>> > If we implemented this check with AST matchers (as typically done in
>>> > ClangTidy), we would have to enumerate way too many syntactic
>>> > patterns. Consider just a couple coding variations that users expect
>>> > this analysis to understand to be safe:
>>> >
>>> >
>>> > voidTestOK(std::optional<int>x){
>>> >
>>> > if(x){
>>> >
>>> > use(x.value());
>>> >
>>> > }
>>> >
>>> > }
>>> >
>>> >
>>> > voidTestOK(std::optional<int>x){
>>> >
>>> > if(x.has_value()==true){
>>> >
>>> > use(x.value());
>>> >
>>> > }
>>> >
>>> > }
>>> >
>>> >
>>> > voidTestOK(std::optional<int>x){
>>> >
>>> > if(!x.has_value()){
>>> >
>>> > use(0);
>>> >
>>> > }else{
>>> >
>>> > use(x.value());
>>> >
>>> > }
>>> >
>>> > }
>>> >
>>> >
>>> > voidTestOK(std::optional<int>x){
>>> >
>>> > if(!x.has_value())
>>> >
>>> > return;
>>> >
>>> > use(x.value());
>>> >
>>> > }
>>> >
>>> >
>>> > To summarize, CSA can find someoptional-unwrapping bugs, but it does
>>> > not prove that the code performs a check on allpaths. AST matchers can
>>> > prove it for a finite set of hard-coded syntactic cases. But, when the
>>> > set of safe patterns is infinite, as is often the case, AST matchers
>>> > can only capture a subset. Capturing the full set requires building a
>>> > framework like the one presented in this document.
>>> >
>>> >
>>> > Comparison
>>> >
>>> >
>>> >
>>> >
>>> > AST Matchers Library
>>> >
>>> >
>>> >
>>> > Clang Static Analyzer
>>> >
>>> >
>>> >
>>> > Dataflow Analysis Framework
>>> >
>>> > Proves that a property holds on all execution paths
>>> >
>>> >
>>> >
>>> > Possible, depends on the matcher
>>> >
>>> >
>>> >
>>> > No
>>> >
>>> >
>>> >
>>> > Yes
>>> >
>>> > Checks can be written in terms of program semantics
>>> >
>>> >
>>> >
>>> > No
>>> >
>>> >
>>> >
>>> > Yes
>>> >
>>> >
>>> >
>>> > Yes
>>> >
>>> >
>>> > Further Examples
>>> >
>>> > While we have started with a std::optionalchecker, we have a number of
>>> > further use cases where this framework can help. Note that some of
>>> > these are already implemented in complex, specialized checks, either
>>> > directly in the compiler or as clang-tidy checks. A key goal of our
>>> > framework is to simplify the implementation of such checks in the
>>> future.
>>> >
>>> > *
>>> >
>>> > “the pointer may be null when dereferenced”, “the pointer is
>>> > always null when dereferenced”,
>>> >
>>> > *
>>> >
>>> > "unnecessary null check on a pointer passed to delete/free",
>>> >
>>> > *
>>> >
>>> > “this raw pointer variable could be refactored into a
>>> > std::unique_ptr”,
>>> >
>>> > *
>>> >
>>> > “expression always evaluates to a constant”, “if condition always
>>> > evaluates to true/false”,
>>> >
>>> > *
>>> >
>>> > “basic block is never executed”, “loop is executed at most once”,
>>> >
>>> > *
>>> >
>>> > “assigned value is always overwritten”,
>>> >
>>> > *
>>> >
>>> > “value was used after having been moved”,
>>> >
>>> > *
>>> >
>>> > “value guarded by a lock was used without taking the lock”,
>>> >
>>> > *
>>> >
>>> > “unnecessary existence check before inserting a value into a
>>> > std::map”.
>>> >
>>> >
>>> > Some of these use cases are covered by existing ClangTidy checks that
>>> > are implemented with pattern matching. We believe there is an
>>> > opportunity to simplify their implementation by refactoring them to
>>> > use the dataflow framework, and simultaneously improve the coverage
>>> > and precision of how they model C++ semantics.
>>> >
>>> >
>>> > Overview
>>> >
>>> > The initial version of our framework provides the following key
>>> features:
>>> >
>>> > *
>>> >
>>> > a forward dataflow algorithm, parameterized by a user-defined
>>> > value domain and a transfer function, which describes how program
>>> > statements modify domain values.
>>> >
>>> > *
>>> >
>>> > a built-in model of local variables and the heap. This model
>>> > provides two benefits to users: it tracks the flow of values,
>>> > accounting for a wide variety of C++ language constructs, in a way
>>> > that is reusable across many different analyses; it discriminates
>>> > value approximations based on path conditions, allowing for higher
>>> > precision than dataflow alone.
>>> >
>>> > *
>>> >
>>> > an interface to an SMT solver to verify safety of operations like
>>> > dereferencing a pointer,
>>> >
>>> > *
>>> >
>>> > test infrastructure for validating results of the dataflow
>>> algorithm.
>>> >
>>> >
>>> > Dataflow Algorithm
>>> >
>>> > Our framework solves forward dataflow equations over a product of a
>>> > user-defined value domain and a built-in model of local variables and
>>> > the heap, called the environment. The user implements the transfer
>>> > function for their value domain, while the framework implements the
>>> > transfer function that operates on the environment, only requiring a
>>> > user-defined widening operator to avoid infinite growth of the
>>> > environment model. We intend to expand the framework to also support
>>> > backwards dataflow equations at some point in the future. The
>>> > limitation to the forward direction is not essential.
>>> >
>>> >
>>> > In more detail, the algorithm is parameterized by:
>>> >
>>> >
>>> > *
>>> >
>>> > a user-defined, finite-height partial order, with a join function
>>> > and maximal element,
>>> >
>>> > *
>>> >
>>> > an initial element from the user-defined lattice,
>>> >
>>> > *
>>> >
>>> > a (monotonic) transfer function for statements, which maps between
>>> > lattice elements,
>>> >
>>> > *
>>> >
>>> > comparison and merge functions for environment values.
>>> >
>>> >
>>> > We expand on the first three elements here and return to the
>>> > comparison and merge functions in the next section.
>>> >
>>> >
>>> > The Lattice and Analysis Abstractions
>>> >
>>> > We start with a representation of a lattice:
>>> >
>>> > classBoundedJoinSemiLattice{
>>> >
>>> > public:
>>> >
>>> > friendbooloperator==(constBoundedJoinSemiLattice&lhs,
>>> >
>>> > constBoundedJoinSemiLattice&rhs);
>>> >
>>> > friendbooloperator<=(constBoundedJoinSemiLattice&lhs,
>>> >
>>> > constBoundedJoinSemiLattice&rhs);
>>> >
>>> > staticBoundedJoinSemiLatticeTop();
>>> >
>>> > enumclassJoinEffect{
>>> >
>>> > Changed,
>>> >
>>> > Unchanged,
>>> >
>>> > };
>>> >
>>> > JoinEffectJoin(constBoundedJoinSemiLattice&element);
>>> >
>>> > };
>>> >
>>> > Not all of these declarations are necessary. Specifically,
>>> > operator==and operator<=can be implemented in terms of Joinand
>>> > Topisn’t used by the dataflow algorithm. We include the operators
>>> > because direct implementations are likely more efficient, and
>>> > Topbecause it provides evidence of a well-founded bounded join
>>> > semi-lattice.
>>> >
>>> >
>>> > The Joinoperation is unusual in its choice of mutating the object
>>> > instance and returning an indication of whether the join resulted in
>>> > any changes. A key step of the fixpoint algorithm is code like j =
>>> > Join(x, y); if (j == x) …. This formulation of Joinallows us to
>>> > express this code instead as if (x.Join(y) == Unchanged).... For
>>> > lattices whose instances are expensive to create or compare, we expect
>>> > this formulation to improve the performance of the dataflow analysis.
>>> >
>>> >
>>> > Based on the lattice abstraction, our analysis is similarly
>>> traditional:
>>> >
>>> > classDataflowAnalysis{
>>> >
>>> > public:
>>> >
>>> > usingBoundedJoinSemiLattice=…;
>>> >
>>> > BoundedJoinSemiLatticeInitialElement();
>>> >
>>> > BoundedJoinSemiLatticeTransferStmt(
>>> >
>>> > constclang::Stmt*stmt,constBoundedJoinSemiLattice&element,
>>> >
>>> > clang::dataflow::Environment&environment);
>>> >
>>> > };
>>> >
>>> >
>>> > Environment: A Built-in Lattice
>>> >
>>> > All user-defined operations have access to an environment, which
>>> > encapsulates the program context of whatever program element is being
>>> > considered. It contains:
>>> >
>>> > *
>>> >
>>> > a path condition,
>>> >
>>> > *
>>> >
>>> > a storage model (which models both local variables and the heap).
>>> >
>>> >
>>> > The built-in transfer functions model the effects of C++ language
>>> > constructs (for example, variable declarations, assignments, pointer
>>> > dereferences, arithmetic etc.) by manipulating the environment, saving
>>> > the user from doing so in custom transfer functions.
>>> >
>>> >
>>> > The path condition accumulates the set of boolean conditions that are
>>> > known to be true on every path from function entry to the current
>>> > program point.
>>> >
>>> >
>>> > The environment maintains the maps from program declarations and
>>> > pointer values to storage locations. It also maintains the map of
>>> > storage locations to abstract values. Storage locations can be atomic
>>> > (“scalar”) or aggregate multiple sublocations:
>>> >
>>> > *
>>> >
>>> > ScalarStorageLocationis a storage location that is not subdivided
>>> > further for the purposes of abstract interpretation, for example
>>> > bool, int*.
>>> >
>>> > *
>>> >
>>> > AggregateStorageLocationis a storage location which can be
>>> > subdivided into smaller storage locations that can be
>>> > independently tracked by abstract interpretation. For example, a
>>> > struct with public members.
>>> >
>>> >
>>> > In addition to this storage structure, our model tracks three kinds of
>>> > values: basic values (like integers and booleans), pointers (which
>>> > reify storage location into the value space) and records with named
>>> > fields, where each field is itself a value. For aggregate values, we
>>> > ensure coherence between the structure represented in storage
>>> > locations and that represented in the values: for a given aggregate
>>> > storage location Laggwith child f:
>>> >
>>> > *
>>> >
>>> > valueAt(Lagg) is a record, and
>>> >
>>> > *
>>> >
>>> > valueAt(child(Lagg, f)) = child(valueAt(Lagg), f),
>>> >
>>> > where valueAtmaps storage locations to values.
>>> >
>>> >
>>> > In our first iteration, the only basic values that we will track are
>>> > boolean values. Additional values and variables may be relevant, but
>>> > only in so far as they constitute part of a boolean expression that we
>>> > care about. For example, if (x > 5) { … }will result in our tracking x
>>> > > 5, but not any particular knowledge about the integer-valued
>>> variable x.
>>> >
>>> >
>>> > The path conditions and the model refer to the same sets of values and
>>> > locations.
>>> >
>>> >
>>> > Accounting for infinite height
>>> >
>>> > Despite the simplicity of our storage model, its elements can grow
>>> > infinitely large: for example, path conditions on loop back edges,
>>> > records when modeling linked lists, and even just the number of
>>> > locations for a simple for-loop. Therefore, we provide the user with
>>> > two opportunities to interpret the abstract values so as to bound it:
>>> > comparison of abstract values and a merge operation at control-flow
>>> joins.
>>> >
>>> >
>>> > // Compares values from two different environments for semantic
>>> > equivalence.
>>> >
>>> > boolCompare(
>>> >
>>> > clang::dataflow::Value*value1,clang::dataflow::Environment&env1,
>>> >
>>> > clang::dataflow::Value*value2,clang::dataflow::Environment&env2);
>>> >
>>> >
>>> > // Given `value1` (w.r.t. `env1`) and `value2` (w.r.t. `env2`),
>>> returns a
>>> >
>>> > // `Value` (w.r.t. `merged_env`) that approximates `value1` and
>>> > `value2`. This
>>> >
>>> > // could be a strict lattice join, or a more general widening
>>> operation.
>>> >
>>> > clang::dataflow::Value*Merge(
>>> >
>>> > clang::dataflow::Value*value1,clang::dataflow::Environment&env1,
>>> >
>>> > clang::dataflow::Value*value2,clang::dataflow::Environment&env2,
>>> >
>>> > clang::dataflow::Environment&merged_env);
>>> >
>>> >
>>> > Verification by SAT Solver
>>> >
>>> > The path conditions are critical in refining our approximation of the
>>> > local variables and heap. At an abstract level, they don’t need to
>>> > directly influence the execution of the dataflow analysis. Instead, we
>>> > can imagine that, once concluded, our results are annotated
>>> > with/incorporate path conditions, which in turn allows us to make more
>>> > precise conclusions about our program from the analysis results.
>>> >
>>> >
>>> > For example, if the analysis concludes that a variable conditionally
>>> > holds a non-null pointer, and, for a given program point, the path
>>> > condition implies said condition, we can conclude that a dereference
>>> > of the pointer at that point is safe. In this sense, the verification
>>> > of the safety of a deference is separate from the analysis, which
>>> > models memory states with formulae. That said, invocation of the
>>> > solver need not wait until after the analysis has concluded. It can
>>> > also be used during the analysis, for example, in defining the
>>> > widening operation in Merge.
>>> >
>>> >
>>> > With that, we can see the role of an SAT solver: to solve questions of
>>> > the form “Does the path condition satisfy the pre-condition computed
>>> > for this pointer variable at this program point”? To that end, we
>>> > include an API for asking such questions of a SAT solver. The API is
>>> > compatible with Z3 and we also include our own (simple) SAT solver,
>>> > for users who may not want to integrate with Z3.
>>> >
>>> >
>>> > Testing dataflow analysis
>>> >
>>> > We provide test infrastructure that allows users to easily write tests
>>> > that make assertions about a program state computed by dataflow
>>> > analysis at any code point. Code is annotated with labels on points of
>>> > interest and then test expectations are written with respect to the
>>> > labeled points.
>>> >
>>> >
>>> > For example, consider an analysis that computes the state of
>>> > optionalvalues.
>>> >
>>> > ExpectDataflowAnalysis<OptionalChecker>(R"(
>>> >
>>> > void f(std::optional<int> opt) {
>>> >
>>> > if (opt.has_value()) {
>>> >
>>> > // [[true-branch]]
>>> >
>>> > } else {
>>> >
>>> > // [[false-branch]]
>>> >
>>> > }
>>> >
>>> > })",[](constauto&results){
>>> >
>>> >
>>> EXPECT_LE(results.ValueAtCodePoint("true-branch"),OptionalLattice::Engaged());
>>> >
>>> >
>>> EXPECT_LE(results.ValueAtCodePoint("false-branch"),OptionalLattice::NullOpt());
>>> >
>>> > });
>>> >
>>> >
>>> > Timeline
>>> >
>>> > We have a fully working implementation of the framework as proposed
>>> > here. We also are interested in contributing a clang-tidy checker for
>>> > std::optionaland related types as soon as possible, given its
>>> > demonstrated efficacy and interest from third parties. To that end,
>>> > as soon as we have consensus for starting this work in the clang
>>> > repository, we plan to start sending patches for review. Given that
>>> > the framework is relatively small, we expect it will fit within 5
>>> > (largish) patches, meaning we can hope to have it available by
>>> > December 2021.
>>> >
>>> >
>>> > Future Work
>>> >
>>> > We consider our first version of the framework as experimental. It
>>> > suits the needs of a number of particular checks that we’ve been
>>> > writing, but it has some obvious limitations with respect to general
>>> > use. After we commit our first version, here are some planned areas
>>> > of improvement and investigation.
>>> >
>>> >
>>> > Backwards analyses.The framework only solves forward dataflow
>>> > equations. This clearly prevents implementing some very familiar and
>>> > useful analyses, like variable liveness. We intend to expand the
>>> > framework to also support backwards dataflow equations. There’s
>>> > nothing inherent to our framework that limits it to forwards.
>>> >
>>> >
>>> > Reasoning about non-boolean values.Currently, we can symbolically
>>> > model equality of booleans, but nothing else. Yet, for locations on
>>> > the store that haven’t changed, we can understand which values are
>>> > identical (because the input value object is passed by the transfer
>>> > function to the output). For example,
>>> >
>>> > voidf(bool*a,bool*b){
>>> >
>>> > std::optional<int>opt;
>>> >
>>> > if(a ==b){opt =42;}
>>> >
>>> > if(a ==b){opt.value();}
>>> >
>>> > }
>>> >
>>> > Since neither anor bhave changed, we can reason that the two
>>> > occurrences of the expression a == bevaluate to the same value (even
>>> > though we don't know the exact value). We intend to improve the
>>> > framework to support this increased precision, given that it will not
>>> > cost us any additional complexity.
>>> >
>>> >
>>> > Improved model of program state.A major obstacle to writing
>>> > high-quality checks for C++ code is the complexity of C++’s
>>> > semantics. In writing our framework, we’ve come to see the value of
>>> > shielding the user from the need to handle these directly by modeling
>>> > program state and letting the user express their check in terms of
>>> > that model, after it has been solved for each program point. As is,
>>> > our model currently supports a number of checks, including the
>>> > std::optionalcheck and another that infers which pointer parameters
>>> > are out parameters. We intend to expand the set of such checks, with
>>> > improvements to the model and the API for expressing domain-specific
>>> > semantics relevant to a check. Additionally, we are considering
>>> > explicitly tracking other abstract program properties, like reads and
>>> > writes to local variables and memory, so that users need not account
>>> > for all the syntactic variants of these operations.
>>> >
>>> >
>>> > Refactoring existing checks.Once the framework is more mature and
>>> > stable, we would like to refactor existing checks that use custom,
>>> > ad-hoc implementations of dataflow to use this framework.
>>> >
>>> >
>>> > Conclusion
>>> >
>>> > We have proposed an addition to the Clang repository of a new static
>>> > analysis framework targeting all-pathsanalyses. The intended clients
>>> > of such analyses are code-checking and transformation tools like
>>> > clang-tidy, where sound analyses can support sound transformations of
>>> > code. We discussed the key features and APIs of our framework and how
>>> > they fit together into a new analysis framework. Moreover, our
>>> > proposal is based on experience with a working implementation that
>>> > supports a number of practical, high-precision analyses, including one
>>> > for safe use of types like std::optionaland another that finds missed
>>> > opportunities for use of std::move.
>>> >
>>> >
>>> >
>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20211012/e826d09a/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4000 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20211012/e826d09a/attachment-0001.bin>
More information about the cfe-dev
mailing list