[PATCH] D53024: [analyzer][www] Add more open projects

Artem Dergachev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Oct 11 15:45:31 PDT 2018


NoQ added a comment.

> In addition 2018 Bay Area LLVM Developers' Meetings may bring some new open projects :)

Actually, let's commit this before the conference, even if it's not perfect, so that people who suddenly get inspired to work on Static Analyzer already had an updated list :)



================
Comment at: www/analyzer/open_projects.html:29
+    <li>Implement a dataflow flamework.
+    <p><!-- TODO: Explain. -->
+    <i> (Difficulty: Hard) </i></p>
----------------
@george.karpenkov your turn here.


================
Comment at: www/analyzer/open_projects.html:33
+
+    <li>Handle aggregate construction.
+    <p>Aggregates are object that can be brace-initialized without calling a
----------------
Actually, maybe let's turn this into a single project with sub-bullets and list all problems instead of just one.

Ok if i just dump my thoughts with markdown? >_<

Something like:
* Handle more ways of constructing C++ objects by identifying `ConstructionContext`s in the `CFG` and using them to identify and track the object until the permanent storage for the object is evaluated.
  * Constructors of fields and [[since C++17] base classes] of aggregates. When aggregates are brace-initialized, their own trivial constructor does not happen, but the constructor for sub-objects does. It would be necessary to see beyond the `InitListExpr` in order to find out which field of which aggregate is actually constructed. Because brace initializers can be nested, `ConstructionContext` for this constructor would potentially contain an indefinite amount of intermediate `InitListExpr`s. Additionally, if the field is of reference type, lifetime extension needs to be modeled.
  * Constructors within `new[]`. Once an array of objects is allocated via `operator new[]`, constructors for all elements of the array are called. Arguably, we should evaluate at least a few of them, as if it was some sort of loop. The same applies to destructors within `delete[]`. `ConstructionContext` is already available here, so it is likely that there's no CFG work required, though indicating the presence of a loopy control flow in the CFG might be helpful.
  * Constructors that can be elided due to Named Return Value Optimization (NRVO). If a local variable within a function is of object type and is returned by value on all return statements, then the compiler is allowed (but not required to, even in C++17) to immediately store this variable at the address into which the function would put the return value, thereby skipping ("eliding") copy/move constructor call (even if it has user-visible side effects). Variables eligible for NRVO can be easily identified in the AST via `VarDecl::isNRVOVariable()`, so no CFG work is necessary here, but Static Analyzer needs to realize that the respecitive `VarRegion` should be entirely replaced with the return region of the function for the whole duration of the respective stack frame.
  * Constructors of virtual method arguments. Constructors of function arguments are already supported, but Static Analyzer has problems finding the correct parameter variable declaration and the correct stack frame for the object under construction if it doesn't know how exactly a virtual call is devirtualized. It might help to simplify the identity of the parameter region to exclude the `Decl` of the callee and its parameters from its identity.
  * Constructors of default arguments. In C++ functions can have default values for parameters that are re-computed at run-time every time the the function is called without specifying the argument explicitly. There isn't much difference between default arguments and normal arguments when it comes to `ConstructionContext`-related problems. But the actual problem here is that the expression that initializes the parameter is not part of the current stack frame, but instead lives in the middle of nowhere within the parameter declaration. This means that if two calls of the same function with defaulted arguments are present within the same full-expression, we may accidentally assign two different values (corresponding to two different default argument objects) to the same expression in the Environment, which is incorrect. One of the possible solutions is to define an additional `LocationContext` to discriminate between different default argument evaluations, similarly to how `StackFrameContext` allows discriminating between values of the same expression on different layers of recursion.
  * Constructors that perform lambda captures. If a lambda captures a variable of object type by value, the object needs to be copied into the lambda, which implies calling a copy-constructor. A new kind of `ConstructionContext` would need to be defined in order to identify the memory region occupied by the lambda object (most likely a `MaterializeTemporaryExpr` that surrounds the `LambdaExpr`) and the sub-region that corresponds to the implicit field of the lambda object that contains the capture.

(Difficulty: Medium)


================
Comment at: www/analyzer/open_projects.html:69
+    analyzer that calling <code>.length()</code> on an empty <code>std::string
+    </code> will yield 0, and vice versa, but supporting all of them is a huge
+    amount of work. One good thing to start with here would be to notice that
----------------
"supporting all of them" -> "modeling all methods in all classes of the standard library"


================
Comment at: www/analyzer/open_projects.html:70
+    </code> will yield 0, and vice versa, but supporting all of them is a huge
+    amount of work. One good thing to start with here would be to notice that
+    inlining methods of C++ "containers" is currently outright forbidden in
----------------
Maybe move "One good thing to start..." into a sub-bullet?


================
Comment at: www/analyzer/open_projects.html:86-87
+    the opposite direction - integers to pointers - are completely unsupported.
+    Pointer-to-pointer casts are a mess; modeling them with <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1ElementRegion.html">
+    ElementRegion </a></code> is a impractical, and we are suffering a lot from
+    this hack, but coming up with a significantly better solution is very hard,
----------------
"Pointer-to-pointer casts are also causing a surprising amount of problems. The decision to re-use `ElementRegion` to track the type of the object behind the pointer value seems to create more problems than it solves, but..."


================
Comment at: www/analyzer/open_projects.html:109-116
     <li>Explicitly model standard library functions with <tt>BodyFarm</tt>.
     <p><tt><a href="http://clang.llvm.org/doxygen/classclang_1_1BodyFarm.html">BodyFarm</a></tt> 
     allows the analyzer to explicitly model functions whose definitions are 
     not available during analysis. Modeling more of the widely used functions 
     (such as the members of <tt>std::string</tt>) will improve precision of the
     analysis. 
+    <i>(Difficulty: Easy)</i><p>
----------------
Let's move this into a sub-bullet of "Enhance the modeling of the standard library." and remove the part about `std::string`, but instead say "This is often effective for simple C functions such as atomic builtins and certain related threading functions (such as `call_once`) because atomicity doesn't have any meaning for Static Analyzer and can be completely ignored. This approach is not recommended for functions dealing with complicated C++ code - temporaries, templates - unless you have unbelievably good AST handwriting and recite all supported versions of the C++ standard by heart."


================
Comment at: www/analyzer/open_projects.html:144
 
     <li>Enhance CFG to model C++ <code>delete</code> more precisely.
     <p>Similarly, the representation of <code>delete</code> does not include
----------------
Let's move this into the construction context bullet. I think we do call the destructor now but it might be that we don't call the custom delete operator.


================
Comment at: www/analyzer/open_projects.html:177
+
+    <li>Make the values in <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1Environment.html">
+    Environment</a></code> immutable
----------------
Let's mark this with "Refactoring: ".


================
Comment at: www/analyzer/open_projects.html:308
 
     <li>Implement a BitwiseMaskingChecker to handle <a href="http://llvm.org/bugs/show_bug.cgi?id=16615">PR16615</a>.
+    <p>Symbolic expressions of the form <code>$sym & CONSTANT</code> can
----------------
This should be removed because we already have a bullet for this.


https://reviews.llvm.org/D53024





More information about the cfe-commits mailing list