r344663 - [analyzer] [www] Updated a list of open projects

Tue Oct 16 18:06:20 PDT 2018

Author: george.karpenkov
Date: Tue Oct 16 18:06:20 2018
New Revision: 344663

URL: http://llvm.org/viewvc/llvm-project?rev=344663&view=rev
Log:
[analyzer] [www] Updated a list of open projects

Differential Revision: https://reviews.llvm.org/D53024

Modified:
    cfe/trunk/www/analyzer/open_projects.html

Modified: cfe/trunk/www/analyzer/open_projects.html
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/www/analyzer/open_projects.html?rev=344663&r1=344662&r2=344663&view=diff
==============================================================================

--- cfe/trunk/www/analyzer/open_projects.html (original)
+++ cfe/trunk/www/analyzer/open_projects.html Tue Oct 16 18:06:20 2018
@@ -22,162 +22,219 @@ list</a>. If you are interested in tackl
 to the <a href=http://lists.llvm.org/mailman/listinfo/cfe-dev>cfe-dev
 mailing list</a> to notify other members of the community.</p>
 
-<ul>  
-  <li>Core Analyzer Infrastructure
-  <ul>
-    <li>Explicitly model standard library functions with <tt>BodyFarm</tt>.
-    <p><tt><a href="http://clang.llvm.org/doxygen/classclang_1_1BodyFarm.html">BodyFarm</a></tt> 
-    allows the analyzer to explicitly model functions whose definitions are 
-    not available during analysis. Modeling more of the widely used functions 
-    (such as the members of <tt>std::string</tt>) will improve precision of the
-    analysis. 
-    <i>(Difficulty: Easy, ongoing)</i><p>
-    </li>
-
-    <li>Handle floating-point values.
-    <p>Currently, the analyzer treats all floating-point values as unknown.
-    However, we already have most of the infrastructure we need to handle
-    floats: RangeConstraintManager. This would involve adding a new SVal kind
-    for constant floats, generalizing the constraint manager to handle floats
-    and integers equally, and auditing existing code to make sure it doesn't
-    make untoward assumptions.
-    <i> (Difficulty: Medium)</i></p>
-    </li>
-    
-    <li>Implement generalized loop execution modeling.
-    <p>Currently, the analyzer simply unrolls each loop <tt>N</tt> times. This 
-    means that it will not execute any code after the loop if the loop is 
-    guaranteed to execute more than <tt>N</tt> times. This results in lost 
-    basic block coverage. We could continue exploring the path if we could 
-    model a generic <tt>i</tt>-th iteration of a loop.
-    <i> (Difficulty: Hard)</i></p>
-    </li>
-
-    <li>Enhance CFG to model C++ temporaries properly.
-    <p>There is an existing implementation of this, but it's not complete and
-    is disabled in the analyzer.
-    <i>(Difficulty: Medium; current contact: Alex McCarthy)</i></p>    
-
-    <li>Enhance CFG to model exception-handling properly.
-    <p>Currently exceptions are treated as "black holes", and exception-handling
-    control structures are poorly modeled (to be conservative). This could be
-    much improved for both C++ and Objective-C exceptions.
-    <i>(Difficulty: Medium)</i></p>    
-
-    <li>Enhance CFG to model C++ <code>new</code> more precisely.
-    <p>The current representation of <code>new</code> does not provide an easy
-    way for the analyzer to model the call to a memory allocation function
-    (<code>operator new</code>), then initialize the result with a constructor
-    call. The problem is discussed at length in
-    <a href="http://llvm.org/bugs/show_bug.cgi?id=12014">PR12014</a>.
-    <i>(Difficulty: Easy; current contact: Karthik Bhat)</i></p>
-
-    <li>Enhance CFG to model C++ <code>delete</code> more precisely.
-    <p>Similarly, the representation of <code>delete</code> does not include
-    the call to the destructor, followed by the call to the deallocation
-    function (<code>operator delete</code>). One particular issue 
-    (<tt>noreturn</tt> destructors) is discussed in
-    <a href="http://llvm.org/bugs/show_bug.cgi?id=15599">PR15599</a>
-    <i>(Difficulty: Easy; current contact: Karthik Bhat)</i></p>    
-
-    <li>Implement a BitwiseConstraintManager to handle <a href="http://llvm.org/bugs/show_bug.cgi?id=3098">PR3098</a>.
-    <p>Constraints on the bits of an integer are not easily representable as
-    ranges. A bitwise constraint manager would model constraints such as "bit 32
-    is known to be 1". This would help code that made use of bitmasks</code>.
-    <i>(Difficulty: Medium)</i></p>
-    </li>
-
-    <li>Track type info through casts more precisely.
-    <p>The DynamicTypePropagation checker is in charge of inferring a region's
-    dynamic type based on what operations the code is performing. Casts are a
-    rich source of type information that the analyzer currently ignores. They
-    are tricky to get right, but might have very useful consequences.
-    <i>(Difficulty: Medium)</i></p>    
-
-    <li>Design and implement alpha-renaming.
-    <p>Implement unifying two symbolic values along a path after they are 
-    determined to be equal via comparison. This would allow us to reduce the 
-    number of false positives and would be a building step to more advanced 
-    analyses, such as summary-based interprocedural and cross-translation-unit 
-    analysis. 
-    <i>(Difficulty: Hard)</i></p>
-    </li>    
-  </ul>
+<ul>
+  <li>Release checkers from "alpha"
+    <p>New checkers which were contributed to the analyzer,
+    but have not passed a rigorous evaluation process,
+    are committed as "alpha checkers" (from "alpha version"),
+    and are not enabled by default.
+
+    Ideally, only the checkers which are actively being worked on should be in
+    "alpha",
+    but over the years the development of many of those has stalled.
+    Such checkers need a cleanup:
+    checkers which have been there for a long time should either
+    be improved up to a point where they can be enabled by default,
+    or removed, if such an improvement is not possible.
+    Most notably, these checkers could be "graduated" out of alpha
+    if a consistent effort is applied:
+
+    <ul>
+      <li><code>alpha.security.ArrayBound</code> and
+      <code>alpha.security.ArrayBoundV2</code>
+      <p>Array bounds checking is a desired feature,
+      but having an acceptable rate of false positives might not be possible
+      without proper
+      <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">loop widening</a> support.
+      Additionally, it might be more promising to perform index checking based on
+      <a href="https://en.wikipedia.org/wiki/Taint_checking">tainted</a> index values.
+      <i>(Difficulty: Medium)</i></p>
+      </li>
+
+      <li><code>alpha.cplusplus.MisusedMovedObject</code>
+        <p>The checker emits a warning on objects which were used after
+        <a href="https://en.cppreference.com/w/cpp/utility/move">move</a>.
+        Currently it has an overly high false positive rate due to classes
+        which have well-defined semantics for use-after-move.
+        This property does not hold for STL objects, but is often the case
+        for custom containers.
+        <i>(Difficulty: Medium)</i></p>
+      </li>
+
+      <li><code>alpha.unix.StreamChecker</code>
+        <p>A SimpleStreamChecker was presented in the "Building a Checker in 24
+        Hours" talk
+        (<a href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
+        <a href="https://youtu.be/kdxlsP5QVPw">video</a>).</p>
+
+        <p>This alpha checker is an attempt to write a production grade stream checker.
+        However, it was found to have an unacceptably high false positive rate.
+        One of the problems found was that eagerly splitting the state
+        based on whether a system call may fail leads to too many reports.
+        A <em>delayed</em> split where the implication is stored in the state
+        (similarly to nullability implications in <code>TrustNonnullChecker</code>)
+        may produce much better results.</p>
+        <p><i>(Difficulty: Medium)</i></p>
+      </li>
+    </ul>
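<p>To make the false-positive problem with <code>alpha.cplusplus.MisusedMovedObject</code> concrete, here is a minimal C++ sketch of a custom container whose move operations are documented to leave the source empty. The <code>Buffer</code> class and <code>reuseAfterMove</code> function are hypothetical illustrations, not code from the analyzer or its tests: a checker that warns on every use of a moved-from object would likely flag the <code>a.push(3)</code> call, even though it is correct for this class.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical custom container (not from the analyzer or any library).
// Its move operations are documented to leave the source empty, so using
// a moved-from Buffer is well-defined.
class Buffer {
public:
    Buffer() = default;

    Buffer(Buffer&& other) noexcept : data_(std::move(other.data_)) {
        other.data_.clear();  // moved-from Buffer is guaranteed to be empty
    }

    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            data_ = std::move(other.data_);
            other.data_.clear();
        }
        return *this;
    }

    void push(int v) { data_.push_back(v); }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<int> data_;
};

// A use-after-move that is correct for Buffer.
void reuseAfterMove() {
    Buffer a;
    a.push(1);
    a.push(2);
    Buffer b(std::move(a));  // by Buffer's contract, a is now empty
    a.push(3);               // well-defined reuse of the moved-from object
    assert(a.size() == 1);
    assert(b.size() == 2);
}
```

Distinguishing such classes from ones with unspecified moved-from state is exactly what the checker cannot currently do from the call site alone.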
   </li>
 
-  <li>Bug Reporting 
+  <li>Improved C++ support
   <ul>
-    <li>Refactor path diagnostic generation in <a href="http://clang.llvm.org/doxygen/BugReporter_8cpp_source.html">BugReporter.cpp</a>.
-    <p>It would be great to have more code reuse between "Minimal" and 
-    "Extensive" PathDiagnostic generation algorithms. One idea is to create an 
-    IR for representing path diagnostics, which would be later be used to 
-    generate minimal or extensive report output. <i>(Difficulty: Medium)</i></p>
+    <li>Handle aggregate construction.
+      <p><a href="https://en.cppreference.com/w/cpp/language/aggregate_initialization">Aggregates</a>
+      are objects that can be brace-initialized without calling a
+      constructor (that is, <code><a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConstructExpr.html">
+      CXXConstructExpr</a></code> does not occur in the AST),
+      but which may still call
+      constructors for their fields and base classes.
+      These
+      constructors of sub-objects need to know which object they are constructing.
+      Moreover, if the aggregate contains
+      references, lifetime extension needs to be properly modeled.
+
+      One can start untangling this problem by trying to replace the
+      current ad-hoc <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ParentMap.html">
+      ParentMap</a></code> lookup in the <a href="https://clang.llvm.org/doxygen/ExprEngineCXX_8cpp_source.html#l00430">
+      <code>CXXConstructExpr::CK_NonVirtualBase</code></a> branch of
+      <code>ExprEngine::VisitCXXConstructExpr()</code>
+      with proper support for the feature.
+      <i> (Difficulty: Medium) </i></p>
+    </li>
+
+    <li>Handle constructors within <code>new[]</code>
+      <p>When an array of objects is created with a <code>new[]</code> expression,
+         a constructor is called for every element of the array.
+         We should model (potentially only some of) these constructor calls,
+         and the same applies to the destructors called from a
+         <code>delete[]</code> expression.
+      </p>
+    </li>
+
+    <li>Handle constructors that can be elided due to Named Return Value Optimization (NRVO)
+      <p>Local variables which are returned by value in every return statement
+         may be stored directly at the address for the return value,
+         eliding the copy or move constructor call.
+         Such variables can be identified using the AST method <code>VarDecl::isNRVOVariable</code>.
+      </p>
+    </li>
+
+    <li>Handle constructors of lambda captures
+      <p>Variables which are captured by value into a lambda require a call to
+         a copy constructor.
+         This call is not currently modeled.
+      </p>
+    </li>
+
+    <li>Handle constructors for default arguments
+      <p>Default arguments in C++ are re-evaluated at every call,
+         so the objects they construct are fresh local objects, not static variables.
+      </p>
+    </li>
+
+    <li>Enhance the modeling of the standard library.
+      <p>The analyzer needs a better understanding of the STL in order to be more
+      useful on C++ codebases.
+      While full library modeling is not an easy task,
+      large gains can be achieved by supporting only a few cases:
+      e.g. calling <code>.length()</code> on an empty
+      <code>std::string</code> always yields zero.
+      <i>(Difficulty: Medium)</i></p>
+    </li>
+
+    <li>Enhance CFG to model exception-handling.
+      <p>Currently exceptions are treated as "black holes", and exception-handling
+      control structures are poorly modeled in order to be conservative.
+      This could be improved for both C++ and Objective-C exceptions.
+      <i>(Difficulty: Medium)</i></p>
     </li>
   </ul>
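<p>The constructor call sites listed above are easy to miss when reading source code. The following C++ sketch uses a hypothetical instrumented <code>Tracer</code> type (not part of Clang) to count the constructor and destructor calls performed at run time by <code>new[]</code>/<code>delete[]</code>, a by-value lambda capture, and a default argument; these are the kinds of calls the items above ask the analyzer to model.</p>

```cpp
#include <cassert>

// Hypothetical instrumented type (not part of Clang) that counts how often
// its special member functions run.
struct Tracer {
    static int ctors;   // default constructions
    static int copies;  // copy constructions
    static int dtors;   // destructions

    Tracer() { ++ctors; }
    Tracer(const Tracer&) { ++copies; }
    ~Tracer() { ++dtors; }

    static void reset() { ctors = copies = dtors = 0; }
};

int Tracer::ctors = 0;
int Tracer::copies = 0;
int Tracer::dtors = 0;

// The default argument is evaluated on every call that omits it,
// so each such call constructs a fresh Tracer.
void useDefault(Tracer t = Tracer()) {
    (void)t;
}

void constructorCallSites() {
    // new[] runs one constructor per element; delete[] one destructor each.
    Tracer::reset();
    Tracer* arr = new Tracer[3];
    assert(Tracer::ctors == 3);
    delete[] arr;
    assert(Tracer::dtors == 3);

    // Capturing by value copy-constructs the variable into the closure object.
    Tracer::reset();
    Tracer local;
    auto lambda = [local] { (void)local; };
    assert(Tracer::copies >= 1);
    lambda();

    // Two calls relying on the default argument construct two objects.
    Tracer::reset();
    useDefault();
    useDefault();
    assert(Tracer::ctors == 2);
}
```

The lambda check uses an at-least comparison because, before C++17's guaranteed copy elision, copying the closure object itself may add further copy constructions.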
   </li>
 
-  <li>Other Infrastructure 
+  <li>Core Analyzer Infrastructure
   <ul>
-    <li>Rewrite <tt>scan-build</tt> (in Python).
-    <p><i>(Difficulty: Easy)</i></p>
+    <li>Handle unions.
+      <p>Currently, the analyzer always regards the value of a union as
+      unknown.
+      This problem was
+      previously <a href="http://lists.llvm.org/pipermail/cfe-dev/2017-March/052864.html">discussed</a>
+      on the mailing list, but no solution was implemented.
+      <i> (Difficulty: Medium) </i></p>
+    </li>
+
+    <li>Floating-point support.
+      <p>Currently, the analyzer treats all floating-point values as unknown.
+      This project would involve adding a new <code>SVal</code> kind
+      for constant floats, generalizing the constraint manager to handle floats,
+      and auditing existing code to make sure it doesn't
+      make incorrect assumptions (most notably, that <code>X == X</code>
+      is always true, since it does not hold for <code>NaN</code>).
+      <i> (Difficulty: Medium)</i></p>
+    </li>
+
+    <li>Improved loop execution modeling.
+      <p>The analyzer simply unrolls each loop <tt>N</tt> times before
+      dropping the path, for a fixed constant <tt>N</tt>.
+      However, that results in lost coverage in cases where the loop always
+      executes more than <tt>N</tt> times.
+      A Google Summer of Code
+      <a href="https://summerofcode.withgoogle.com/archive/2017/projects/6071606019358720/">project</a>
+      was completed to make the loop bound parameterizable,
+      but the <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">widening</a>
+      problem still remains open.
+
+      <i> (Difficulty: Hard)</i></p>
+    </li>
+
+    <li>Basic function summarization support
+      <p>The analyzer performs inter-procedural analysis using
+      either inlining or "conservative evaluation" (invalidating all data
+      passed to the function).
+      Often, a very simple summary
+      (e.g. "this function is <a href="https://en.wikipedia.org/wiki/Pure_function">pure</a>") would be
+      a large improvement over conservative evaluation.
+      Such summaries could be obtained either syntactically,
+      or using a dataflow framework.
+      <i>(Difficulty: Hard)</i></p>
+    </li>
+
+    <li>Implement a dataflow framework.
+      <p>The analyzer core
+      implements a <a href="https://en.wikipedia.org/wiki/Symbolic_execution">symbolic execution</a>
+      engine, which performs checks
+      (use-after-free, uninitialized value read, etc.)
+      over a <em>single</em> program path.
+      However, many useful properties
+      (dead code, check-after-use, etc.) require
+      reasoning over <em>all</em> possible paths in a program.
+      Such reasoning requires a
+      <a href="https://en.wikipedia.org/wiki/Data-flow_analysis">dataflow analysis</a> framework.
+      Clang already implements
+      a few dataflow analyses (most notably, liveness),
+      but they are implemented in an ad-hoc fashion.
+      A proper framework would enable us to write many more useful checkers.
+      <i> (Difficulty: Hard) </i></p>
+    </li>
+
+    <li>Track type information through casts more precisely.
+      <p>The <code>DynamicTypePropagation</code>
+      checker is in charge of inferring a region's
+      dynamic type based on what operations the code is performing.
+      Casts are a rich source of type information that the analyzer currently ignores.
+      <i>(Difficulty: Medium)</i></p>
     </li>
 
-    <li>Do a better job interposing on a compilation.
-    <p>Currently, <tt>scan-build</tt> just sets the <tt>CC</tt> and <tt>CXX</tt>
-    environment variables to its wrapper scripts, which then call into an
-    underlying platform compiler. This is problematic for any project that
-    doesn't exclusively use <tt>CC</tt> and <tt>CXX</tt> to control its
-    compilers.
-    <p><i>(Difficulty: Medium-Hard)</i></p>
-    </li>
-
-    <li>Create an <tt>analyzer_annotate</tt> attribute for the analyzer 
-    annotations.
-    <p>We would like to put all analyzer attributes behind a fence so that we 
-    could add/remove them without worrying that compiler (not analyzer) users 
-    depend on them. Design and implement such a generic analyzer attribute in 
-    the compiler. <i>(Difficulty: Medium)</i></p>
-    </li>
   </ul>
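<p>As a concrete illustration of the <code>NaN</code> caveat in the floating-point item, the C++ sketch below shows <code>x == x</code> evaluating to false. It assumes IEEE 754 semantics (in particular, no <code>-ffast-math</code>), and the function names are hypothetical; a constraint manager extended to floats would have to avoid folding such a comparison to true.</p>

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Returns whether x == x holds. An analyzer that assumed reflexive equality
// for floating-point values would wrongly treat this as always returning true.
bool isSelfEqual(double x) {
    return x == x;
}

void nanBreaksReflexivity() {
    double ordinary = 1.5;
    double nan = std::numeric_limits<double>::quiet_NaN();

    assert(isSelfEqual(ordinary));
    assert(!isSelfEqual(nan));  // NaN compares unequal to everything, itself included
    assert(std::isnan(nan));
}
```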
   </li>
 
-  <li>Enhanced Checks
-  <ul>
-    <li>Implement a production-ready StreamChecker.
-    <p>A SimpleStreamChecker has been presented in the Building a Checker in 24 
-    Hours talk 
-    (<a href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>
-    <a href="https://youtu.be/kdxlsP5QVPw">video</a>).
-    We need to implement a production version of the checker with richer set of 
-    APIs and evaluate it by running on real codebases. 
-    <i>(Difficulty: Easy)</i></p>
-    </li>
-
-    <li>Extend Malloc checker with reasoning about custom allocator, 
-    deallocator, and ownership-transfer functions.
-    <p>This would require extending the MallocPessimistic checker to reason 
-    about annotated functions. It is strongly desired that one would rely on 
-    the <tt>analyzer_annotate</tt> attribute, as described above. 
-    <i>(Difficulty: Easy)</i></p>
-    </li>
-
-    <li>Implement a BitwiseMaskingChecker to handle <a href="http://llvm.org/bugs/show_bug.cgi?id=16615">PR16615</a>.
-    <p>Symbolic expressions of the form <code>$sym & CONSTANT</code> can range from 0 to <code>CONSTANT-</code>1 if CONSTANT is <code>2^n-1</code>, e.g. 0xFF (0b11111111), 0x7F (0b01111111), 0x3 (0b0011), 0xFFFF, etc. Even without handling general bitwise operations on symbols, we can at least bound the value of the resulting expression. Bonus points for handling masks followed by shifts, e.g. <code>($sym & 0b1100) >> 2</code>.
-    <i>(Difficulty: Easy)</i></p>
-    </li>
-
-    <li>Implement iterators invalidation checker.
-    <p><i>(Difficulty: Easy)</i></p>
-    </li>
-    
-    <li>Write checkers which catch Copy and Paste errors.
-    <p>Take a look at the
-    <a href="http://pages.cs.wisc.edu/~shanlu/paper/TSE-CPMiner.pdf">CP-Miner</a>
-    paper for inspiration. 
-    <i>(Difficulty: Medium-Hard; current contacts: Daniel Marjamäki and Daniel Fahlgren)</i></p>
-    </li>  
-  </ul>
+  <li>Fixing miscellaneous bugs
+    <p>Apart from the open projects listed above,
+       contributors are welcome to fix any of the outstanding
+       <a href="https://bugs.llvm.org/buglist.cgi?component=Static%20Analyzer&list_id=147756&product=clang&resolution=---">bugs</a>
+       in Bugzilla.
+       <i>(Difficulty: Anything)</i></p>
   </li>
+
 </ul>
 
 </div>