[PATCH] D91948: [WIP][analyzer][doc] Add Container- and IteratorModeling developer docs

Whisperity via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Nov 23 04:00:05 PST 2020


whisperity added a comment.

In general, perhaps you should go over the rendered text once again and make use of //emph// and **bold** some more.
Explanation looks okay from my part, although I'm really not knowledgeable about CSA internals. But there are plenty of "typesetting" issues.

Do you have to stick to 80-col in the documentation file? It is customary (at least in LaTeX) that prose is organised in a "one line of code per sentence" fashion. This also makes handling the diff easier... In the latter part towards the discussion about how iterators are targeted, the "80-col" is broken anyways.



================
Comment at: clang/docs/analyzer/developer-docs.rst:14
    developer-docs/RegionStore
-   
\ No newline at end of file
+   developer-docs/ContainerModeling.rst
+   
----------------
How come this new addition is a `.rst` but the rest isn't. Does the rendering work without it? We should keep a style here, either way. So turn all into proper `.rst` links, or drop the `.rst` here.


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:5
+
+The goal of checker ``alpha.cplusplus.ContainerModeling`` is to provide a
+symbolic abstraction for containers to the Clang Static Analyzer. There are
----------------
Can the checker's name be turned into a link, or am I mixing up individual checker documentations' availability with Tidy?

Either way, show the check's name in bold face.


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:6
+The goal of checker ``alpha.cplusplus.ContainerModeling`` is to provide a
+symbolic abstraction for containers to the Clang Static Analyzer. There are
+various concepts regarding containers that help formulate static analysis
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:8
+various concepts regarding containers that help formulate static analysis
+problems more concisely. The size of the container, whether is empty or not are
+the most trivial motivating examples. Standard containers can be iterated, and
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:11
+this idiom is well-adopted in case of non-standard container implementations as
+well, because it can be used to provide a compatible interface to algorithms.
+Therefore iterator modeling is closely related to containers. Iterators extend
----------------
what does `it` refer to here?


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:21
+the modeling checkers mentioned:
+  * :ref:`alpha-cplusplus-InvalidatedIterator`
+  * :ref:`alpha-cplusplus-IteratorRange`
----------------
Will this appear to the reader as `alpha-` or `alpha.`?


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:29
+
+According to ContainerModeling, a value ``c`` with type ``C`` is considered a
+container if either of the following holds:
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:35
+    This should be detected by type C having member functions ``T C::begin()``
+    and ``T C::end()``, where T is an `iterator
+    <https://en.cppreference.com/w/cpp/iterator#Iterator_categories>`_.
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:35-36
+    This should be detected by type C having member functions ``T C::begin()``
+    and ``T C::end()``, where T is an `iterator
+    <https://en.cppreference.com/w/cpp/iterator#Iterator_categories>`_.
+  * The expression ``begin(c)`` and ``end(c)`` are both valid expressions in a
----------------
whisperity wrote:
> 
Iterator was linked already, I don't think this needs to repeat.


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:37-47
+  * The expression ``begin(c)`` and ``end(c)`` are both valid expressions in a
+    given scope, and return an `iterator
+    <https://en.cppreference.com/w/cpp/iterator#Iterator_categories>`_.
+    This should be detected by checking the existence functions with the
+    corresponding names, and can either be user-defined free functions or
+    template specialization of the standard-defined ``template<typename T>
+    constexpr auto std::begin(T& t) -> decltype(t.begin())`` and
----------------
What about [[ https://en.cppreference.com/w/cpp/algorithm/ranges/for_each | neibloids (see right before Parameters) ]]  (callables that are explicitly disabling ADL at a call site even though they look like an ADL call, with tricks, usually found in the `ranges` library).


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:40
+    <https://en.cppreference.com/w/cpp/iterator#Iterator_categories>`_.
+    This should be detected by checking the existence functions with the
+    corresponding names, and can either be user-defined free functions or
----------------
Existence function?


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:51-55
+ - std::array
+ - std::vector
+ - std::deque
+ - std::list
+ - std::forward_list
----------------
Format these as `monospace` too.


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:57
+
+Example of a custom container with member functions.
+
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:71
+
+Example of a custom container with free functions.
+
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:87
+
+Example of a custom container with std template specialization.
+
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:106
+A container is modeled if it has an associated ``MemRegion``, and this ``MemRegion``,
+or rather the ``const MemRegion*`` (and pointers to its subclasses), that is accessible
+by the ``MemRegionManager`` is what uniquely identifies a container. Temporary
----------------
(While I don't at all agree with the whole //put the `*` to the variable name, not the type name// yadda, Clang prints type names with a ` ` before the `*`, so let's keep it consistent...ly bad.)


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:110
+
+A container is tracked from the ``ProgramPoint``, where either ``begin`` or ``end``
+member function (or free function) is called. Abstract modeling uses ``SymbolRef``-s for the
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:114
+in form of assumptions (inside ``ConstraintManager``).
+For specifying positions inside the container we use one of the following expressions
+  - ``<begin-symbol> + <concrete-value>`` for specifying a position relative to the beginning of the container.
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:122
+
+Containers are modeled in the GDM by their region (MemRegion*) as their associated key,
+this region is immutable, it cannot change during the lifetime of the modeled object.
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:123
+Containers are modeled in the GDM by their region (MemRegion*) as their associated key,
+this region is immutable, it cannot change during the lifetime of the modeled object.
+The begin and end symbols are conjured and are completely unrelated to the region of
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:126
+the container. For each region we store the only the begin and end symbols, other properties
+are to be computed from these, and their relationships stored in the ContstraintManager.
+
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:147
+container modeling. The problem of RValue/LValue (more precisely prvalue, xvalue, and
+lvalue see `value categories <https://en.cppreference.com/w/cpp/language/value_category>`_)
+modeling is not prominent is case containers alone, as no temporary objects are considered.
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:148
+lvalue see `value categories <https://en.cppreference.com/w/cpp/language/value_category>`_)
+modeling is not prominent is case containers alone, as no temporary objects are considered.
+However, this is an issue to be solved when it comes to modeling iterators.
----------------
I think something is missing for here, I am incapable of decyphering this sentence.


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:154-155
+  modeling of copy constructors are needed.
+  No constructors of containers are modeled, there is a WIP
+  `patch <https://reviews.llvm.org/D87388>`_ for default constructor.
+
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:157
+
+There is a limitation in the size of Symbols handled by ``the ConstraintManager``, namely that
+every offset is assumed to be at most ``typesize/4`` in size, otherwise the ``ConstraintManager``
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:158
+There is a limitation in the size of Symbols handled by ``the ConstraintManager``, namely that
+every offset is assumed to be at most ``typesize/4`` in size, otherwise the ``ConstraintManager``
+could not reorder expressions containing the the symbol. As an orthogonal issue symbol-symbol
----------------
Is `typesize` a defined thing in the constraint manager? Otherwise, wouldn't `sizeof(T) / 4` be better readable?


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:159
+every offset is assumed to be at most ``typesize/4`` in size, otherwise the ``ConstraintManager``
+could not reorder expressions containing the the symbol. As an orthogonal issue symbol-symbol
+comparisons still cannot be handled properly if the ``ContstraintManager`` would also be able
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:161-162
+comparisons still cannot be handled properly if the ``ContstraintManager`` would also be able
+answer questions like: is symbol A less than symbol B (instead of just reporting the possible
+range of the values a symbol can have).
+
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:165-166
+.. note::
+  There is a WIP extension: if range of 2 symbols is disjunct and the max of first is less than
+  the min of the second, report less relation. `patch <https://reviews.llvm.org/D77792>`_
+  This patch would be needed to compare the sizes of containers.
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:169
+  If the containers don't overlap in memory, then this would provide a way to determine the size
+  differences. E.g.: If we could store ``a = b + 5`` even if the ranges of a and b is unknown,
+  reordering of this would produce: ``a - b = 5`` and this can have a range attached in the
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:180
+
+A value with type T is possibly considered an iterator:
+If T is
----------------
(same with T -> `T` below consistently)


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:185
+  - destructible
+  - can be incremented (both post and prefix unary plus-plus operator are defined)
+AND
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:204
+The tracking of an iterator begins if a value is detected with the preceding properties *and* its name
+has 'iterator'/'iter'/'it' postfix. In special cases pointers are also treated as iterators, namely,
+if they are results of ``begin`` or ``end`` member functions or free functions.
----------------
What does the `'` do here?


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:210
+ - only track an iterator if its generating expression is a function call which has at least 1 argument, that is an already tracked iterator (and the first iterator parameter's container will be the parent container of the returned iterator)
+If either of these heuristics matches the tracking of iterator should be skipped.
+
----------------
Should be?
Is.


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:213
+Iterators are modeled in the GDM with 2 kinds of keys:
+  - Region (``const MemRegion*``)
+  - Symbol (``SymbolRef``)
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:224
+
+  - a single conjured symbol (SymbolVal)
+  - a conjured symbol (SymbolVal) + a number (``ConcreteInt``) (This for is useful for reordering)
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:227
+
+Functions like find (when alpha.cplusplus.STLAlgorithmModeling is enabled) handle cases where an
+element is found, and a case where it is not
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:236
+
+Example of a pointer iterator implementation (conceptionally no difference between inline and non-inline modes)
+
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:261-282
+    Cont c;              // no modeling should be done here
+
+    int* it = c.begin(); // container-modeling begins by tracking c as a container of type Cont
+                         // begin() member function call triggers the modeling
+                         // iterator-modeling also begins by tracking the value of c.begin()
+                         // as an iterator
+                         // we check if the value has the necessary iterator-properties
----------------
Perhaps it would be better if indivudal comments for each line would simply be on the line before the code?

Something like:

```
// No modelling needed:
Container C;

// Bla bla
foo()
```


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:263
+
+    int* it = c.begin(); // container-modeling begins by tracking c as a container of type Cont
+                         // begin() member function call triggers the modeling
----------------
Yeaaaaaaah...


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:290
+
+Contrary to the container-modeling, not only lvalue iterators are tracked. This is the reason
+why 2 different keys are used in the GDM for iterators. An lvalue iterator has a Region
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:290-293
+Contrary to the container-modeling, not only lvalue iterators are tracked. This is the reason
+why 2 different keys are used in the GDM for iterators. An lvalue iterator has a Region
+(``const MemRegion*``) and it is used if available. If no Region is found for an iterator value
+then a Symbol is used (``SymbolRef``).
----------------
whisperity wrote:
> 
Previously, `lvalue` was written as `LValue`!


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:292
+why 2 different keys are used in the GDM for iterators. An lvalue iterator has a Region
+(``const MemRegion*``) and it is used if available. If no Region is found for an iterator value
+then a Symbol is used (``SymbolRef``).
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:295
+
+In the case of pointer iterators (where std::is_pointer<T>::value is true for the type T of the iterator),
+the modeling of the symbolic value is simpler. The lifetime of such values is simple to model,
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:299
+
+Example of a pointer iterator.
+
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:311
+Iterators implemented as pointer live generally in the SymbolMap (the map containing ``SymbolRef``-s as
+opposed to the map containing the ``const MemRegion*``-s), and can not be only represented with LValues (and
+consequently inside the RegionMap), as tracking the value of symbolic offset of an iterator must handle
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:312
+opposed to the map containing the ``const MemRegion*``-s), and can not be only represented with LValues (and
+consequently inside the RegionMap), as tracking the value of symbolic offset of an iterator must handle
+expressions where there may only be temporaries with no LValues associated with them.
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:315
+
+We cannot consistently track only LValues (``MemRegionVal``-s) or only ``RValues`` (``SymbolVal`` or ``LazyCompoundVal``),
+because pointer iterators have only Rvalues that always identifies them (across multiple subexpressions),
----------------
LValues, `RValues`, incosistency


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:317
+because pointer iterators have only Rvalues that always identifies them (across multiple subexpressions),
+class instance variables only have Lvalues for this role. SymbolMap always has iterator values that are RValues.
+RegionMap can have iterator values which are LVals but also values which are RValues.
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:320
+
+In case of class instance implemented iterators, the operations are ``CXXOperatorCallExpr``-s (not built-in
+operators). Also sometimes RValues of such instances are modeled as ``LazyCompoundVal`` ``SVal``-s, but can
----------------
Could we turn this on its head and simply call these "user-defined iterator types" instead of this, while technically correct, rather wonky "class instance" kind of stuff?


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:324-325
+
+The modeling of special container-related member functions can be found in ``Iterator.cpp``, and
+algorithm modeling in ``STLAlgorithmModeling.cpp``.
+
----------------
Which directory? Previously the algorithm checker was referenced with its logical name `alpha.cplusplus.SomethingSomething`.


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:328-329
+The semantic difference between the 2 iterator implementation with respect to their ``SVals`` is
+that accessing a pointers ``SVal`` always return reference to a Region (no way to be a symbol, SymRegion),
+but in case if a class instance iterator can be a symbol (SymExpr).
+
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:331
+
+Example of a class instance iterator.
+
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:335
+
+  class cont_it {...};
+  cont_it it = cont.begin();
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:340
+
+Example of a container which has iterators as elements.
+
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:360
+
+  **Option 1**: use only RValues (``SymbolRef``-s) as keys
+  Class instance iterators (possibly with ``LazyCompoundVal``) can not be used as keys as they are now, because
----------------
I think LLVM's RST rendering rules have a `.. option::` element, which renders it in a nice style. But this might also be Tidy-specific...


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:363
+  they do not satisfy map key criteria (this could be maybe solved by defining an ordering on them), but the
+  main issue is that even though they wrap a SymbolVal, this wrapping is an implementation detail and should not
+  be relied upon, also meaning they should not be unwrapped.
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:367
+
+  **Option 2**: use only LValues (``const MemRegion*``) as keys
+  Class instance iterators that evaluate as a result of multiple subexpressions have RValues and these immediate
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:369
+  Class instance iterators that evaluate as a result of multiple subexpressions have RValues and these immediate
+  RValues break a cheain of value propagation. lazyCompoundVal should not be used as keys in a map (every
+  operation results in a temporary which can be tracked).
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:369
+  Class instance iterators that evaluate as a result of multiple subexpressions have RValues and these immediate
+  RValues break a cheain of value propagation. lazyCompoundVal should not be used as keys in a map (every
+  operation results in a temporary which can be tracked).
----------------
whisperity wrote:
> 
typo


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:372
+  Pointer iterators that evaluate as a result of multiple subexpressions have RValues and these immediate RValues break
+  the chain of value propagation. lazyCompoundVal should not be used as keys in a map (every operation results in a
+  temporary which can be tracked). no temporaries are created during the evaluation of expressions (i + 1 + 2) there
----------------



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:373-374
+  the chain of value propagation. lazyCompoundVal should not be used as keys in a map (every operation results in a
+  temporary which can be tracked). no temporaries are created during the evaluation of expressions (i + 1 + 2) there
+  is no intermediate lvalue for i + 1.
+
----------------



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91948/new/

https://reviews.llvm.org/D91948



More information about the cfe-commits mailing list