[PATCH] D91948: [WIP][analyzer][doc] Add Container- and IteratorModeling developer docs
Whisperity via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Nov 23 04:00:05 PST 2020
whisperity added a comment.
In general, perhaps you should go over the rendered text once again and make use of //emph// and **bold** some more.
Explanation looks okay from my part, although I'm really not knowledgeable about CSA internals. But there are plenty of "typesetting" issues.
Do you have to stick to 80-col in the documentation file? It is customary (at least in LaTeX) that prose is organised in a "one line of code per sentence" fashion. This also makes handling the diff easier... In the latter part towards the discussion about how iterators are targeted, the "80-col" is broken anyways.
================
Comment at: clang/docs/analyzer/developer-docs.rst:14
developer-docs/RegionStore
-
\ No newline at end of file
+ developer-docs/ContainerModeling.rst
+
----------------
How come this new addition is a `.rst` but the rest isn't. Does the rendering work without it? We should keep a style here, either way. So turn all into proper `.rst` links, or drop the `.rst` here.
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:5
+
+The goal of checker ``alpha.cplusplus.ContainerModeling`` is to provide a
+symbolic abstraction for containers to the Clang Static Analyzer. There are
----------------
Can the checker's name be turned into a link, or am I mixing up individual checker documentations' availability with Tidy?
Either way, show the check's name in bold face.
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:6
+The goal of checker ``alpha.cplusplus.ContainerModeling`` is to provide a
+symbolic abstraction for containers to the Clang Static Analyzer. There are
+various concepts regarding containers that help formulate static analysis
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:8
+various concepts regarding containers that help formulate static analysis
+problems more concisely. The size of the container, whether is empty or not are
+the most trivial motivating examples. Standard containers can be iterated, and
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:11
+this idiom is well-adopted in case of non-standard container implementations as
+well, because it can be used to provide a compatible interface to algorithms.
+Therefore iterator modeling is closely related to containers. Iterators extend
----------------
what does `it` refer to here?
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:21
+the modeling checkers mentioned:
+ * :ref:`alpha-cplusplus-InvalidatedIterator`
+ * :ref:`alpha-cplusplus-IteratorRange`
----------------
Will this appear to the reader as `alpha-` or `alpha.`?
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:29
+
+According to ContainerModeling, a value ``c`` with type ``C`` is considered a
+container if either of the following holds:
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:35
+ This should be detected by type C having member functions ``T C::begin()``
+ and ``T C::end()``, where T is an `iterator
+ <https://en.cppreference.com/w/cpp/iterator#Iterator_categories>`_.
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:35-36
+ This should be detected by type C having member functions ``T C::begin()``
+ and ``T C::end()``, where T is an `iterator
+ <https://en.cppreference.com/w/cpp/iterator#Iterator_categories>`_.
+ * The expression ``begin(c)`` and ``end(c)`` are both valid expressions in a
----------------
whisperity wrote:
>
Iterator was linked already, I don't think this needs to repeat.
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:37-47
+ * The expression ``begin(c)`` and ``end(c)`` are both valid expressions in a
+ given scope, and return an `iterator
+ <https://en.cppreference.com/w/cpp/iterator#Iterator_categories>`_.
+ This should be detected by checking the existence functions with the
+ corresponding names, and can either be user-defined free functions or
+ template specialization of the standard-defined ``template<typename T>
+ constexpr auto std::begin(T& t) -> decltype(t.begin())`` and
----------------
What about [[ https://en.cppreference.com/w/cpp/algorithm/ranges/for_each | neibloids (see right before Parameters) ]] (callables that are explicitly disabling ADL at a call site even though they look like an ADL call, with tricks, usually found in the `ranges` library).
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:40
+ <https://en.cppreference.com/w/cpp/iterator#Iterator_categories>`_.
+ This should be detected by checking the existence functions with the
+ corresponding names, and can either be user-defined free functions or
----------------
Existence function?
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:51-55
+ - std::array
+ - std::vector
+ - std::deque
+ - std::list
+ - std::forward_list
----------------
Format these as `monospace` too.
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:57
+
+Example of a custom container with member functions.
+
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:71
+
+Example of a custom container with free functions.
+
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:87
+
+Example of a custom container with std template specialization.
+
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:106
+A container is modeled if it has an associated ``MemRegion``, and this ``MemRegion``,
+or rather the ``const MemRegion*`` (and pointers to its subclasses), that is accessible
+by the ``MemRegionManager`` is what uniquely identifies a container. Temporary
----------------
(While I don't at all agree with the whole //put the `*` to the variable name, not the type name// yadda, Clang prints type names with a ` ` before the `*`, so let's keep it consistent...ly bad.)
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:110
+
+A container is tracked from the ``ProgramPoint``, where either ``begin`` or ``end``
+member function (or free function) is called. Abstract modeling uses ``SymbolRef``-s for the
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:114
+in form of assumptions (inside ``ConstraintManager``).
+For specifying positions inside the container we use one of the following expressions
+ - ``<begin-symbol> + <concrete-value>`` for specifying a position relative to the beginning of the container.
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:122
+
+Containers are modeled in the GDM by their region (MemRegion*) as their associated key,
+this region is immutable, it cannot change during the lifetime of the modeled object.
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:123
+Containers are modeled in the GDM by their region (MemRegion*) as their associated key,
+this region is immutable, it cannot change during the lifetime of the modeled object.
+The begin and end symbols are conjured and are completely unrelated to the region of
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:126
+the container. For each region we store the only the begin and end symbols, other properties
+are to be computed from these, and their relationships stored in the ContstraintManager.
+
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:147
+container modeling. The problem of RValue/LValue (more precisely prvalue, xvalue, and
+lvalue see `value categories <https://en.cppreference.com/w/cpp/language/value_category>`_)
+modeling is not prominent is case containers alone, as no temporary objects are considered.
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:148
+lvalue see `value categories <https://en.cppreference.com/w/cpp/language/value_category>`_)
+modeling is not prominent is case containers alone, as no temporary objects are considered.
+However, this is an issue to be solved when it comes to modeling iterators.
----------------
I think something is missing for here, I am incapable of decyphering this sentence.
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:154-155
+ modeling of copy constructors are needed.
+ No constructors of containers are modeled, there is a WIP
+ `patch <https://reviews.llvm.org/D87388>`_ for default constructor.
+
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:157
+
+There is a limitation in the size of Symbols handled by ``the ConstraintManager``, namely that
+every offset is assumed to be at most ``typesize/4`` in size, otherwise the ``ConstraintManager``
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:158
+There is a limitation in the size of Symbols handled by ``the ConstraintManager``, namely that
+every offset is assumed to be at most ``typesize/4`` in size, otherwise the ``ConstraintManager``
+could not reorder expressions containing the the symbol. As an orthogonal issue symbol-symbol
----------------
Is `typesize` a defined thing in the constraint manager? Otherwise, wouldn't `sizeof(T) / 4` be better readable?
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:159
+every offset is assumed to be at most ``typesize/4`` in size, otherwise the ``ConstraintManager``
+could not reorder expressions containing the the symbol. As an orthogonal issue symbol-symbol
+comparisons still cannot be handled properly if the ``ContstraintManager`` would also be able
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:161-162
+comparisons still cannot be handled properly if the ``ContstraintManager`` would also be able
+answer questions like: is symbol A less than symbol B (instead of just reporting the possible
+range of the values a symbol can have).
+
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:165-166
+.. note::
+ There is a WIP extension: if range of 2 symbols is disjunct and the max of first is less than
+ the min of the second, report less relation. `patch <https://reviews.llvm.org/D77792>`_
+ This patch would be needed to compare the sizes of containers.
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:169
+ If the containers don't overlap in memory, then this would provide a way to determine the size
+ differences. E.g.: If we could store ``a = b + 5`` even if the ranges of a and b is unknown,
+ reordering of this would produce: ``a - b = 5`` and this can have a range attached in the
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:180
+
+A value with type T is possibly considered an iterator:
+If T is
----------------
(same with T -> `T` below consistently)
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:185
+ - destructible
+ - can be incremented (both post and prefix unary plus-plus operator are defined)
+AND
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:204
+The tracking of an iterator begins if a value is detected with the preceding properties *and* its name
+has 'iterator'/'iter'/'it' postfix. In special cases pointers are also treated as iterators, namely,
+if they are results of ``begin`` or ``end`` member functions or free functions.
----------------
What does the `'` do here?
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:210
+ - only track an iterator if its generating expression is a function call which has at least 1 argument, that is an already tracked iterator (and the first iterator parameter's container will be the parent container of the returned iterator)
+If either of these heuristics matches the tracking of iterator should be skipped.
+
----------------
Should be?
Is.
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:213
+Iterators are modeled in the GDM with 2 kinds of keys:
+ - Region (``const MemRegion*``)
+ - Symbol (``SymbolRef``)
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:224
+
+ - a single conjured symbol (SymbolVal)
+ - a conjured symbol (SymbolVal) + a number (``ConcreteInt``) (This for is useful for reordering)
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:227
+
+Functions like find (when alpha.cplusplus.STLAlgorithmModeling is enabled) handle cases where an
+element is found, and a case where it is not
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:236
+
+Example of a pointer iterator implementation (conceptionally no difference between inline and non-inline modes)
+
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:261-282
+ Cont c; // no modeling should be done here
+
+ int* it = c.begin(); // container-modeling begins by tracking c as a container of type Cont
+ // begin() member function call triggers the modeling
+ // iterator-modeling also begins by tracking the value of c.begin()
+ // as an iterator
+ // we check if the value has the necessary iterator-properties
----------------
Perhaps it would be better if indivudal comments for each line would simply be on the line before the code?
Something like:
```
// No modelling needed:
Container C;
// Bla bla
foo()
```
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:263
+
+ int* it = c.begin(); // container-modeling begins by tracking c as a container of type Cont
+ // begin() member function call triggers the modeling
----------------
Yeaaaaaaah...
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:290
+
+Contrary to the container-modeling, not only lvalue iterators are tracked. This is the reason
+why 2 different keys are used in the GDM for iterators. An lvalue iterator has a Region
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:290-293
+Contrary to the container-modeling, not only lvalue iterators are tracked. This is the reason
+why 2 different keys are used in the GDM for iterators. An lvalue iterator has a Region
+(``const MemRegion*``) and it is used if available. If no Region is found for an iterator value
+then a Symbol is used (``SymbolRef``).
----------------
whisperity wrote:
>
Previously, `lvalue` was written as `LValue`!
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:292
+why 2 different keys are used in the GDM for iterators. An lvalue iterator has a Region
+(``const MemRegion*``) and it is used if available. If no Region is found for an iterator value
+then a Symbol is used (``SymbolRef``).
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:295
+
+In the case of pointer iterators (where std::is_pointer<T>::value is true for the type T of the iterator),
+the modeling of the symbolic value is simpler. The lifetime of such values is simple to model,
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:299
+
+Example of a pointer iterator.
+
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:311
+Iterators implemented as pointer live generally in the SymbolMap (the map containing ``SymbolRef``-s as
+opposed to the map containing the ``const MemRegion*``-s), and can not be only represented with LValues (and
+consequently inside the RegionMap), as tracking the value of symbolic offset of an iterator must handle
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:312
+opposed to the map containing the ``const MemRegion*``-s), and can not be only represented with LValues (and
+consequently inside the RegionMap), as tracking the value of symbolic offset of an iterator must handle
+expressions where there may only be temporaries with no LValues associated with them.
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:315
+
+We cannot consistently track only LValues (``MemRegionVal``-s) or only ``RValues`` (``SymbolVal`` or ``LazyCompoundVal``),
+because pointer iterators have only Rvalues that always identifies them (across multiple subexpressions),
----------------
LValues, `RValues`, incosistency
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:317
+because pointer iterators have only Rvalues that always identifies them (across multiple subexpressions),
+class instance variables only have Lvalues for this role. SymbolMap always has iterator values that are RValues.
+RegionMap can have iterator values which are LVals but also values which are RValues.
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:320
+
+In case of class instance implemented iterators, the operations are ``CXXOperatorCallExpr``-s (not built-in
+operators). Also sometimes RValues of such instances are modeled as ``LazyCompoundVal`` ``SVal``-s, but can
----------------
Could we turn this on its head and simply call these "user-defined iterator types" instead of this, while technically correct, rather wonky "class instance" kind of stuff?
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:324-325
+
+The modeling of special container-related member functions can be found in ``Iterator.cpp``, and
+algorithm modeling in ``STLAlgorithmModeling.cpp``.
+
----------------
Which directory? Previously the algorithm checker was referenced with its logical name `alpha.cplusplus.SomethingSomething`.
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:328-329
+The semantic difference between the 2 iterator implementation with respect to their ``SVals`` is
+that accessing a pointers ``SVal`` always return reference to a Region (no way to be a symbol, SymRegion),
+but in case if a class instance iterator can be a symbol (SymExpr).
+
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:331
+
+Example of a class instance iterator.
+
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:335
+
+ class cont_it {...};
+ cont_it it = cont.begin();
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:340
+
+Example of a container which has iterators as elements.
+
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:360
+
+ **Option 1**: use only RValues (``SymbolRef``-s) as keys
+ Class instance iterators (possibly with ``LazyCompoundVal``) can not be used as keys as they are now, because
----------------
I think LLVM's RST rendering rules have a `.. option::` element, which renders it in a nice style. But this might also be Tidy-specific...
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:363
+ they do not satisfy map key criteria (this could be maybe solved by defining an ordering on them), but the
+ main issue is that even though they wrap a SymbolVal, this wrapping is an implementation detail and should not
+ be relied upon, also meaning they should not be unwrapped.
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:367
+
+ **Option 2**: use only LValues (``const MemRegion*``) as keys
+ Class instance iterators that evaluate as a result of multiple subexpressions have RValues and these immediate
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:369
+ Class instance iterators that evaluate as a result of multiple subexpressions have RValues and these immediate
+ RValues break a cheain of value propagation. lazyCompoundVal should not be used as keys in a map (every
+ operation results in a temporary which can be tracked).
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:369
+ Class instance iterators that evaluate as a result of multiple subexpressions have RValues and these immediate
+ RValues break a cheain of value propagation. lazyCompoundVal should not be used as keys in a map (every
+ operation results in a temporary which can be tracked).
----------------
whisperity wrote:
>
typo
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:372
+ Pointer iterators that evaluate as a result of multiple subexpressions have RValues and these immediate RValues break
+ the chain of value propagation. lazyCompoundVal should not be used as keys in a map (every operation results in a
+ temporary which can be tracked). no temporaries are created during the evaluation of expressions (i + 1 + 2) there
----------------
================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:373-374
+ the chain of value propagation. lazyCompoundVal should not be used as keys in a map (every operation results in a
+ temporary which can be tracked). no temporaries are created during the evaluation of expressions (i + 1 + 2) there
+ is no intermediate lvalue for i + 1.
+
----------------
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D91948/new/
https://reviews.llvm.org/D91948
More information about the cfe-commits
mailing list