[PATCH] D91948: [WIP][analyzer][doc] Add Container- and IteratorModeling developer docs

Kristóf Umann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Feb 18 08:15:49 PST 2021


Szelethus requested changes to this revision.
Szelethus added a comment.
This revision now requires changes to proceed.

As far as content goes, here are my thoughts:

There is a huge block of comments in the source code: https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/Checkers/IteratorModeling.cpp#L13. You have not mentioned these iterator categories: are they still a thing? If so, lets move those comments here, if not, just remove them altogether.

Please mention these docs on the top of iterator checker files.

Everything else:

In general, you have some meaningless sentences that I always catch myself writing as well: "There are a variety of techniques.". Its better to just list them straight away.



================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:105-106
+
+A container is modeled if it has an associated ``MemRegion``, and this ``MemRegion``,
+or rather the ``const MemRegion*`` (and pointers to its subclasses), that is accessible
+by the ``MemRegionManager`` is what uniquely identifies a container. Temporary
----------------
I think `const MemRegion *` is implied.


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:107-108
+or rather the ``const MemRegion*`` (and pointers to its subclasses), that is accessible
+by the ``MemRegionManager`` is what uniquely identifies a container. Temporary
+containers do not necessarily have a ``MemRegion``, these are not modeled.
+
----------------
Fair enough, is this true though for containers whose lifetime was extended? Maybe I should know this myself :^)


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:132-140
+Apart from identifying the container with a ``MemRegion``, in order to interact with
+iterator modeling, the symbolic begin and end positions of the container are also tracked.
+The size (and as a special case, whether the container is empty or not) are properties that
+should also be tracked.
+
+.. note::
+  Currently, the implementation does not handle size and emptiness tracking, but patches
----------------
Huh, interesting, considering the earlier point about how you track iterator positions. Couldn't you just say that `<end symbol> = <begin symbol> + <some concrete value>`? I mean, onbviously not, because if it was that simple it would have landed already, but why is that so difficult? Half a sentence about this should suffice.


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:213-214
+Iterators are modeled in the GDM with 2 kinds of keys:
+  - Region (``const MemRegion*``)
+  - Symbol (``SymbolRef``)
+There are therefore 2 maps which model iterators, one is called the RegionMap, the other is the SymbolMap.
----------------
whisperity wrote:
> 
This is the juicy part. I want some more thoughts in here: why? Can I have some examples?


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:290-313
+Contrary to the container-modeling, not only lvalue iterators are tracked. This is the reason
+why 2 different keys are used in the GDM for iterators. An lvalue iterator has a Region
+(``const MemRegion*``) and it is used if available. If no Region is found for an iterator value
+then a Symbol is used (``SymbolRef``).
+
+In the case of pointer iterators (where std::is_pointer<T>::value is true for the type T of the iterator),
+the modeling of the symbolic value is simpler. The lifetime of such values is simple to model,
----------------
whisperity wrote:
> whisperity wrote:
> > 
> Previously, `lvalue` was written as `LValue`!
Ah, there it is. Move it to where I placed my other inline! :) 


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:301-305
+.. code-block:: cpp
+
+  int * it = cont.begin();
+  int * it = it + 1 + 1;
+  // SVal of it + 1 subexpression: NonLoc kind (designates an RValue)
----------------
This example is here to explain the previous two paragraphs, yet I'm left confused. I heard these things:

"If no Region is found for an iterator value then a Symbol is used (`SymbolRef`)."

Do we have a region here, or not? If we do, then this example doesn't demonstrate what was said highlighted in the first paragraph, so lets do that. There is no region here? Why?

"In the case of pointer iterators, the modeling of the symbolic value is simpler. The lifetime of such values is simple to model,
there is no need for constructors, destructors and copy-elision rules to be taken into consideration."

Do I gather correctly that we need to keep around a memregion and a symbolic value implementation anyways, so we might as well pick the easier one when possible? This look clunky, even if its true. With that said, a paragraph guiding me through the example outlining why its preferable to model `it` as a symbolic value would be appreciated.


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:319
+RegionMap can have iterator values which are LVals but also values which are RValues.
+
+In case of class instance implemented iterators, the operations are ``CXXOperatorCallExpr``-s (not built-in
----------------
You spent long enough on pointer iterators that subsectioning here would be justified. 


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:324-325
+
+The modeling of special container-related member functions can be found in ``Iterator.cpp``, and
+algorithm modeling in ``STLAlgorithmModeling.cpp``.
+
----------------
whisperity wrote:
> Which directory? Previously the algorithm checker was referenced with its logical name `alpha.cplusplus.SomethingSomething`.
This should be moved out of this section. You could mention those files that have the the lowest likelihood of being split up / renamed.


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:327-329
+The semantic difference between the 2 iterator implementation with respect to their ``SVals`` is
+that accessing a pointers ``SVal`` always return reference to a Region (no way to be a symbol, SymRegion),
+but in case if a class instance iterator can be a symbol (SymExpr).
----------------
Why is this an issue? Please explain.


================
Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:383
+  solved in this case (as in Option 1).
+
+
----------------
Any links to a previous discussion?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91948/new/

https://reviews.llvm.org/D91948



More information about the cfe-commits mailing list