[cfe-dev] [libc++] debug mode

Howard Hinnant hhinnant at apple.com
Mon Sep 19 17:36:50 PDT 2011


On Sep 19, 2011, at 7:41 PM, M.E. O'Neill wrote:

> I wrote:
>>> Feel free to suggest alternative designs or variations; my preference though is for schemes where mass iterator invalidation is O(1) (and actually fairly cheap), and the common case (valid iterators) is fast.
> 
> ... and Steven Watanabe replied:
>> mass iterator invalidation is O(1) even if you implement it by marking all valid iterators as invalid.  Recall that in the standard, O(1) means /amortized/ constant time.  The work done invalidating the iterators can always be paired off with the work done creating them in the first place.
> 
> The key with any amortized algorithm is that you shouldn't fall into the trap of repeating work in such a way as to break the amortization.
> 
> Thus, if you pair each invalidation with creation, yes, it's O(1).  But if you have to *check* multiple times whether to invalidate an iterator or not, then you exceed O(1).
> 
> For vector erase, it's not clear to me how you would (at negligible cost) know which of your zillion iterators should be invalidated and which should not.  Maybe Howard's design does this in some clever way?

Howard's not so clever.  Maybe I should try to describe in more detail what I'm doing.  This description should be prefaced with:  nothing is set in stone, and will likely change when I do a node-based container such as list.  But here's what's there right now:

There is a new header: <__debug>, and a new source: debug.cpp.  The source gets built unconditionally.  The header gets implicitly included if -D_LIBCPP_DEBUG2 is on the command line (or equivalent).

The header defines an assert-like macro, _LIBCPP_ASSERT(x, m), which really isn't important for this discussion, but I include it for completeness.  The x is a run time boolean expression and m is a const char* message.

If _LIBCPP_DEBUG2 >= 1, then a "container" is defined.  This is a singleton container with two factory functions to expose it:  const version and non-const version.  The container is hash based, and can be looked up by two keys instead of the traditional one:  By iterator* or by container*.  As we're dealing with several types of containers and iterators, everything that is stored is actually a void*, the address of the iterators and containers.

The API of this container is not stable at this time.  And it is also not a general purpose container.  The API is growing or changing only in direct response to the needs of the debug mode as it is developed for libc++.

Each iterator* is stored in a node that also has a pointer to a container node in the database.  That pointer is null if the iterator does not reference a container (has been invalidated).

Each container* is stored in a node that includes a dynamically sized array (vector-like but not std::vector) of iterator node pointers which contain valid iterators into the referenced container.

Thus you can look up an iterator and find what, if any container it points to.  And you can look up a container and find a list of valid iterators that refer to it.

Iterator constructors create iterator entries in the database.  Iterator destructors remove iterator entries in the database.  Container constructors create container entries in the database and container destructors remove container entries in the database.

The act of removing a container from the database will include locating each valid iterator in the database and invalidating it by zeroing out its container node pointer.  The act of removing an iterator from the database will include removing the reference (if any) of this iterator from a container node in the database.

In a few places the container node in the database has to call back into the container to get the information it needs.  For example even though an iterator is a valid reference into a container, it may not be dereferenceable.  To find out, the container has to be queried.  This involves type-erasing the container when storing it into the database as a void*, and then calling back into the container via a virtual function.  The algorithm for determining whether a valid iterator is dereferenceable is going to be different for each container.

Issues:

*  Iterators must sometimes be invalidated en-mass, e.g. on destruction or buffer reallocation.  And sometimes only certain iterators need to be invalidated, such as those that point beyond end() after an erase().

*  Iterators sometimes need to switch which containers they refer to but otherwise remain valid, such as during a swap, move or splice.

*  For the node-based containers, not all iterators switch containers during a swap, move or splice.  Iterators referring to the end-node always stick with their container.

*  For string, iterators referring into the internal buffer will never switch containers.

Howard




More information about the cfe-dev mailing list