[PATCH] D33009: [CodeView] Add an amortized O(1) random access type visitor

Zachary Turner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue May 9 10:56:56 PDT 2017


zturner created this revision.
Herald added a subscriber: mgorny.

Type streams are notoriously slow to access, because records are variable length, and so given a particular `TypeIndex`, finding the record with the given `TypeIndex` is O(n).  But if we know the offsets of *some* of the elements, we can do better.

The type visitor here provides lazy, random access to an arbitrary type stream, only ever doing as much work as is necessary to find the requested type, and never doing a linear scan more than once for any record, even if it is visited more than once.

To do so, it makes use of a "type index offset array", which is essentially an array of pairs, where the first item is a `TypeIndex`, and the second item is a byte offset into the global type stream.  Such an array is present as part of the serialized format of a PDB file, but in theory we could construct these on the fly for non-serialization purposes as well.

Using this visitor, When a given `TypeIndex` is requested, there are three possible outcomes.

1. The `TypeIndex` has already been requested by the user before.  In this case the absolute offset of this record will be cached.  We just seek to the appropriate offset, deserialize one record, and invoke the user's callback immediately.
2. The `TypeIndex` has not been requested by the user before, but it *has* visited internally while doing a linear scan for a different `TypeIndex` that the user did request.  In this case, we also know the absolute offset of the record (see #3 below for how), and we proceed as in #1
3. Neither 1 or 2 above apply.  In this case we begin by finding the nearest item from the type index offset array that comes **before** the requested item.  This is a log(M) operation where M is the number of known offsets, assuming the entries are roughly evenly spaced.  There is guaranteed to be at least one, because `TypeIndex` 0 has an offset of 0.

Then, we also look for the nearest `TypeIndex` that meets the criteria  in either 1 or 2 above.  Whichever of these two is larger gives us the nearest known offset to the requested index.  From there, we do a linear scan forward to the requested `TypeIndex`, caching offsets along the way as we deserialize each record.  On subsequent attempts to fetch a record with a given `TypeIndex`, any of the records scanned during the execution of this step will have O(1) lookups.

While this process is going on, a `TypeDatabase` that the user supplies is being populated with the various individual records providing quick lookup to type names and data of known records without having to go through the visitor.


https://reviews.llvm.org/D33009

Files:
  llvm/include/llvm/DebugInfo/CodeView/RandomAccessTypeVisitor.h
  llvm/include/llvm/DebugInfo/CodeView/TypeDatabase.h
  llvm/include/llvm/DebugInfo/CodeView/TypeDeserializer.h
  llvm/include/llvm/DebugInfo/CodeView/TypeIndex.h
  llvm/include/llvm/DebugInfo/PDB/Native/RawTypes.h
  llvm/include/llvm/DebugInfo/PDB/Native/TpiStream.h
  llvm/include/llvm/DebugInfo/PDB/Native/TpiStreamBuilder.h
  llvm/include/llvm/Support/BinaryStreamArray.h
  llvm/lib/DebugInfo/CodeView/CMakeLists.txt
  llvm/lib/DebugInfo/CodeView/CVTypeVisitor.cpp
  llvm/lib/DebugInfo/CodeView/RandomAccessTypeVisitor.cpp
  llvm/lib/DebugInfo/CodeView/TypeDatabase.cpp
  llvm/lib/DebugInfo/CodeView/TypeDatabaseVisitor.cpp
  llvm/lib/DebugInfo/PDB/Native/TpiStreamBuilder.cpp
  llvm/tools/llvm-pdbdump/LLVMOutputStyle.cpp
  llvm/unittests/DebugInfo/CMakeLists.txt
  llvm/unittests/DebugInfo/CodeView/CMakeLists.txt
  llvm/unittests/DebugInfo/CodeView/ErrorChecking.h
  llvm/unittests/DebugInfo/CodeView/RandomAccessVisitorTest.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D33009.98319.patch
Type: text/x-patch
Size: 38613 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170509/4560c539/attachment.bin>


More information about the llvm-commits mailing list