[PATCH] D35163: [PDB] Enable NativeSession to create symbols for built-in types on demand

Tue Jul 11 13:34:00 PDT 2017

Whenever the keys and values are small. It's dense in storage, not in the
domain of keys (is my understanding)
On Tue, Jul 11, 2017 at 1:32 PM Zachary Turner <zturner at google.com> wrote:

> The only difference between TPI and IPI is the leaf types that can appear.
> But they both start at kFirstNonSimpleIndex, and IPI records can reference
> TPI records but not the other way around
> On Tue, Jul 11, 2017 at 12:02 PM Adrian McCarthy via Phabricator <
> reviews at reviews.llvm.org> wrote:
>
>> amccarth marked 2 inline comments as done.
>> amccarth added a comment.
>>
>> I've got one more suggestion from Zach to take care of, then I'll upload
>> an updated patch.
>>
>>
>>
>> ================
>> Comment at: llvm/include/llvm/DebugInfo/PDB/Native/NativeSession.h:42
>>
>> +  uint32_t findSymbolByTypeIndex(uint32_t TI);
>> +
>> ----------------
>> zturner wrote:
>> > rnk wrote:
>> > > Any reason not to use codeview::TypeIndex for this parameter? Just
>> trying to avoid exposing internal libraries out through this interface?
>> > +1
>> I tried TypeIndex originally, but there were two issues:
>>
>> 1.  Users of the API generally get the type indexes from symbol methods
>> like getTypeId, which (like DIA), return a uint32_t.  Using a TypeIndex
>> seemed to introduce conversions without much value
>> 2.  There's no standard hash function for a TypeIndex, and defining one
>> seemed gratuitous.
>>
>> I've gone ahead and switched back to TypeIndex.
>>
>>
>> ================
>> Comment at: llvm/lib/DebugInfo/PDB/Native/NativeSession.cpp:44
>> +  PDB_BuiltinType Type;
>> +  uint64_t Size;
>> +} BuiltinTypes[] = {
>> ----------------
>> rnk wrote:
>> > It seems unlikely that a builtin type will grow beyond 4GB.
>> Sure, but, again, this was a matter of conforming to the types used by
>> the already-defined API.  The getLength() method for a symbol returns a
>> uint64_t, so I was trying to avoid unnecessary conversions.  I've switched
>> to uint32_t.
>>
>>
>> ================
>> Comment at: llvm/lib/DebugInfo/PDB/Native/NativeSession.cpp:107-108
>> +  // First see if it's already in our cache.
>> +  const auto Entry = TypeIndexToSymbolId.find(Index);
>> +  if (Entry != TypeIndexToSymbolId.end())
>> +    return Entry->second;
>> ----------------
>> zturner wrote:
>> > zturner wrote:
>> > > I can think of a couple of issues with this function.
>> > >
>> > > 1. I might be misunderstanding the purpose of
>> `findSymbolByTypeIndex`, but doesn't `TypeIndex` imply that this function
>> *only* works for types?  What about symbols, which don't have a type
>> index?  Would those get a different function?
>> > >
>> > > 2. What about id types (e.g. types from the IPI string instead of the
>> TPI stream)?  Would we eventually add a `findSymbolByIdIndex` too?
>> > >
>> > > 3. Why even have a map?  Why not just have a
>> `std::vector<std::unique_ptr<NativeRawSymbol>> Types;` and then in the
>> constructor you resize it to `Tpi.getNumTypeRecords()`, and in this
>> function if it's simple you create one on the fly, otherwise you access
>> `Types[TI.toArrayIndex()]`, and if it's `nullptr` you create a new one
>> otherwise you return the existing one?  This would be O(1) instead of doing
>> the hash computation.
>> > Alternatively, maybe we don't even need a cache?
>> `LazyRandomTypeCollection` already has O(1) amortized random access, as it
>> is effectively already the cache.  So maybe you can create everything on
>> the fly, and not even have a cache?
>> 1.  No, this is specifically for looking up a type symbol by its type
>> index, not the unique symbol index that the API hands out.  The motivation
>> here is, for example, a record for an enum has a method called
>> getUnderlyingType which returns a TypeIndex, not a unique symbol index, so
>> we need to be able to find a type symbol by its type index.
>>
>> 2.  I'm still fuzzy on what the IPI stream contains versus the TPI.  What
>> is an id type?
>>
>> 3.  The symbol cache, which is indexed by symbol IDs is just a vector.
>>
>>
>> ================
>> Comment at: llvm/lib/DebugInfo/PDB/Native/NativeSession.cpp:107-108
>> +  // First see if it's already in our cache.
>> +  const auto Entry = TypeIndexToSymbolId.find(Index);
>> +  if (Entry != TypeIndexToSymbolId.end())
>> +    return Entry->second;
>> ----------------
>> amccarth wrote:
>> > zturner wrote:
>> > > zturner wrote:
>> > > > I can think of a couple of issues with this function.
>> > > >
>> > > > 1. I might be misunderstanding the purpose of
>> `findSymbolByTypeIndex`, but doesn't `TypeIndex` imply that this function
>> *only* works for types?  What about symbols, which don't have a type
>> index?  Would those get a different function?
>> > > >
>> > > > 2. What about id types (e.g. types from the IPI string instead of
>> the TPI stream)?  Would we eventually add a `findSymbolByIdIndex` too?
>> > > >
>> > > > 3. Why even have a map?  Why not just have a
>> `std::vector<std::unique_ptr<NativeRawSymbol>> Types;` and then in the
>> constructor you resize it to `Tpi.getNumTypeRecords()`, and in this
>> function if it's simple you create one on the fly, otherwise you access
>> `Types[TI.toArrayIndex()]`, and if it's `nullptr` you create a new one
>> otherwise you return the existing one?  This would be O(1) instead of doing
>> the hash computation.
>> > > Alternatively, maybe we don't even need a cache?
>> `LazyRandomTypeCollection` already has O(1) amortized random access, as it
>> is effectively already the cache.  So maybe you can create everything on
>> the fly, and not even have a cache?
>> > 1.  No, this is specifically for looking up a type symbol by its type
>> index, not the unique symbol index that the API hands out.  The motivation
>> here is, for example, a record for an enum has a method called
>> getUnderlyingType which returns a TypeIndex, not a unique symbol index, so
>> we need to be able to find a type symbol by its type index.
>> >
>> > 2.  I'm still fuzzy on what the IPI stream contains versus the TPI.
>> What is an id type?
>> >
>> > 3.  The symbol cache, which is indexed by symbol IDs is just a vector.
>> Re:  LazyRandomTypeCollection.  Yes, I think that makes sense to do this
>> for the user-defined types, so that seems like it would replace the need
>> for caching those.  But findSymbolByTypeIndex needs to work for both
>> user-defined types and built-in types.
>>
>> The only trick is that the cached objects are stamped with the symbol's
>> unique ID, so somewhere I'm going to have to keep track of those so that
>> re-visiting a type gets the original ID.
>>
>>
>> https://reviews.llvm.org/D35163
>>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170711/f7e2961f/attachment.html>