[PATCH] D33428: [PDB] Hash types up front when merging types instead of using StringMap
Zachary Turner via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 22 20:04:29 PDT 2017
zturner added inline comments.
================
Comment at: llvm/lib/DebugInfo/CodeView/TypeSerializer.cpp:22
+struct HashedType {
+ uint64_t Hash;
+ const uint8_t *Data;
----------------
`hash_code`?
================
Comment at: llvm/lib/DebugInfo/CodeView/TypeSerializer.cpp:23-24
+ uint64_t Hash;
+ const uint8_t *Data;
+ unsigned Size; // FIXME: Go to uint16_t?
+ unsigned Index;
----------------
`ArrayRef<uint8_t>` or `StringRef`?
================
Comment at: llvm/lib/DebugInfo/CodeView/TypeSerializer.cpp:25
+ unsigned Size; // FIXME: Go to uint16_t?
+ unsigned Index;
+};
----------------
`TypeIndex`?
================
Comment at: llvm/lib/DebugInfo/CodeView/TypeSerializer.cpp:53-54
+ return LHS == RHS;
+ if (LHS->Hash != RHS->Hash || LHS->Size != RHS->Size)
+ return false;
+ return ::memcmp(LHS->Data, RHS->Data, LHS->Size) == 0;
----------------
I haven't looked at the implementation of `DenseMap`, but is this check necessary? I would imagine you could assert that the hashes are different, otherwise why would `DenseMap` be calling this function?
================
Comment at: llvm/lib/DebugInfo/CodeView/TypeSerializer.cpp:59-60
+
+/// Private implementation so that we don't leak our DenseMap instantiations to
+/// users.
+class llvm::codeview::TypeHasher {
----------------
Is this an important consideration? Seems like it sacrifices readability
================
Comment at: llvm/lib/DebugInfo/CodeView/TypeSerializer.cpp:98-99
+ // Compute the hash up front so we can store it in the key.
+ HashedType TempHashedType = {hash_value(Record), Record.data(),
+ unsigned(Record.size()), NextTypeIndex};
+
----------------
Why not `xxHash64`? `hash_value` appears to operate on one char at a time.
https://reviews.llvm.org/D33428
More information about the llvm-commits
mailing list