[PATCH] D33428: [PDB] Hash types up front when merging types instead of using StringMap

Mon May 22 20:04:29 PDT 2017

zturner added inline comments.

================
Comment at: llvm/lib/DebugInfo/CodeView/TypeSerializer.cpp:22
+struct HashedType {
+  uint64_t Hash;
+  const uint8_t *Data;
----------------
`hash_code`?

================
Comment at: llvm/lib/DebugInfo/CodeView/TypeSerializer.cpp:23-24
+  uint64_t Hash;
+  const uint8_t *Data;
+  unsigned Size; // FIXME: Go to uint16_t?
+  unsigned Index;
----------------
`ArrayRef<uint8_t>` or `StringRef`?

================
Comment at: llvm/lib/DebugInfo/CodeView/TypeSerializer.cpp:25
+  unsigned Size; // FIXME: Go to uint16_t?
+  unsigned Index;
+};
----------------
`TypeIndex`?

================
Comment at: llvm/lib/DebugInfo/CodeView/TypeSerializer.cpp:53-54
+      return LHS == RHS;
+    if (LHS->Hash != RHS->Hash || LHS->Size != RHS->Size)
+      return false;
+    return ::memcmp(LHS->Data, RHS->Data, LHS->Size) == 0;
----------------
I haven't looked at the implementation of `DenseMap`, but is this check necessary?  I would imagine you could assert that the hashes are different, otherwise why would `DenseMap` be calling this function?

================
Comment at: llvm/lib/DebugInfo/CodeView/TypeSerializer.cpp:59-60
+
+/// Private implementation so that we don't leak our DenseMap instantiations to
+/// users.
+class llvm::codeview::TypeHasher {
----------------
Is this an important consideration?  Seems like it sacrifices readability

================
Comment at: llvm/lib/DebugInfo/CodeView/TypeSerializer.cpp:98-99
+  // Compute the hash up front so we can store it in the key.
+  HashedType TempHashedType = {hash_value(Record), Record.data(),
+                               unsigned(Record.size()), NextTypeIndex};
+
----------------
Why not `xxHash64`?  `hash_value` appears to operate on one char at a time.

https://reviews.llvm.org/D33428