[PATCH] D58749: [index-while-building] IndexRecordHasher

Dmitri Gribenko via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Mar 12 08:48:24 PDT 2019


gribozavr added a comment.

I left some comments, but it is difficult for me to review without understanding what the requirements for this hasher are, why some information is hashed in, and some is left out, could you clarify?  See detailed comments.

> This implementation is covered by lit tests using it through c-index-test in upcoming patch.

This code looks very unit-testable.  It also handles many corner cases (what information is hashed and what is left out).  c-index-tests are integration tests that are not as good at that, and also they would be testing this code quite indirectly.



================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:175
+    // There's a balance between caching results and not growing the cache too
+    // much. Measurements showed that avoiding caching all decls is beneficial
+    // particularly when including all of Cocoa.
----------------
"caching all decls" => "caching hashes for all decls"


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:191
+
+  auto asCanon = [](QualType Ty) -> CanQualType {
+    return CanQualType::CreateUnsafe(Ty);
----------------
I don't think that hiding a "CreateUnsafe" in an operation with a benign name "asCanon" is a good idea.  Why not inline this lambda everywhere, it looks like it is defined to only save typing a few characters.


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:207
+    if(qVal)
+      Hash = hash_combine(Hash, qVal);
+
----------------
What about other qualifiers?

Why not use `Qualifiers::getAsOpaqueValue()` instead of manually converting a subset of qualifiers to unsigned?


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:209
+
+    // Hash in ObjC GC qualifiers?
+
----------------
Is this a FIXME or?..


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:215
+    if (const PointerType *PT = dyn_cast<PointerType>(T)) {
+      Hash = hash_combine(Hash, '*');
+      CT = asCanon(PT->getPointeeType());
----------------
Why not hash in `T->getTypeClass()` uniformly for all types, instead of inventing a random sigil?


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:219
+    }
+    if (const ReferenceType *RT = dyn_cast<ReferenceType>(T)) {
+      Hash = hash_combine(Hash, '&');
----------------
There is LValueReferenceType and RValueReferenceType, do we care about the difference?  If not, why not?


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:254
+
+    break;
+  }
----------------
What about all other types -- array type, function type, decltype(), ...?

What about attributes?


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:277
+template <typename T>
+hash_code IndexRecordHasher::tryCache(const void *Ptr, T Obj) {
+  auto It = HashByPtr.find(Ptr);
----------------
I had to read the implementation of this function to understand what it does.  How about renaming it to "cachedHash()" ?


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:291
+
+hash_code IndexRecordHasher::hashImpl(const Decl *D) {
+  return DeclHashVisitor(*this).Visit(D);
----------------
hashImpl => cachedHashImpl


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:409
+
+  // Unhandled type.
+  return Hash;
----------------
So what is the failure mode for unhandled types, what is the effect on the whole system?


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:476
+  case NestedNameSpecifier::TypeSpecWithTemplate:
+    // Fall through to hash the type.
+
----------------
I'd suggest to remove this comment.  It is more confusing than helpful.  It makes the code look like there's some processing (like a "break" in other cases), but at a close inspection it turns out to be a comment.  This kind of fallthrough is pretty common and I don't think it requires a comment.  (For example, hashImpl above has this kind of fallthough, but does not have comments like this.)


================
Comment at: clang/lib/Index/IndexRecordHasher.h:31
+/// Implements hashing of AST nodes suitable for the index.
+/// Caching all produced hashes.
+class IndexRecordHasher {
----------------
Could you add some explanation about the information that is being hashed?  What is the underlying principle behind choosing to hash some aspect of the AST node (or to skip it).  For example, I see you're hashing names, but not, say source locations.  Or that QualTypes are first translated into canonical types.

What are the completeness constraints for this hasher?  What happens if we don't hash something that we should have?

"Caching all produced hashes" seems like an implementation comment.  Especially since the implementation chooses not to cache some of the hashes.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58749/new/

https://reviews.llvm.org/D58749





More information about the cfe-commits mailing list