[PATCH] D71734: [ODRHash] Hash `RecordDecl` and diagnose discovered mismatches.

Tue Jan 17 20:10:06 PST 2023

vsapsai added inline comments.

================
Comment at: clang/lib/AST/Decl.cpp:4881-4882
+  // For RecordDecl the ODRHash is stored in the remaining 26
+  // bit of RecordDeclBits, adjust the hash to accomodate.
+  setODRHash(Hash.CalculateHash() >> 6);
+  return RecordDeclBits.ODRHash;
----------------
ChuanqiXu wrote:
> I am curious for the operation. It looks dangerous. How can we be sure that the hash value is still meaningful after remove its lower 6 bits?
Hash value itself has no meaning. The risk with truncation is that `RecordDecl`s with different hashes can end up with equal truncated hashes. It means we'd miss some mismatches. Currently we are missing //all// mismatches, so some looks like an improvement. 

This applies only to C and Objective-C but not to C++ as CXXRecordDecl stores its own ODR hash separately. So some imperfection in Objective-C seems more desirable than adding 4 bytes in memory consumption per each struct and class, including C++ classes.

================
Comment at: clang/lib/AST/ODRDiagsEmitter.cpp:1550-1551

+bool ODRDiagsEmitter::diagnoseMismatch(const RecordDecl *FirstRecord,
+                                       const RecordDecl *SecondRecord) const {
+  if (FirstRecord == SecondRecord)
----------------
ChuanqiXu wrote:
> The implementation of this function looks redundant with other overloads. Of course this is not the problem of the patch. I think we can add a FIXME at least.
Do you have any specific ideas to reduce redundancy? `PopulateHashes` probably can be made the same for different entities but the rest has many annoying differences. Diagnosing mismatches for different entities consists of the same pieces (that are already put into reusable methods) but the composition of such pieces is different for different entities.

I've tried to push complex logic into reusable methods and repeat the simple stuff. For example, for `std::string FirstModule = getOwningModuleNameForDiagnostic(FirstRecord);` obtaining a module name is non-trivial and that's why it is in a reusable method. But the same assignment doesn't seem to carry the problems of copy-pasted code.

By the way, Phabricator shows big [visually] contiguous chunks as being copied. But in fact these big chunks consist of smaller pieces that are taken from different places. So I think that reusing the same pieces multiple times and composing them in different ways is actually useful.

================
Comment at: clang/lib/AST/ODRHash.cpp:592-593

+void ODRHash::AddRecordDecl(const RecordDecl *Record) {
+  AddDecl(Record);
+
----------------
ChuanqiXu wrote:
> For the current implementation, if it makes sense to add an assertion `!isa<CXXRecordDecl>(Record);`
Yes, that's a good idea as calling AddRecordDecl with CXXRecordDecl looks like an easy mistake to make.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71734/new/

https://reviews.llvm.org/D71734