[PATCH] D71734: [ODRHash] Hash `RecordDecl` and diagnose discovered mismatches.

Chuanqi Xu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jan 17 22:48:47 PST 2023


ChuanqiXu accepted this revision.
ChuanqiXu added a comment.
This revision is now accepted and ready to land.

LGTM generally. It'd better to mention this in the `Potentially Breaking Changes` section of ReleaseNotes.



================
Comment at: clang/lib/AST/Decl.cpp:4714
   setIsRandomized(false);
+  RecordDeclBits.ODRHash = 0;
 }
----------------
nit: setODRHash(0) may be more consistent with above style.


================
Comment at: clang/lib/AST/Decl.cpp:4881-4882
+  // For RecordDecl the ODRHash is stored in the remaining 26
+  // bit of RecordDeclBits, adjust the hash to accomodate.
+  setODRHash(Hash.CalculateHash() >> 6);
+  return RecordDeclBits.ODRHash;
----------------
vsapsai wrote:
> ChuanqiXu wrote:
> > I am curious for the operation. It looks dangerous. How can we be sure that the hash value is still meaningful after remove its lower 6 bits?
> Hash value itself has no meaning. The risk with truncation is that `RecordDecl`s with different hashes can end up with equal truncated hashes. It means we'd miss some mismatches. Currently we are missing //all// mismatches, so some looks like an improvement. 
> 
> This applies only to C and Objective-C but not to C++ as CXXRecordDecl stores its own ODR hash separately. So some imperfection in Objective-C seems more desirable than adding 4 bytes in memory consumption per each struct and class, including C++ classes.
It sounds not bad.


================
Comment at: clang/lib/AST/ODRDiagsEmitter.cpp:1550-1551
 
+bool ODRDiagsEmitter::diagnoseMismatch(const RecordDecl *FirstRecord,
+                                       const RecordDecl *SecondRecord) const {
+  if (FirstRecord == SecondRecord)
----------------
vsapsai wrote:
> ChuanqiXu wrote:
> > The implementation of this function looks redundant with other overloads. Of course this is not the problem of the patch. I think we can add a FIXME at least.
> Do you have any specific ideas to reduce redundancy? `PopulateHashes` probably can be made the same for different entities but the rest has many annoying differences. Diagnosing mismatches for different entities consists of the same pieces (that are already put into reusable methods) but the composition of such pieces is different for different entities.
> 
> I've tried to push complex logic into reusable methods and repeat the simple stuff. For example, for `std::string FirstModule = getOwningModuleNameForDiagnostic(FirstRecord);` obtaining a module name is non-trivial and that's why it is in a reusable method. But the same assignment doesn't seem to carry the problems of copy-pasted code.
> 
> By the way, Phabricator shows big [visually] contiguous chunks as being copied. But in fact these big chunks consist of smaller pieces that are taken from different places. So I think that reusing the same pieces multiple times and composing them in different ways is actually useful.
No specific ideas really. I feel it is scary to see so many copied chunks and we should be able to extract them into smaller common functions. But this is not the problem of the patch. I don't want to block it for such reasons.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71734/new/

https://reviews.llvm.org/D71734



More information about the cfe-commits mailing list