[PATCH] D80383: Add AST_SIGNATURE record to unhashed control block of PCM files

Daniel Grumberg via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jun 10 08:12:22 PDT 2020


dang added inline comments.


================
Comment at: clang/include/clang/Serialization/ASTBitCodes.h:396-400
       /// Record code for the signature that identifiers this AST file.
       SIGNATURE = 1,
 
+      /// Record code for the signature of the AST block.
+      AST_SIGNATURE,
----------------
dexonsmith wrote:
> dang wrote:
> > dang wrote:
> > > dexonsmith wrote:
> > > > These names and descriptions hard hard to differentiate. Is there another way of naming these that will be more clear?
> > > > 
> > > > (One idea I had is to create `CONTROL_BLOCK_HASH` and `AST_BLOCK_HASH` and then `SIGNATURE` could just be their hash-combine, but maybe you have another idea.)
> > > I kept the same hasher when computing both of these which mitigates the cost. I don't see the need for also emitting a hash for the control block, there are some optional records that are not in both the AST block and the control block anyway.
> > I also think that the `AST_BLOCK_HASH` and the `SIGNATURE` are enough information already. In most cases you can deduce if the control block was different by just checking if the signatures were different and the ASTs the same.
> Thanks for updating to `AST_BLOCK_HASH`, I think that addressed the concern I had.
> 
> However, having thought further about it, I don't think it makes sense to create a new record type. I suggest just having a convention of writing the hashes out in a particular order.
> - First record is the overall signature.
> - Second record is the AST block hash.
> - In the future, we can add a third record for the control block hash.
> 
> In which case, the (single) record name should be something generic like `SIGNATURE` or `SHA1_HASH`.
> 
> Note: I doubt there would be additional cost to computing the AST and control block hashes separately, but I agree there isn't a compelling use for the control block hash currently (although it would be nice to confirm properties like if the `CONTROL_BLOCK_HASH` were typically stable when sources changed).
> 
> Note: I don't think the content of the "hashed" control block vs. unhashed control block have been carefully considered from first principles. At some point we might do that. I expect there are currently things that change the control block that would not invalidate it for importers.
I agree with the fact that it might be worth maintaining a control block hash, but I do think this is not the purpose of this change. I can do it as a follow up when I have some time. I am somewhat opposed to keeping them all in the same record for now, because it makes it harder to grep for one of the signatures in the output of `llvm-bcanalyzer` which is currently the only available tooling for looking inside the unhashed control block.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80383/new/

https://reviews.llvm.org/D80383





More information about the cfe-commits mailing list