[PATCH] D80383: Add AST_SIGNATURE record to unhashed control block of PCM files

Duncan P. N. Exon Smith via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jun 1 15:11:43 PDT 2020


dexonsmith requested changes to this revision.
dexonsmith added a comment.
This revision now requires changes to proceed.

Thanks for working on this! I have a number of suggestions inline.



================
Comment at: clang/include/clang/Serialization/ASTBitCodes.h:44
     /// AST files at this time.
     const unsigned VERSION_MAJOR = 10;
 
----------------
`VERSION_MAJOR` should be bumped.


================
Comment at: clang/include/clang/Serialization/ASTBitCodes.h:245
 
       /// Offset in the AST file. Keep structure alignment 32-bit and avoid
       /// padding gap because undefined value in the padding affects AST hash.
----------------
Please update the first sentence to be accurate.


================
Comment at: clang/include/clang/Serialization/ASTBitCodes.h:247
       /// padding gap because undefined value in the padding affects AST hash.
+      /// This value is relative to the offset of the AST block
       UnderalignedInt64 BitOffset;
----------------
I don't think you'll need this comment anymore, after you fix the above. But if you keep it, please add a period at the end of the sentence.


================
Comment at: clang/include/clang/Serialization/ASTBitCodes.h:396-400
       /// Record code for the signature that identifiers this AST file.
       SIGNATURE = 1,
 
+      /// Record code for the signature of the AST block.
+      AST_SIGNATURE,
----------------
These names and descriptions hard hard to differentiate. Is there another way of naming these that will be more clear?

(One idea I had is to create `CONTROL_BLOCK_HASH` and `AST_BLOCK_HASH` and then `SIGNATURE` could just be their hash-combine, but maybe you have another idea.)


================
Comment at: clang/lib/Serialization/ASTReader.cpp:2931
   BitstreamCursor &Stream = F.Stream;
-
   if (llvm::Error Err = Stream.EnterSubBlock(AST_BLOCK_ID)) {
----------------
Please submit this NFC change separately.


================
Comment at: clang/lib/Serialization/ASTWriter.cpp:1042-1046
+  ASTFileSignature ASTSignature;
+  for (int I = 0; I != 5; ++I)
+    ASTSignature[I] = LShift(Hash[I * 4 + 0], 24) |
+                      LShift(Hash[I * 4 + 1], 16) | LShift(Hash[I * 4 + 2], 8) |
+                      LShift(Hash[I * 4 + 3], 0);
----------------
Rather than duplicating this code, would it be simpler to start with a patch that changes `ASTFileSignature` to not require this massaging? I suggest either adding a constructor that takes `Hash` directly, or just changing it to have the same type as `Hash` (and updating the serialization to expect that).


================
Comment at: clang/lib/Serialization/ASTWriter.cpp:1061
 
-  return Signature;
+  return std::make_pair(ASTSignature, Signature);
 }
----------------
Alternately, you could change `ASTFileSignature` to have three fields:
- ControlBlockHash
- ASTBlockHash
- Signature

(You could construct the signature by taking the SHA1 of the two block hashes.)


================
Comment at: clang/lib/Serialization/ASTWriter.cpp:1084
       PP.getHeaderSearchInfo().getHeaderSearchOpts().ModulesHashContent) {
-    Signature = createSignature(StringRef(Buffer.begin(), StartOfUnhashedControl));
+    Record.append(ASTSignature.begin(), ASTSignature.end());
+    Stream.EmitRecord(AST_SIGNATURE, Record);
----------------
The default abbreviation (7-VBR) isn't great for hashes. Given that we're going to have two of these records, I suggest specifying a custom abbreviation.

If you were to split out a prep commit as suggested above to change `SIGNATURE` to be an array of 8-bit values (instead of the current array of 32-bit values), that would be a good opportunity to add an abbreviation.


================
Comment at: clang/lib/Serialization/ASTWriter.cpp:2078
         SourceMgr.getNextLocalOffset() - 1 /* skip dummy */,
-        SLocEntryOffsetsBase};
+        SLocEntryOffsetsBase - ASTBlockRange.first};
     Stream.EmitRecordWithBlob(SLocOffsetsAbbrev, Record,
----------------
Can this be relative to the start of `SOURCE_MANAGER_BLOCK_ID` instead?


================
Comment at: clang/lib/Serialization/ASTWriter.cpp:4729-4733
+        StringRef Name = M.Kind == MK_PrebuiltModule ||
+                                 M.Kind == MK_ExplicitModule ||
+                                 M.Kind == MK_ImplicitModule
+                             ? M.ModuleName
+                             : M.FileName;
----------------
Is there ever a reason to use `M.FileName` here, or is that always redundant with what's in the control block? I wonder if we can just remove this complexity.



================
Comment at: clang/lib/Serialization/ASTWriterDecl.cpp:2437
   if (DeclOffsets.size() == Index)
-    DeclOffsets.emplace_back(Loc, Offset);
+    DeclOffsets.emplace_back(Loc, Offset, ASTBlockRange.first);
   else if (DeclOffsets.size() < Index) {
----------------
I suggest making `WriteDecls` relative to the start of the `DECLTYPES_BLOCK_ID`. That will make the block itself (more) self-contained, and also make the offsets smaller (and therefore cheaper to store in bitcode).

I also don't think you need to actually emit that integer, it's just an implicit contract between the reader and writer.


================
Comment at: clang/lib/Serialization/ASTWriterDecl.cpp:2442
     DeclOffsets[Index].setLocation(Loc);
-    DeclOffsets[Index].setBitOffset(Offset);
+    DeclOffsets[Index].setBitOffset(Offset, ASTBlockRange.first);
   } else {
----------------
Same here, this should be relative to the start of `DECLTYPES_BLOCK_ID`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80383/new/

https://reviews.llvm.org/D80383





More information about the cfe-commits mailing list