[PATCH] D80383: Add AST_SIGNATURE record to unhashed control block of PCM files
Duncan P. N. Exon Smith via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Jun 1 15:11:43 PDT 2020
dexonsmith requested changes to this revision.
dexonsmith added a comment.
This revision now requires changes to proceed.
Thanks for working on this! I have a number of suggestions inline.
================
Comment at: clang/include/clang/Serialization/ASTBitCodes.h:44
/// AST files at this time.
const unsigned VERSION_MAJOR = 10;
----------------
`VERSION_MAJOR` should be bumped.
================
Comment at: clang/include/clang/Serialization/ASTBitCodes.h:245
/// Offset in the AST file. Keep structure alignment 32-bit and avoid
/// padding gap because undefined value in the padding affects AST hash.
----------------
Please update the first sentence to be accurate.
================
Comment at: clang/include/clang/Serialization/ASTBitCodes.h:247
/// padding gap because undefined value in the padding affects AST hash.
+ /// This value is relative to the offset of the AST block
UnderalignedInt64 BitOffset;
----------------
I don't think you'll need this comment anymore, after you fix the above. But if you keep it, please add a period at the end of the sentence.
================
Comment at: clang/include/clang/Serialization/ASTBitCodes.h:396-400
/// Record code for the signature that identifiers this AST file.
SIGNATURE = 1,
+ /// Record code for the signature of the AST block.
+ AST_SIGNATURE,
----------------
These names and descriptions hard hard to differentiate. Is there another way of naming these that will be more clear?
(One idea I had is to create `CONTROL_BLOCK_HASH` and `AST_BLOCK_HASH` and then `SIGNATURE` could just be their hash-combine, but maybe you have another idea.)
================
Comment at: clang/lib/Serialization/ASTReader.cpp:2931
BitstreamCursor &Stream = F.Stream;
-
if (llvm::Error Err = Stream.EnterSubBlock(AST_BLOCK_ID)) {
----------------
Please submit this NFC change separately.
================
Comment at: clang/lib/Serialization/ASTWriter.cpp:1042-1046
+ ASTFileSignature ASTSignature;
+ for (int I = 0; I != 5; ++I)
+ ASTSignature[I] = LShift(Hash[I * 4 + 0], 24) |
+ LShift(Hash[I * 4 + 1], 16) | LShift(Hash[I * 4 + 2], 8) |
+ LShift(Hash[I * 4 + 3], 0);
----------------
Rather than duplicating this code, would it be simpler to start with a patch that changes `ASTFileSignature` to not require this massaging? I suggest either adding a constructor that takes `Hash` directly, or just changing it to have the same type as `Hash` (and updating the serialization to expect that).
================
Comment at: clang/lib/Serialization/ASTWriter.cpp:1061
- return Signature;
+ return std::make_pair(ASTSignature, Signature);
}
----------------
Alternately, you could change `ASTFileSignature` to have three fields:
- ControlBlockHash
- ASTBlockHash
- Signature
(You could construct the signature by taking the SHA1 of the two block hashes.)
================
Comment at: clang/lib/Serialization/ASTWriter.cpp:1084
PP.getHeaderSearchInfo().getHeaderSearchOpts().ModulesHashContent) {
- Signature = createSignature(StringRef(Buffer.begin(), StartOfUnhashedControl));
+ Record.append(ASTSignature.begin(), ASTSignature.end());
+ Stream.EmitRecord(AST_SIGNATURE, Record);
----------------
The default abbreviation (7-VBR) isn't great for hashes. Given that we're going to have two of these records, I suggest specifying a custom abbreviation.
If you were to split out a prep commit as suggested above to change `SIGNATURE` to be an array of 8-bit values (instead of the current array of 32-bit values), that would be a good opportunity to add an abbreviation.
================
Comment at: clang/lib/Serialization/ASTWriter.cpp:2078
SourceMgr.getNextLocalOffset() - 1 /* skip dummy */,
- SLocEntryOffsetsBase};
+ SLocEntryOffsetsBase - ASTBlockRange.first};
Stream.EmitRecordWithBlob(SLocOffsetsAbbrev, Record,
----------------
Can this be relative to the start of `SOURCE_MANAGER_BLOCK_ID` instead?
================
Comment at: clang/lib/Serialization/ASTWriter.cpp:4729-4733
+ StringRef Name = M.Kind == MK_PrebuiltModule ||
+ M.Kind == MK_ExplicitModule ||
+ M.Kind == MK_ImplicitModule
+ ? M.ModuleName
+ : M.FileName;
----------------
Is there ever a reason to use `M.FileName` here, or is that always redundant with what's in the control block? I wonder if we can just remove this complexity.
================
Comment at: clang/lib/Serialization/ASTWriterDecl.cpp:2437
if (DeclOffsets.size() == Index)
- DeclOffsets.emplace_back(Loc, Offset);
+ DeclOffsets.emplace_back(Loc, Offset, ASTBlockRange.first);
else if (DeclOffsets.size() < Index) {
----------------
I suggest making `WriteDecls` relative to the start of the `DECLTYPES_BLOCK_ID`. That will make the block itself (more) self-contained, and also make the offsets smaller (and therefore cheaper to store in bitcode).
I also don't think you need to actually emit that integer, it's just an implicit contract between the reader and writer.
================
Comment at: clang/lib/Serialization/ASTWriterDecl.cpp:2442
DeclOffsets[Index].setLocation(Loc);
- DeclOffsets[Index].setBitOffset(Offset);
+ DeclOffsets[Index].setBitOffset(Offset, ASTBlockRange.first);
} else {
----------------
Same here, this should be relative to the start of `DECLTYPES_BLOCK_ID`.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D80383/new/
https://reviews.llvm.org/D80383
More information about the cfe-commits
mailing list