r220493 - Add a "signature" to AST files to verify that they haven't changed

Richard Smith richard at metafoo.co.uk
Thu Oct 23 15:34:35 PDT 2014


On Thu, Oct 23, 2014 at 11:41 AM, Ben Langmuir <blangmuir at apple.com> wrote:

>
> On Oct 23, 2014, at 11:24 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Thu, Oct 23, 2014 at 11:05 AM, Ben Langmuir <blangmuir at apple.com>
> wrote:
>
>> Author: benlangmuir
>> Date: Thu Oct 23 13:05:36 2014
>> New Revision: 220493
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=220493&view=rev
>> Log:
>> Add a "signature" to AST files to verify that they haven't changed
>>
>> Since the order of the IDs in the AST file (e.g. DeclIDs, SelectorIDs)
>> is not stable,
>
>
> Um... when you say "not stable" what do you mean? (& when you say "nothing
> changed" what do you mean)
>
>
> Identifiers, decls, macros, etc. are referred to by ID numbers, which
> depend on the order the entity is looked at during serialization.  This in
> turn often depends on hash-table iteration order and is not stable across
> runs.  When a module refers to a name in one of its imports, it will find
> the wrong name if the order has changed since it was first built.
>
> By “not changed” I mean that the sources have not changed.
>
>
> It's pretty important that the compiler produces identical bytes given
> identical input. We're pretty clear about fixing any cases where it doesn't
> (but maybe we haven't been as rigorous about that with AST serialization? I
> don't know - I'b be a bit scared if we haven’t)
>
>
> Maybe someone more familiar with the history of the ASTWriter can comment?
> +rsmith, +dgregor, +akyrtzidis
>

Non-deterministic output is a bug, but it's been lower priority than other
stuff I've been looking at up until now. It's likely to get near the top of
my priority stack fairly soon. So... this patch is /technically/ making
that existing problem worse by adding more nondeterminism to the output,
but since it's also fixing an immediate problem, I suggest we keep this
change in place until we've fixed the other nondeterminism, then revert it.

Does that seem reasonable?

I briefly looked at making the IDs deterministic, but I saw several hash
> tables being iterated over with no apparently guarantee of order, and
> figured we were not trying to produce identical bytes.  We certainly
> *aren’t* producing identical bytes, and when this happens it causes utter
> insanity.
>
> Ben
>
>
>
>> it is not safe to load an AST file that depends on
>> another AST file that has been rebuilt since the importer was built,
>> even if "nothing changed". We previously used size and modtime to check
>> this, but I've seen cases where a module rebuilt quickly enough to foil
>> this check and caused very hard to debug build errors.
>>
>> To save cycles when we're loading the AST, we just generate a random
>> nonce value and check that it hasn't changed when we load an imported
>> module, rather than actually hash the whole file.
>>
>> This is slightly complicated by the fact that we need to verify the
>> signature inside addModule, since we might otherwise consider that a
>> mdoule is "OutOfDate" when really it is the importer that is out of
>> date. I didn't see any regressions in module load time after this
>> change.
>>
>> Added:
>>     cfe/trunk/test/Modules/rebuild.m
>> Modified:
>>     cfe/trunk/include/clang/Serialization/ASTBitCodes.h
>>     cfe/trunk/include/clang/Serialization/ASTReader.h
>>     cfe/trunk/include/clang/Serialization/Module.h
>>     cfe/trunk/include/clang/Serialization/ModuleManager.h
>>     cfe/trunk/lib/Serialization/ASTReader.cpp
>>     cfe/trunk/lib/Serialization/ASTWriter.cpp
>>     cfe/trunk/lib/Serialization/GlobalModuleIndex.cpp
>>     cfe/trunk/lib/Serialization/Module.cpp
>>     cfe/trunk/lib/Serialization/ModuleManager.cpp
>>
>> Modified: cfe/trunk/include/clang/Serialization/ASTBitCodes.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Serialization/ASTBitCodes.h?rev=220493&r1=220492&r2=220493&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/include/clang/Serialization/ASTBitCodes.h (original)
>> +++ cfe/trunk/include/clang/Serialization/ASTBitCodes.h Thu Oct 23
>> 13:05:36 2014
>> @@ -288,7 +288,10 @@ namespace clang {
>>
>>        /// \brief Record code for the module map file that was used to
>> build this
>>        /// AST file.
>> -      MODULE_MAP_FILE = 14
>> +      MODULE_MAP_FILE = 14,
>> +
>> +      /// \brief Record code for the signature that identifiers this AST
>> file.
>> +      SIGNATURE = 15
>>      };
>>
>>      /// \brief Record types that occur within the input-files block
>>
>> Modified: cfe/trunk/include/clang/Serialization/ASTReader.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Serialization/ASTReader.h?rev=220493&r1=220492&r2=220493&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/include/clang/Serialization/ASTReader.h (original)
>> +++ cfe/trunk/include/clang/Serialization/ASTReader.h Thu Oct 23 13:05:36
>> 2014
>> @@ -1125,6 +1125,7 @@ private:
>>                              SourceLocation ImportLoc, ModuleFile
>> *ImportedBy,
>>                              SmallVectorImpl<ImportedModule> &Loaded,
>>                              off_t ExpectedSize, time_t ExpectedModTime,
>> +                            serialization::ASTFileSignature
>> ExpectedSignature,
>>                              unsigned ClientLoadCapabilities);
>>    ASTReadResult ReadControlBlock(ModuleFile &F,
>>                                   SmallVectorImpl<ImportedModule> &Loaded,
>>
>> Modified: cfe/trunk/include/clang/Serialization/Module.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Serialization/Module.h?rev=220493&r1=220492&r2=220493&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/include/clang/Serialization/Module.h (original)
>> +++ cfe/trunk/include/clang/Serialization/Module.h Thu Oct 23 13:05:36
>> 2014
>> @@ -97,6 +97,8 @@ public:
>>    bool isNotFound() const { return Val.getInt() == NotFound; }
>>  };
>>
>> +typedef unsigned ASTFileSignature;
>> +
>>  /// \brief Information about a module that has been loaded by the
>> ASTReader.
>>  ///
>>  /// Each instance of the Module class corresponds to a single AST file,
>> which
>> @@ -152,6 +154,10 @@ public:
>>    /// \brief The file entry for the module file.
>>    const FileEntry *File;
>>
>> +  /// \brief The signature of the module file, which may be used along
>> with size
>> +  /// and modification time to identify this particular file.
>> +  ASTFileSignature Signature;
>> +
>>    /// \brief Whether this module has been directly imported by the
>>    /// user.
>>    bool DirectlyImported;
>>
>> Modified: cfe/trunk/include/clang/Serialization/ModuleManager.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Serialization/ModuleManager.h?rev=220493&r1=220492&r2=220493&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/include/clang/Serialization/ModuleManager.h (original)
>> +++ cfe/trunk/include/clang/Serialization/ModuleManager.h Thu Oct 23
>> 13:05:36 2014
>> @@ -179,6 +179,12 @@ public:
>>    /// \param ExpectedModTime The expected modification time of the module
>>    /// file, used for validation. This will be zero if unknown.
>>    ///
>> +  /// \param ExpectedSignature The expected signature of the module
>> file, used
>> +  /// for validation. This will be zero if unknown.
>> +  ///
>> +  /// \param ReadSignature Reads the signature from an AST file without
>> actually
>> +  /// loading it.
>> +  ///
>>    /// \param Module A pointer to the module file if the module was
>> successfully
>>    /// loaded.
>>    ///
>> @@ -191,6 +197,9 @@ public:
>>                              SourceLocation ImportLoc,
>>                              ModuleFile *ImportedBy, unsigned Generation,
>>                              off_t ExpectedSize, time_t ExpectedModTime,
>> +                            ASTFileSignature ExpectedSignature,
>> +
>> std::function<ASTFileSignature(llvm::BitstreamReader &)>
>> +                                ReadSignature,
>>                              ModuleFile *&Module,
>>                              std::string &ErrorStr);
>>
>>
>> Modified: cfe/trunk/lib/Serialization/ASTReader.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Serialization/ASTReader.cpp?rev=220493&r1=220492&r2=220493&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/Serialization/ASTReader.cpp (original)
>> +++ cfe/trunk/lib/Serialization/ASTReader.cpp Thu Oct 23 13:05:36 2014
>> @@ -2363,6 +2363,11 @@ ASTReader::ReadControlBlock(ModuleFile &
>>        break;
>>      }
>>
>> +    case SIGNATURE:
>> +      assert((!F.Signature || F.Signature == Record[0]) && "signature
>> changed");
>> +      F.Signature = Record[0];
>> +      break;
>> +
>>      case IMPORTS: {
>>        // Load each of the imported PCH files.
>>        unsigned Idx = 0, N = Record.size();
>> @@ -2376,6 +2381,7 @@ ASTReader::ReadControlBlock(ModuleFile &
>>              SourceLocation::getFromRawEncoding(Record[Idx++]);
>>          off_t StoredSize = (off_t)Record[Idx++];
>>          time_t StoredModTime = (time_t)Record[Idx++];
>> +        ASTFileSignature StoredSignature = Record[Idx++];
>>          unsigned Length = Record[Idx++];
>>          SmallString<128> ImportedFile(Record.begin() + Idx,
>>                                        Record.begin() + Idx + Length);
>> @@ -2383,7 +2389,7 @@ ASTReader::ReadControlBlock(ModuleFile &
>>
>>          // Load the AST file.
>>          switch(ReadASTCore(ImportedFile, ImportedKind, ImportLoc, &F,
>> Loaded,
>> -                           StoredSize, StoredModTime,
>> +                           StoredSize, StoredModTime, StoredSignature,
>>                             ClientLoadCapabilities)) {
>>          case Failure: return Failure;
>>            // If we have to ignore the dependency, we'll have to ignore
>> this too.
>> @@ -3552,7 +3558,7 @@ ASTReader::ASTReadResult ASTReader::Read
>>    SmallVector<ImportedModule, 4> Loaded;
>>    switch(ASTReadResult ReadResult = ReadASTCore(FileName, Type,
>> ImportLoc,
>>                                                  /*ImportedBy=*/nullptr,
>> Loaded,
>> -                                                0, 0,
>> +                                                0, 0, 0,
>>                                                  ClientLoadCapabilities))
>> {
>>    case Failure:
>>    case Missing:
>> @@ -3719,6 +3725,8 @@ ASTReader::ASTReadResult ASTReader::Read
>>    return Success;
>>  }
>>
>> +static ASTFileSignature readASTFileSignature(llvm::BitstreamReader
>> &StreamFile);
>> +
>>  ASTReader::ASTReadResult
>>  ASTReader::ReadASTCore(StringRef FileName,
>>                         ModuleKind Type,
>> @@ -3726,12 +3734,14 @@ ASTReader::ReadASTCore(StringRef FileNam
>>                         ModuleFile *ImportedBy,
>>                         SmallVectorImpl<ImportedModule> &Loaded,
>>                         off_t ExpectedSize, time_t ExpectedModTime,
>> +                       ASTFileSignature ExpectedSignature,
>>                         unsigned ClientLoadCapabilities) {
>>    ModuleFile *M;
>>    std::string ErrorStr;
>>    ModuleManager::AddModuleResult AddResult
>>      = ModuleMgr.addModule(FileName, Type, ImportLoc, ImportedBy,
>>                            getGeneration(), ExpectedSize, ExpectedModTime,
>> +                          ExpectedSignature, readASTFileSignature,
>>                            M, ErrorStr);
>>
>>    switch (AddResult) {
>> @@ -4029,6 +4039,34 @@ static bool SkipCursorToBlock(BitstreamC
>>    }
>>  }
>>
>> +static ASTFileSignature readASTFileSignature(llvm::BitstreamReader
>> &StreamFile){
>> +  BitstreamCursor Stream(StreamFile);
>> +  if (Stream.Read(8) != 'C' ||
>> +      Stream.Read(8) != 'P' ||
>> +      Stream.Read(8) != 'C' ||
>> +      Stream.Read(8) != 'H') {
>> +    return 0;
>> +  }
>> +
>> +  // Scan for the CONTROL_BLOCK_ID block.
>> +  if (SkipCursorToBlock(Stream, CONTROL_BLOCK_ID))
>> +    return 0;
>> +
>> +  // Scan for SIGNATURE inside the control block.
>> +  ASTReader::RecordData Record;
>> +  while (1) {
>> +    llvm::BitstreamEntry Entry = Stream.advanceSkippingSubblocks();
>> +    if (Entry.Kind == llvm::BitstreamEntry::EndBlock ||
>> +        Entry.Kind != llvm::BitstreamEntry::Record)
>> +      return 0;
>> +
>> +    Record.clear();
>> +    StringRef Blob;
>> +    if (SIGNATURE == Stream.readRecord(Entry.ID, Record, &Blob))
>> +      return Record[0];
>> +  }
>> +}
>> +
>>  /// \brief Retrieve the name of the original source file name
>>  /// directly from the AST file, without actually loading the AST
>>  /// file.
>>
>> Modified: cfe/trunk/lib/Serialization/ASTWriter.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Serialization/ASTWriter.cpp?rev=220493&r1=220492&r2=220493&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/Serialization/ASTWriter.cpp (original)
>> +++ cfe/trunk/lib/Serialization/ASTWriter.cpp Thu Oct 23 13:05:36 2014
>> @@ -51,6 +51,7 @@
>>  #include "llvm/Support/MemoryBuffer.h"
>>  #include "llvm/Support/OnDiskHashTable.h"
>>  #include "llvm/Support/Path.h"
>> +#include "llvm/Support/Process.h"
>>  #include <algorithm>
>>  #include <cstdio>
>>  #include <string.h>
>> @@ -862,6 +863,7 @@ void ASTWriter::WriteBlockInfoBlock() {
>>    // Control Block.
>>    BLOCK(CONTROL_BLOCK);
>>    RECORD(METADATA);
>> +  RECORD(SIGNATURE);
>>    RECORD(MODULE_NAME);
>>    RECORD(MODULE_MAP_FILE);
>>    RECORD(IMPORTS);
>> @@ -1092,6 +1094,14 @@ adjustFilenameForRelocatablePCH(const ch
>>    return Filename + Pos;
>>  }
>>
>> +static ASTFileSignature getSignature() {
>> +  while (1) {
>> +    if (ASTFileSignature S = llvm::sys::Process::GetRandomNumber())
>> +      return S;
>> +    // Rely on GetRandomNumber to eventually return non-zero...
>> +  }
>> +}
>> +
>>  /// \brief Write the control block.
>>  void ASTWriter::WriteControlBlock(Preprocessor &PP, ASTContext &Context,
>>                                    StringRef isysroot,
>> @@ -1121,6 +1131,11 @@ void ASTWriter::WriteControlBlock(Prepro
>>    Stream.EmitRecordWithBlob(MetadataAbbrevCode, Record,
>>                              getClangFullRepositoryVersion());
>>
>> +  // Signature
>> +  Record.clear();
>> +  Record.push_back(getSignature());
>> +  Stream.EmitRecord(SIGNATURE, Record);
>> +
>>    // Module name
>>    if (WritingModule) {
>>      BitCodeAbbrev *Abbrev = new BitCodeAbbrev();
>> @@ -1173,6 +1188,7 @@ void ASTWriter::WriteControlBlock(Prepro
>>        AddSourceLocation((*M)->ImportLoc, Record);
>>        Record.push_back((*M)->File->getSize());
>>        Record.push_back((*M)->File->getModificationTime());
>> +      Record.push_back((*M)->Signature);
>>        const std::string &FileName = (*M)->FileName;
>>        Record.push_back(FileName.size());
>>        Record.append(FileName.begin(), FileName.end());
>>
>> Modified: cfe/trunk/lib/Serialization/GlobalModuleIndex.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Serialization/GlobalModuleIndex.cpp?rev=220493&r1=220492&r2=220493&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/Serialization/GlobalModuleIndex.cpp (original)
>> +++ cfe/trunk/lib/Serialization/GlobalModuleIndex.cpp Thu Oct 23 13:05:36
>> 2014
>> @@ -591,6 +591,10 @@ bool GlobalModuleIndexBuilder::loadModul
>>          off_t StoredSize = (off_t)Record[Idx++];
>>          time_t StoredModTime = (time_t)Record[Idx++];
>>
>> +        // Skip the stored signature.
>> +        // FIXME: we could read the signature out of the import and
>> validate it.
>> +        Idx++;
>> +
>>          // Retrieve the imported file name.
>>          unsigned Length = Record[Idx++];
>>          SmallString<128> ImportedFile(Record.begin() + Idx,
>>
>> Modified: cfe/trunk/lib/Serialization/Module.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Serialization/Module.cpp?rev=220493&r1=220492&r2=220493&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/Serialization/Module.cpp (original)
>> +++ cfe/trunk/lib/Serialization/Module.cpp Thu Oct 23 13:05:36 2014
>> @@ -21,7 +21,7 @@ using namespace serialization;
>>  using namespace reader;
>>
>>  ModuleFile::ModuleFile(ModuleKind Kind, unsigned Generation)
>> -  : Kind(Kind), File(nullptr), DirectlyImported(false),
>> +  : Kind(Kind), File(nullptr), Signature(0), DirectlyImported(false),
>>      Generation(Generation), SizeInBits(0),
>>      LocalNumSLocEntries(0), SLocEntryBaseID(0),
>>      SLocEntryBaseOffset(0), SLocEntryOffsets(nullptr),
>>
>> Modified: cfe/trunk/lib/Serialization/ModuleManager.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Serialization/ModuleManager.cpp?rev=220493&r1=220492&r2=220493&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/Serialization/ModuleManager.cpp (original)
>> +++ cfe/trunk/lib/Serialization/ModuleManager.cpp Thu Oct 23 13:05:36 2014
>> @@ -57,6 +57,9 @@ ModuleManager::addModule(StringRef FileN
>>                           SourceLocation ImportLoc, ModuleFile
>> *ImportedBy,
>>                           unsigned Generation,
>>                           off_t ExpectedSize, time_t ExpectedModTime,
>> +                         ASTFileSignature ExpectedSignature,
>> +
>>  std::function<ASTFileSignature(llvm::BitstreamReader &)>
>> +                             ReadSignature,
>>                           ModuleFile *&Module,
>>                           std::string &ErrorStr) {
>>    Module = nullptr;
>> @@ -130,6 +133,21 @@ ModuleManager::addModule(StringRef FileN
>>      // Initialize the stream
>>      New->StreamFile.init((const unsigned char
>> *)New->Buffer->getBufferStart(),
>>                           (const unsigned char
>> *)New->Buffer->getBufferEnd());
>> +
>> +    if (ExpectedSignature) {
>> +      New->Signature = ReadSignature(New->StreamFile);
>> +      if (New->Signature != ExpectedSignature) {
>> +        ErrorStr = New->Signature ? "signature mismatch"
>> +                                  : "could not read module signature";
>> +
>> +        // Remove the module file immediately, since removeModules might
>> try to
>> +        // invalidate the file cache for Entry, and that is not safe if
>> this
>> +        // module is *itself* up to date, but has an out-of-date
>> importer.
>> +        Modules.erase(Entry);
>> +        Chain.pop_back();
>> +        return OutOfDate;
>> +      }
>> +    }
>>    }
>>
>>    if (ImportedBy) {
>>
>> Added: cfe/trunk/test/Modules/rebuild.m
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Modules/rebuild.m?rev=220493&view=auto
>>
>> ==============================================================================
>> --- cfe/trunk/test/Modules/rebuild.m (added)
>> +++ cfe/trunk/test/Modules/rebuild.m Thu Oct 23 13:05:36 2014
>> @@ -0,0 +1,29 @@
>> +// REQUIRES: shell
>> +// RUN: rm -rf %t
>> +
>> +// Build Module and set its timestamp
>> +// RUN: echo '@import Module;' | %clang_cc1 -fmodules
>> -fmodules-cache-path=%t -fdisable-module-hash -fsyntax-only -F %S/Inputs -x
>> objective-c -
>> +// RUN: touch -m -a -t 201101010000 %t/Module.pcm
>> +// RUN: cp %t/Module.pcm %t/Module.pcm.saved
>> +// RUN: wc -c %t/Module.pcm > %t/Module.size.saved
>> +
>> +// Build DependsOnModule
>> +// RUN: %clang_cc1 -fmodules -fmodules-cache-path=%t
>> -fdisable-module-hash -fsyntax-only -F %S/Inputs %s
>> +// RUN: diff %t/Module.pcm %t/Module.pcm.saved
>> +// RUN: cp %t/DependsOnModule.pcm %t/DependsOnModule.pcm.saved
>> +
>> +// Rebuild Module, reset its timestamp, and verify its size hasn't
>> changed
>> +// RUN: rm %t/Module.pcm
>> +// RUN: echo '@import Module;' | %clang_cc1 -fmodules
>> -fmodules-cache-path=%t -fdisable-module-hash -fsyntax-only -F %S/Inputs -x
>> objective-c -
>> +// RUN: touch -m -a -t 201101010000 %t/Module.pcm
>> +// RUN: wc -c %t/Module.pcm > %t/Module.size
>> +// RUN: diff %t/Module.size %t/Module.size.saved
>> +// RUN: cp %t/Module.pcm %t/Module.pcm.saved.2
>> +
>> +// But the signature at least is expected to change, so we rebuild
>> DependsOnModule.
>> +// NOTE: if we change how the signature is created, this test may need
>> updating.
>> +// RUN: %clang_cc1 -fmodules -fmodules-cache-path=%t
>> -fdisable-module-hash -fsyntax-only -F %S/Inputs %s
>> +// RUN: diff %t/Module.pcm %t/Module.pcm.saved.2
>> +// RUN: not diff %t/DependsOnModule.pcm %t/DependsOnModule.pcm.saved
>> +
>> + at import DependsOnModule;
>>
>>
>> _______________________________________________
>> cfe-commits mailing list
>> cfe-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20141023/3b3504cc/attachment.html>


More information about the cfe-commits mailing list