[PATCH] D18213: Add a module Hash in the bitcode and the combined index.

Tue Mar 22 08:28:59 PDT 2016

We can make it conditional (with a clang flag?), but the desired behavior is to be able to decide on enabling/disabling the caching at link time, so any bitcode static archive distributed would need to have a hash embedded. The user friendly choice would be to enabled it by *default* when -flto=thing, and an explicit option to disable it.

-- 
Mehdi

> On Mar 22, 2016, at 8:05 AM, Teresa Johnson <tejohnson at google.com> wrote:
> 
> tejohnson added a comment.
> 
> Can we make generation of the hashes conditional? They aren't needed if a module cache is not being used for incremental rebuilds. I anticipate we will use a different mechanism with our distributed build system, for example. It is simple to suppress generation of the new MODULE_CODE_HASH record, but would be good to also suppress the larger module string table as well. Maybe have a separate set of record types for when we want to include the hashes?
> 
> 
> ================
> Comment at: include/llvm/IR/ModuleSummaryIndex.h:234
> @@ +233,3 @@
> +/// 160 bits SHA1
> +typedef std::array<uint8_t, 20> ModuleHash;
> +
> ----------------
> Why not use uint32_t? That seems to be what is used in SHA1.h. 
> 
> ================
> Comment at: include/llvm/IR/ModuleSummaryIndex.h:312
> @@ -306,2 +311,3 @@
> 
>   /// Iterator to allow writer to walk through table during emission.
> +  const StringMap<std::pair<uint64_t, ModuleHash>> &modulePaths() const {
> ----------------
> No longer returning an iterator. Ditto for non-const method below.
> 
> ================
> Comment at: include/llvm/IR/ModuleSummaryIndex.h:360
> @@ +359,3 @@
> +
> +  /// Add a new module path, mapped to the given module Id, and return StringRef
> +  /// owned by string table map.
> ----------------
> Comment needs update: "with the given Hash". Also, is it possible to combine this with the above using a default argument value of {{0}}?
> 
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:5672
> @@ -5675,1 +5671,3 @@
>         }
> +        /// MODULE_CODE_HASH: [6*i32]
> +        case bitc::MODULE_CODE_HASH: {
> ----------------
> 5*i32? Assert below checking for 5 values. And 32*5==160, so that seems like the right number. And update the description of this record accordingly in LLVMBitCodes.h where it just says [unsigned].
> 
> ================
> Comment at: lib/Bitcode/Reader/BitcodeReader.cpp:5949
> @@ -5931,5 +5948,3 @@
>     case bitc::MST_CODE_ENTRY: {
> -      // MST_ENTRY: [modid, namechar x N]
> -      if (convertToString(Record, 1, ModulePath))
> -        return error("Invalid record");
> +      // MST_ENTRY: [modid, hash (6xi32), namechar x N]
>       uint64_t ModuleId = Record[0];
> ----------------
> Why is this one 6*i32 whereas the MODULE_CODE_HASH is only expected to have 5 i32 (according to implementation, comment above incorrectly says it is 6*i32)?
> 
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:2837
> @@ -2829,1 +2836,3 @@
>   Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));
> +  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));
> +  // 160 bits SHA1
> ----------------
> What is the added VBR8 here and in the 6-bit case below? I don't see anything else being written. And it isn't in the 8-bit case above.
> 
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:3199
> @@ +3198,3 @@
> +  // Emit the module's hash.
> +  // MODULE_CODE_HASH: [unsigned]
> +  auto Hash = Stream.getCurrentHash();
> ----------------
> 5*i32?
> 
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:3218
> @@ -3172,1 +3217,3 @@
> 
> +  if (EmitSummaryIndex)
> +    Stream.enableHash(true);
> ----------------
> Please make this conditional so it isn't generating a hash if we aren't using a module cache.
> 
> ================
> Comment at: lib/Bitcode/Writer/BitcodeWriter.cpp:3276
> @@ -3227,1 +3275,3 @@
> 
> +  if (EmitSummaryIndex) {
> +    writeModuleHash(Stream);
> ----------------
> ditto.
> 
> ================
> Comment at: test/Bitcode/Inputs/module_hash.ll:5
> @@ +4,1 @@
> +}
> \ No newline at end of file
> 
> ----------------
> Fix?
> 
> ================
> Comment at: test/Bitcode/module_hash.ll:6
> @@ +5,3 @@
> +; RUN: llvm-bcanalyzer -dump %t | FileCheck %s --check-prefix=MOD1
> +; MOD1: <HASH op0=4294068846 op1=628906511 op2=4294967218 op3=4294967275 op4=4294967289/>
> +; RUN: llvm-bcanalyzer -dump %t2 | FileCheck %s --check-prefix=MOD2
> ----------------
> Won't these values change frequently as the compiler changes and generates slightly different bitcode? How about just checking if the per module values match the combined index values via FileCheck variables/pattern matching?
> 
> ================
> Comment at: test/Bitcode/module_hash.ll:20
> @@ +19,1 @@
> +}
> \ No newline at end of file
> 
> ----------------
> Fix?
> 
> 
> http://reviews.llvm.org/D18213
> 
> 
>