[PATCH] D12536: Function bitcode index in Value Symbol Table and lazy reading support

Tue Sep 15 18:31:24 PDT 2015

+cc llvm-commits  (which somehow got dropped despite being a
subscriber). I will resend some of the follow-on threads as well.
Teresa

On Tue, Sep 8, 2015 at 11:40 AM, Duncan P. N. Exon Smith
<dexonsmith at apple.com> wrote:
>
>> On 2015-Sep-08, at 06:43, Teresa Johnson <tejohnson at google.com> wrote:
>>
>> tejohnson added a subscriber: tejohnson.
>> tejohnson added a comment.
>>
>> Ping.
>>
>> Please let me know if you have any other comments or if it looks ready to go in.
>>
>> Thanks,
>> Teresa
>>
>>
>> http://reviews.llvm.org/D12536
>
> This looks good at a high-level; just a few low-level comments.  Besides
> a few nitpicks, there are a few things that I think should be split out
> into separate commits (most of them NFC (no functionality change)).
>
> LGTM after that.
>
>> Index: test/Bitcode/vst-forward-declaration.ll
>> ===================================================================
>> --- /dev/null
>> +++ test/Bitcode/vst-forward-declaration.ll
>> @@ -0,0 +1,38 @@
>> +; RUN: llvm-as < %s | llvm-bcanalyzer -dump | FileCheck %s -check-prefix=BC
>> +; Check for VST forward declaration record and VST function index records.
>> +
>> +; BC: <VSTOFFSET
>> +; BC: <FNENTRY
>> +; BC: <FNENTRY
>> +
>> +; RUN: llvm-as < %s | llvm-dis | FileCheck %s
>> +; Check that this round-trips correctly.
>> +
>> +; ModuleID = '<stdin>'
>> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>> +target triple = "x86_64-unknown-linux-gnu"
>> +
>> +; CHECK: define i32 @foo()
>> +
>> +; Function Attrs: nounwind uwtable
>> +define i32 @foo() #0 {
>> +entry:
>> +  ret i32 1
>> +}
>> +
>> +; CHECK: define i32 @bar(i32 %x)
>> +
>> +; Function Attrs: nounwind uwtable
>> +define i32 @bar(i32 %x) #0 {
>> +entry:
>> +  %x.addr = alloca i32, align 4
>> +  store i32 %x, i32* %x.addr, align 4
>> +  %0 = load i32, i32* %x.addr, align 4
>> +  ret i32 %0
>> +}
>> +
>> +attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
>
> Can you reduce this file a little more?  It seems like you're not
> checking for attributes or anything to do with function bodies, so you
> might be able to reduce to single `unreachable` instructions and strip
> all the attributes.
>
>> +
>> +!llvm.ident = !{!0}
>> +
>> +!0 = !{!"clang version 3.8.0 (trunk 246546) (llvm/trunk 246542)"}
>
> Probably you don't need this either?
>
>> Index: lib/Bitcode/Writer/BitcodeWriter.cpp
>> ===================================================================
>> --- lib/Bitcode/Writer/BitcodeWriter.cpp
>> +++ lib/Bitcode/Writer/BitcodeWriter.cpp
>> @@ -574,10 +574,40 @@
>>    }
>>  }
>>
>> +// Write a record that will eventually hold the word offset of the
>> +// module-level VST. For now the offset is 0, which will be backpatched
>> +// after the real VST is written. Returns the bit offset to backpatch.
>> +static uint64_t WriteValueSymbolTableForwardDecl(const ValueSymbolTable &VST,
>> +                                                 BitstreamWriter &Stream) {
>> +  if (VST.empty()) return 0;
>> +
>> +  // Write a placeholder value in for the offset of the real VST,
>> +  // which is written after the function blocks so that it can include
>> +  // the offset of each function. The placeholder offset will be
>> +  // updated when the real VST is written.
>> +  BitCodeAbbrev *Abbv = new BitCodeAbbrev();
>> +  Abbv->Add(BitCodeAbbrevOp(bitc::MODULE_CODE_VSTOFFSET));
>> +  // Blocks are 32-bit aligned, so we can use a 32-bit word offset to
>> +  // hold the real VST offset. Must use fixed instead of VBR as we don't
>> +  // know how many VBR chunks to reserve ahead of time.
>> +  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32));
>> +  unsigned VSTOffsetAbbrev = Stream.EmitAbbrev(Abbv);
>> +
>> +  // Emit the placeholder
>> +  SmallVector<unsigned, 2> Vals{llvm::bitc::MODULE_CODE_VSTOFFSET, 0};
>
> I don't remember seeing brace initializers much in our codebase; I can't
> remember whether that's a style thing, or an MSVC limitation, but I'm
> wary of them.
>
> But in this case, it feels like you should be able to use:
>
>     unsigned Vals[] = {llvm::bitc::MODULE_CODE_VSTOFFSET, 0};
>
> except...
>
>> +  Stream.EmitRecordWithAbbrev(VSTOffsetAbbrev, Vals);
>
> ... this seems to take a SmallVectorImpl :(.  Not for a good reason,
> though, AFAICT.
>
> Can you try (as a separate NFC commit or series thereof) converting
> `EmitRecordWithAbbrevImpl()` and `EmitRecordWith*()` to take an
> `ArrayRef<uintty>` instead of `SmallVectorImpl<uintty>&`?
>
> (If I've read the code wrong and ArrayRef won't work there (and can't be
> munged to work there), what you have LGTM, ideally after removing the
> brace initialization.)
>
>> +
>> +  // Compute and return the bit offset to the placeholder, which will be
>> +  // patched when the real VST is written. We can simply subtract the 32-bit
>> +  // fixed size from the current bit number to get the location to backpatch.
>> +  return Stream.GetCurrentBitNo() - 32;
>> +}
>> +
>>  // Emit top-level description of module, including target triple, inline asm,
>>  // descriptors for global variables, and function prototype info.
>> -static void WriteModuleInfo(const Module *M, const ValueEnumerator &VE,
>> -                            BitstreamWriter &Stream) {
>> +// Returns the bit offset to backpatch with the location of the real VST.
>> +static uint64_t WriteModuleInfo(const Module *M, const ValueEnumerator &VE,
>> +                                BitstreamWriter &Stream) {
>>    // Emit various pieces of data attached to a module.
>>    if (!M->getTargetTriple().empty())
>>      WriteStringRecord(bitc::MODULE_CODE_TRIPLE, M->getTargetTriple(),
>> @@ -2065,50 +2099,148 @@
>>    Vals.clear();
>>  }
>>
>> -// Emit names for globals/functions etc.
>> -static void WriteValueSymbolTable(const ValueSymbolTable &VST,
>> -                                  const ValueEnumerator &VE,
>> -                                  BitstreamWriter &Stream) {
>> -  if (VST.empty()) return;
>> +// Figure out the encoding to use for the given string name and length.
>
> Please use "///" to expose this in doxygen.
>
>> +unsigned StringEncodingBits(const char *Str, unsigned StrLen) {
>
> Shouldn't this be `static`?
>
> Can you split this (and its use) out as a separate NFC commit?
>
> Also, please change the function name to a verb (and start with a
> lowercase); something like "getNumBitsForStringEncoding()".  That's
> probably a terrible name, feel free to improve it.
>
>> +  bool is7Bit = true;
>> +  bool isChar6 = true;
>> +  for (const char *C = Str, *E = C+StrLen; C != E; ++C) {
>> +    if (isChar6)
>> +      isChar6 = BitCodeAbbrevOp::isChar6(*C);
>> +    if ((unsigned char)*C & 128) {
>> +      is7Bit = false;
>> +      break;  // don't bother scanning the rest.
>> +    }
>> +  }
>> +  if (isChar6)
>> +    return 6;
>> +  else if (is7Bit)
>> +    return 7;
>> +  else
>> +    return 8;
>
> For that matter, should this be returning an `enum`?  It seems like it
> might be cleaner.  Then you could call this "getStringEncoding()" or
> something?
>
>> +}
>> +
>> +// Emit names for globals/functions etc. The VSTOffsetPlaceholder,
>> +// BitcodeStartBit and FunctionIndex are only passed for the module-level
>> +// VST, where we are including a function bitcode index and need to
>> +// backpatch the VST forward declaration record.
>
> Please expose this in doxygen too ("///").
>
>> +static void WriteValueSymbolTable(
>> +    const ValueSymbolTable &VST, const ValueEnumerator &VE,
>> +    BitstreamWriter &Stream, uint64_t VSTOffsetPlaceholder = 0,
>> +    uint64_t BitcodeStartBit = 0,
>> +    DenseMap<const Function *, uint64_t> *FunctionIndex = nullptr) {
>> +  if (VST.empty()) {
>> +    // WriteValueSymbolTableForwardDecl should have returned early as
>> +    // well. Ensure this handling remains in sync by asserting that
>> +    // the placeholder offset is not set.
>> +    assert(VSTOffsetPlaceholder == 0);
>> +    return;
>> +  }
>> +
>> +  if (VSTOffsetPlaceholder > 0) {
>> +    // Get the offset of the VST we are writing, and backpatch it into
>> +    // the VST forward declaration record.
>> +    uint64_t VSTOffset = Stream.GetCurrentBitNo();
>> +    // The BitcodeStartBit was the stream offset of the actual bitcode
>> +    // (e.g. excluding any initial darwin header).
>> +    VSTOffset -= BitcodeStartBit;
>> +    assert((VSTOffset & 31) == 0 && "VST block not 32-bit aligned");
>> +    Stream.BackpatchWord(VSTOffsetPlaceholder, VSTOffset / 32);
>> +  }
>> +
>>    Stream.EnterSubblock(bitc::VALUE_SYMTAB_BLOCK_ID, 4);
>>
>> +  // For the module-level VST, add abbrev Ids for the VST_CODE_FNENTRY
>> +  // records, which are not used in the per-function VSTs.
>> +  unsigned FnEntry8BitAbbrev;
>> +  unsigned FnEntry7BitAbbrev;
>> +  unsigned FnEntry6BitAbbrev;
>> +  if (VSTOffsetPlaceholder > 0) {
>> +    // 8-bit fixed-width VST_FNENTRY function strings.
>> +    BitCodeAbbrev *Abbv = new BitCodeAbbrev();
>> +    Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY));
>> +    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id
>> +    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset
>> +    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
>> +    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 8));
>> +    FnEntry8BitAbbrev = Stream.EmitAbbrev(Abbv);
>> +
>> +    // 7-bit fixed width VST_FNENTRY function strings.
>> +    Abbv = new BitCodeAbbrev();
>> +    Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY));
>> +    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id
>> +    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset
>> +    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
>> +    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 7));
>> +    FnEntry7BitAbbrev = Stream.EmitAbbrev(Abbv);
>> +
>> +    // 6-bit char6 VST_FNENTRY function strings.
>> +    Abbv = new BitCodeAbbrev();
>> +    Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY));
>> +    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id
>> +    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset
>> +    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
>> +    Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Char6));
>> +    FnEntry6BitAbbrev = Stream.EmitAbbrev(Abbv);
>> +  }
>> +
>>    // FIXME: Set up the abbrev, we know how many values there are!
>>    // FIXME: We know if the type names can use 7-bit ascii.
>>    SmallVector<unsigned, 64> NameVals;
>>
>>    for (const ValueName &Name : VST) {
>> -
>>      // Figure out the encoding to use for the name.
>> -    bool is7Bit = true;
>> -    bool isChar6 = true;
>> -    for (const char *C = Name.getKeyData(), *E = C+Name.getKeyLength();
>> -         C != E; ++C) {
>> -      if (isChar6)
>> -        isChar6 = BitCodeAbbrevOp::isChar6(*C);
>> -      if ((unsigned char)*C & 128) {
>> -        is7Bit = false;
>> -        break;  // don't bother scanning the rest.
>> -      }
>> -    }
>> +    unsigned Bits = StringEncodingBits(Name.getKeyData(), Name.getKeyLength());
>> +    bool is7Bit = Bits == 7;
>> +    bool isChar6 = Bits == 6;
>
> If `Bits` gets changed to an `enum`, then it might be cleanest to remove
> these `bool`s entirely and directly compare against enum values?
>
>>
>>      unsigned AbbrevToUse = VST_ENTRY_8_ABBREV;
>> +    NameVals.push_back(VE.getValueID(Name.getValue()));
>> +
>> +    Function *F = dyn_cast<Function>(Name.getValue());
>> +    if (!F) {
>> +      // If value is an alias, need to get the aliased base object to
>> +      // see if it is a function.
>> +      auto *GA = dyn_cast<GlobalAlias>(Name.getValue());
>> +      if (GA && GA->getBaseObject())
>> +        F = dyn_cast<Function>(GA->getBaseObject());
>> +    }
>>
>>      // VST_ENTRY:   [valueid, namechar x N]
>> +    // VST_FNENTRY: [valueid, funcoffset, namechar x N]
>>      // VST_BBENTRY: [bbid, namechar x N]
>>      unsigned Code;
>>      if (isa<BasicBlock>(Name.getValue())) {
>>        Code = bitc::VST_CODE_BBENTRY;
>>        if (isChar6)
>>          AbbrevToUse = VST_BBENTRY_6_ABBREV;
>> +    } else if (F && !F->isDeclaration()) {
>> +      // Must be the module-level VST, where we pass in the Index and
>> +      // have a VSTOffsetPlaceholder. The function-level VST should not
>> +      // contain any Function symbols.
>> +      assert(FunctionIndex);
>> +      assert(VSTOffsetPlaceholder > 0);
>> +
>> +      // Save the word offset of the function (from the start of the
>> +      // actual bitcode written to the stream).
>> +      assert(FunctionIndex->count(F) == 1);
>> +      uint64_t BitcodeIndex = (*FunctionIndex)[F] - BitcodeStartBit;
>> +      assert((BitcodeIndex & 31) == 0 && "function block not 32-bit aligned");
>> +      NameVals.push_back(BitcodeIndex / 32);
>> +
>> +      Code = bitc::VST_CODE_FNENTRY;
>> +      AbbrevToUse = FnEntry8BitAbbrev;
>> +      if (isChar6)
>> +        AbbrevToUse = FnEntry6BitAbbrev;
>> +      else if (is7Bit)
>> +        AbbrevToUse = FnEntry7BitAbbrev;
>>      } else {
>>        Code = bitc::VST_CODE_ENTRY;
>>        if (isChar6)
>>          AbbrevToUse = VST_ENTRY_6_ABBREV;
>>        else if (is7Bit)
>>          AbbrevToUse = VST_ENTRY_7_ABBREV;
>>      }
>>
>> -    NameVals.push_back(VE.getValueID(Name.getValue()));
>>      for (const char *P = Name.getKeyData(),
>>           *E = Name.getKeyData()+Name.getKeyLength(); P != E; ++P)
>>        NameVals.push_back((unsigned char)*P);
>> Index: lib/Bitcode/Reader/BitcodeReader.cpp
>> ===================================================================
>> --- lib/Bitcode/Reader/BitcodeReader.cpp
>> +++ lib/Bitcode/Reader/BitcodeReader.cpp
>> @@ -1601,6 +1656,9 @@
>>      case BitstreamEntry::Error:
>>        return error("Malformed block");
>>      case BitstreamEntry::EndBlock:
>> +      if (Offset > 0) {
>> +        Stream.JumpToBit(CurrentBit);
>> +      }
>
> (We usually skip braces on single-line `if`s.)
>
>>        return std::error_code();
>>      case BitstreamEntry::Record:
>>        // The interesting case.
>> Index: include/llvm/Bitcode/BitstreamWriter.h
>> ===================================================================
>> --- include/llvm/Bitcode/BitstreamWriter.h
>> +++ include/llvm/Bitcode/BitstreamWriter.h
>> @@ -61,12 +61,6 @@
>>    };
>>    std::vector<BlockInfo> BlockInfoRecords;
>>
>> -  // BackpatchWord - Backpatch a 32-bit word in the output with the specified
>> -  // value.
>> -  void BackpatchWord(unsigned ByteNo, unsigned NewWord) {
>> -    support::endian::write32le(&Out[ByteNo], NewWord);
>> -  }
>> -
>>    void WriteByte(unsigned char Value) {
>>      Out.push_back(Value);
>>    }
>> @@ -103,6 +97,34 @@
>>    // Basic Primitives for emitting bits to the stream.
>>    //===--------------------------------------------------------------------===//
>>
>> +  // BackpatchWord - Backpatch a 32-bit word in the output with the specified
>> +  // value.
>> +  void BackpatchWord(uint64_t BitNo, unsigned NewWord) {
>
> Why did you move this down the file?  If it really belongs down here,
> please move it in a separate NFC commit (ideally, also change fix the
> comment (remove the "RepeatedName -" and expose to doxygen)).
>
>> +    unsigned ByteNo = BitNo / 32 * 4;
>> +    if ((BitNo & 31) == 0) {
>> +      // Already 32-bit aligned
>> +      support::endian::write32le(&Out[ByteNo], NewWord);
>
> Does the fast path work for anything that's 8-bit aligned?  (I'm just
> guessing based on the change you made to the caller.)
>
>> +    } else {
>> +      // First patch in the lower 32-(BitNo&32) bits of NewWord, by
>> +      // shifting and masking into bits BitNo&32 to 32 of the first byte.
>> +      unsigned CurWord = support::endian::read32le(&Out[ByteNo]);
>> +      unsigned StartBit = BitNo & 31;
>> +      // Currently expect to backpatch 0-value placeholders.
>> +      assert((CurWord >> StartBit) == 0);
>> +      CurWord |= NewWord << StartBit;
>> +      support::endian::write32le(&Out[ByteNo], CurWord);
>> +
>> +      // Next patch in the upper BitNo&32 bits of NewWord, by
>> +      // shifting and masking into bits 0 to (BitNo&32)-1 of the second byte.
>> +      ByteNo += 4;
>> +      CurWord = support::endian::read32le(&Out[ByteNo]);
>> +      // Currently expect to backpatch 0-value placeholders.
>> +      assert((CurWord << (32 - StartBit)) == 0);
>> +      CurWord |= NewWord >> (32 - StartBit);
>> +      support::endian::write32le(&Out[ByteNo], CurWord);
>
> It's seems weird to have this logic here for unaligned writes.  Is there
> no functionality for this in llvm/Support?  If not, can you add it in a
> separate commit (with unit tests)?  (If so, just use it.)
>
>> +    }
>> +  }
>> +
>>    void Emit(uint32_t Val, unsigned NumBits) {
>>      assert(NumBits && NumBits <= 32 && "Invalid value size!");
>>      assert((Val & ~(~0U >> (32-NumBits))) == 0 && "High bits set!");
>> @@ -119,7 +141,7 @@
>>        CurValue = Val >> (32-CurBit);
>>      else
>>        CurValue = 0;
>> -    CurBit = (CurBit+NumBits) & 31;
>> +    CurBit = (CurBit + NumBits) & 31;
>
> If you commit this, please do it in a separate NFC commit (it's noise
> for this patch).
>
>>    }
>>
>>    void Emit64(uint64_t Val, unsigned NumBits) {
>> @@ -232,10 +254,10 @@
>>
>>      // Compute the size of the block, in words, not counting the size field.
>>      unsigned SizeInWords = GetWordIndex() - B.StartSizeWord - 1;
>> -    unsigned ByteNo = B.StartSizeWord*4;
>> +    uint64_t BitNo = B.StartSizeWord * 32;
>>
>>      // Update the block size field in the header of this sub-block.
>> -    BackpatchWord(ByteNo, SizeInWords);
>> +    BackpatchWord(BitNo, SizeInWords);
>>
>>      // Restore the inner block's code size and abbrev table.
>>      CurCodeSize = B.PrevCodeSize;
>>
>


-- 
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413