[PATCH] D12536: Function bitcode index in Value Symbol Table and lazy reading support
Teresa Johnson via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 15 18:31:24 PDT 2015
+cc llvm-commits (which somehow got dropped despite being a
subscriber). I will resend some of the follow-on threads as well.
Teresa
On Tue, Sep 8, 2015 at 11:40 AM, Duncan P. N. Exon Smith
<dexonsmith at apple.com> wrote:
>
>> On 2015-Sep-08, at 06:43, Teresa Johnson <tejohnson at google.com> wrote:
>>
>> tejohnson added a subscriber: tejohnson.
>> tejohnson added a comment.
>>
>> Ping.
>>
>> Please let me know if you have any other comments or if it looks ready to go in.
>>
>> Thanks,
>> Teresa
>>
>>
>> http://reviews.llvm.org/D12536
>
> This looks good at a high-level; just a few low-level comments. Besides
> a few nitpicks, there are a few things that I think should be split out
> into separate commits (most of them NFC (no functionality change)).
>
> LGTM after that.
>
>> Index: test/Bitcode/vst-forward-declaration.ll
>> ===================================================================
>> --- /dev/null
>> +++ test/Bitcode/vst-forward-declaration.ll
>> @@ -0,0 +1,38 @@
>> +; RUN: llvm-as < %s | llvm-bcanalyzer -dump | FileCheck %s -check-prefix=BC
>> +; Check for VST forward declaration record and VST function index records.
>> +
>> +; BC: <VSTOFFSET
>> +; BC: <FNENTRY
>> +; BC: <FNENTRY
>> +
>> +; RUN: llvm-as < %s | llvm-dis | FileCheck %s
>> +; Check that this round-trips correctly.
>> +
>> +; ModuleID = '<stdin>'
>> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>> +target triple = "x86_64-unknown-linux-gnu"
>> +
>> +; CHECK: define i32 @foo()
>> +
>> +; Function Attrs: nounwind uwtable
>> +define i32 @foo() #0 {
>> +entry:
>> + ret i32 1
>> +}
>> +
>> +; CHECK: define i32 @bar(i32 %x)
>> +
>> +; Function Attrs: nounwind uwtable
>> +define i32 @bar(i32 %x) #0 {
>> +entry:
>> + %x.addr = alloca i32, align 4
>> + store i32 %x, i32* %x.addr, align 4
>> + %0 = load i32, i32* %x.addr, align 4
>> + ret i32 %0
>> +}
>> +
>> +attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
>
> Can you reduce this file a little more? It seems like you're not
> checking for attributes or anything to do with function bodies, so you
> might be able to reduce to single `unreachable` instructions and strip
> all the attributes.
>
>> +
>> +!llvm.ident = !{!0}
>> +
>> +!0 = !{!"clang version 3.8.0 (trunk 246546) (llvm/trunk 246542)"}
>
> Probably you don't need this either?
>
>> Index: lib/Bitcode/Writer/BitcodeWriter.cpp
>> ===================================================================
>> --- lib/Bitcode/Writer/BitcodeWriter.cpp
>> +++ lib/Bitcode/Writer/BitcodeWriter.cpp
>> @@ -574,10 +574,40 @@
>> }
>> }
>>
>> +// Write a record that will eventually hold the word offset of the
>> +// module-level VST. For now the offset is 0, which will be backpatched
>> +// after the real VST is written. Returns the bit offset to backpatch.
>> +static uint64_t WriteValueSymbolTableForwardDecl(const ValueSymbolTable &VST,
>> + BitstreamWriter &Stream) {
>> + if (VST.empty()) return 0;
>> +
>> + // Write a placeholder value in for the offset of the real VST,
>> + // which is written after the function blocks so that it can include
>> + // the offset of each function. The placeholder offset will be
>> + // updated when the real VST is written.
>> + BitCodeAbbrev *Abbv = new BitCodeAbbrev();
>> + Abbv->Add(BitCodeAbbrevOp(bitc::MODULE_CODE_VSTOFFSET));
>> + // Blocks are 32-bit aligned, so we can use a 32-bit word offset to
>> + // hold the real VST offset. Must use fixed instead of VBR as we don't
>> + // know how many VBR chunks to reserve ahead of time.
>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32));
>> + unsigned VSTOffsetAbbrev = Stream.EmitAbbrev(Abbv);
>> +
>> + // Emit the placeholder
>> + SmallVector<unsigned, 2> Vals{llvm::bitc::MODULE_CODE_VSTOFFSET, 0};
>
> I don't remember seeing brace initializers much in our codebase; I can't
> remember whether that's a style thing, or an MSVC limitation, but I'm
> wary of them.
>
> But in this case, it feels like you should be able to use:
>
> unsigned Vals[] = {llvm::bitc::MODULE_CODE_VSTOFFSET, 0};
>
> except...
>
>> + Stream.EmitRecordWithAbbrev(VSTOffsetAbbrev, Vals);
>
> ... this seems to take a SmallVectorImpl :(. Not for a good reason,
> though, AFAICT.
>
> Can you try (as a separate NFC commit or series thereof) converting
> `EmitRecordWithAbbrevImpl()` and `EmitRecordWith*()` to take an
> `ArrayRef<uintty>` instead of `SmallVectorImpl<uintty>&`?
>
> (If I've read the code wrong and ArrayRef won't work there (and can't be
> munged to work there), what you have LGTM, ideally after removing the
> brace initialization.)
>
>> +
>> + // Compute and return the bit offset to the placeholder, which will be
>> + // patched when the real VST is written. We can simply subtract the 32-bit
>> + // fixed size from the current bit number to get the location to backpatch.
>> + return Stream.GetCurrentBitNo() - 32;
>> +}
>> +
>> // Emit top-level description of module, including target triple, inline asm,
>> // descriptors for global variables, and function prototype info.
>> -static void WriteModuleInfo(const Module *M, const ValueEnumerator &VE,
>> - BitstreamWriter &Stream) {
>> +// Returns the bit offset to backpatch with the location of the real VST.
>> +static uint64_t WriteModuleInfo(const Module *M, const ValueEnumerator &VE,
>> + BitstreamWriter &Stream) {
>> // Emit various pieces of data attached to a module.
>> if (!M->getTargetTriple().empty())
>> WriteStringRecord(bitc::MODULE_CODE_TRIPLE, M->getTargetTriple(),
>> @@ -2065,50 +2099,148 @@
>> Vals.clear();
>> }
>>
>> -// Emit names for globals/functions etc.
>> -static void WriteValueSymbolTable(const ValueSymbolTable &VST,
>> - const ValueEnumerator &VE,
>> - BitstreamWriter &Stream) {
>> - if (VST.empty()) return;
>> +// Figure out the encoding to use for the given string name and length.
>
> Please use "///" to expose this in doxygen.
>
>> +unsigned StringEncodingBits(const char *Str, unsigned StrLen) {
>
> Shouldn't this be `static`?
>
> Can you split this (and its use) out as a separate NFC commit?
>
> Also, please change the function name to a verb (and start with a
> lowercase); something like "getNumBitsForStringEncoding()". That's
> probably a terrible name, feel free to improve it.
>
>> + bool is7Bit = true;
>> + bool isChar6 = true;
>> + for (const char *C = Str, *E = C+StrLen; C != E; ++C) {
>> + if (isChar6)
>> + isChar6 = BitCodeAbbrevOp::isChar6(*C);
>> + if ((unsigned char)*C & 128) {
>> + is7Bit = false;
>> + break; // don't bother scanning the rest.
>> + }
>> + }
>> + if (isChar6)
>> + return 6;
>> + else if (is7Bit)
>> + return 7;
>> + else
>> + return 8;
>
> For that matter, should this be returning an `enum`? It seems like it
> might be cleaner. Then you could call this "getStringEncoding()" or
> something?
>
>> +}
>> +
>> +// Emit names for globals/functions etc. The VSTOffsetPlaceholder,
>> +// BitcodeStartBit and FunctionIndex are only passed for the module-level
>> +// VST, where we are including a function bitcode index and need to
>> +// backpatch the VST forward declaration record.
>
> Please expose this in doxygen too ("///").
>
>> +static void WriteValueSymbolTable(
>> + const ValueSymbolTable &VST, const ValueEnumerator &VE,
>> + BitstreamWriter &Stream, uint64_t VSTOffsetPlaceholder = 0,
>> + uint64_t BitcodeStartBit = 0,
>> + DenseMap<const Function *, uint64_t> *FunctionIndex = nullptr) {
>> + if (VST.empty()) {
>> + // WriteValueSymbolTableForwardDecl should have returned early as
>> + // well. Ensure this handling remains in sync by asserting that
>> + // the placeholder offset is not set.
>> + assert(VSTOffsetPlaceholder == 0);
>> + return;
>> + }
>> +
>> + if (VSTOffsetPlaceholder > 0) {
>> + // Get the offset of the VST we are writing, and backpatch it into
>> + // the VST forward declaration record.
>> + uint64_t VSTOffset = Stream.GetCurrentBitNo();
>> + // The BitcodeStartBit was the stream offset of the actual bitcode
>> + // (e.g. excluding any initial darwin header).
>> + VSTOffset -= BitcodeStartBit;
>> + assert((VSTOffset & 31) == 0 && "VST block not 32-bit aligned");
>> + Stream.BackpatchWord(VSTOffsetPlaceholder, VSTOffset / 32);
>> + }
>> +
>> Stream.EnterSubblock(bitc::VALUE_SYMTAB_BLOCK_ID, 4);
>>
>> + // For the module-level VST, add abbrev Ids for the VST_CODE_FNENTRY
>> + // records, which are not used in the per-function VSTs.
>> + unsigned FnEntry8BitAbbrev;
>> + unsigned FnEntry7BitAbbrev;
>> + unsigned FnEntry6BitAbbrev;
>> + if (VSTOffsetPlaceholder > 0) {
>> + // 8-bit fixed-width VST_FNENTRY function strings.
>> + BitCodeAbbrev *Abbv = new BitCodeAbbrev();
>> + Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY));
>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id
>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset
>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 8));
>> + FnEntry8BitAbbrev = Stream.EmitAbbrev(Abbv);
>> +
>> + // 7-bit fixed width VST_FNENTRY function strings.
>> + Abbv = new BitCodeAbbrev();
>> + Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY));
>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id
>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset
>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 7));
>> + FnEntry7BitAbbrev = Stream.EmitAbbrev(Abbv);
>> +
>> + // 6-bit char6 VST_FNENTRY function strings.
>> + Abbv = new BitCodeAbbrev();
>> + Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY));
>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id
>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset
>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Char6));
>> + FnEntry6BitAbbrev = Stream.EmitAbbrev(Abbv);
>> + }
>> +
>> // FIXME: Set up the abbrev, we know how many values there are!
>> // FIXME: We know if the type names can use 7-bit ascii.
>> SmallVector<unsigned, 64> NameVals;
>>
>> for (const ValueName &Name : VST) {
>> -
>> // Figure out the encoding to use for the name.
>> - bool is7Bit = true;
>> - bool isChar6 = true;
>> - for (const char *C = Name.getKeyData(), *E = C+Name.getKeyLength();
>> - C != E; ++C) {
>> - if (isChar6)
>> - isChar6 = BitCodeAbbrevOp::isChar6(*C);
>> - if ((unsigned char)*C & 128) {
>> - is7Bit = false;
>> - break; // don't bother scanning the rest.
>> - }
>> - }
>> + unsigned Bits = StringEncodingBits(Name.getKeyData(), Name.getKeyLength());
>> + bool is7Bit = Bits == 7;
>> + bool isChar6 = Bits == 6;
>
> If `Bits` gets changed to an `enum`, then it might be cleanest to remove
> these `bool`s entirely and directly compare against enum values?
>
>>
>> unsigned AbbrevToUse = VST_ENTRY_8_ABBREV;
>> + NameVals.push_back(VE.getValueID(Name.getValue()));
>> +
>> + Function *F = dyn_cast<Function>(Name.getValue());
>> + if (!F) {
>> + // If value is an alias, need to get the aliased base object to
>> + // see if it is a function.
>> + auto *GA = dyn_cast<GlobalAlias>(Name.getValue());
>> + if (GA && GA->getBaseObject())
>> + F = dyn_cast<Function>(GA->getBaseObject());
>> + }
>>
>> // VST_ENTRY: [valueid, namechar x N]
>> + // VST_FNENTRY: [valueid, funcoffset, namechar x N]
>> // VST_BBENTRY: [bbid, namechar x N]
>> unsigned Code;
>> if (isa<BasicBlock>(Name.getValue())) {
>> Code = bitc::VST_CODE_BBENTRY;
>> if (isChar6)
>> AbbrevToUse = VST_BBENTRY_6_ABBREV;
>> + } else if (F && !F->isDeclaration()) {
>> + // Must be the module-level VST, where we pass in the Index and
>> + // have a VSTOffsetPlaceholder. The function-level VST should not
>> + // contain any Function symbols.
>> + assert(FunctionIndex);
>> + assert(VSTOffsetPlaceholder > 0);
>> +
>> + // Save the word offset of the function (from the start of the
>> + // actual bitcode written to the stream).
>> + assert(FunctionIndex->count(F) == 1);
>> + uint64_t BitcodeIndex = (*FunctionIndex)[F] - BitcodeStartBit;
>> + assert((BitcodeIndex & 31) == 0 && "function block not 32-bit aligned");
>> + NameVals.push_back(BitcodeIndex / 32);
>> +
>> + Code = bitc::VST_CODE_FNENTRY;
>> + AbbrevToUse = FnEntry8BitAbbrev;
>> + if (isChar6)
>> + AbbrevToUse = FnEntry6BitAbbrev;
>> + else if (is7Bit)
>> + AbbrevToUse = FnEntry7BitAbbrev;
>> } else {
>> Code = bitc::VST_CODE_ENTRY;
>> if (isChar6)
>> AbbrevToUse = VST_ENTRY_6_ABBREV;
>> else if (is7Bit)
>> AbbrevToUse = VST_ENTRY_7_ABBREV;
>> }
>>
>> - NameVals.push_back(VE.getValueID(Name.getValue()));
>> for (const char *P = Name.getKeyData(),
>> *E = Name.getKeyData()+Name.getKeyLength(); P != E; ++P)
>> NameVals.push_back((unsigned char)*P);
>> Index: lib/Bitcode/Reader/BitcodeReader.cpp
>> ===================================================================
>> --- lib/Bitcode/Reader/BitcodeReader.cpp
>> +++ lib/Bitcode/Reader/BitcodeReader.cpp
>> @@ -1601,6 +1656,9 @@
>> case BitstreamEntry::Error:
>> return error("Malformed block");
>> case BitstreamEntry::EndBlock:
>> + if (Offset > 0) {
>> + Stream.JumpToBit(CurrentBit);
>> + }
>
> (We usually skip braces on single-line `if`s.)
>
>> return std::error_code();
>> case BitstreamEntry::Record:
>> // The interesting case.
>> Index: include/llvm/Bitcode/BitstreamWriter.h
>> ===================================================================
>> --- include/llvm/Bitcode/BitstreamWriter.h
>> +++ include/llvm/Bitcode/BitstreamWriter.h
>> @@ -61,12 +61,6 @@
>> };
>> std::vector<BlockInfo> BlockInfoRecords;
>>
>> - // BackpatchWord - Backpatch a 32-bit word in the output with the specified
>> - // value.
>> - void BackpatchWord(unsigned ByteNo, unsigned NewWord) {
>> - support::endian::write32le(&Out[ByteNo], NewWord);
>> - }
>> -
>> void WriteByte(unsigned char Value) {
>> Out.push_back(Value);
>> }
>> @@ -103,6 +97,34 @@
>> // Basic Primitives for emitting bits to the stream.
>> //===--------------------------------------------------------------------===//
>>
>> + // BackpatchWord - Backpatch a 32-bit word in the output with the specified
>> + // value.
>> + void BackpatchWord(uint64_t BitNo, unsigned NewWord) {
>
> Why did you move this down the file? If it really belongs down here,
> please move it in a separate NFC commit (ideally, also change fix the
> comment (remove the "RepeatedName -" and expose to doxygen)).
>
>> + unsigned ByteNo = BitNo / 32 * 4;
>> + if ((BitNo & 31) == 0) {
>> + // Already 32-bit aligned
>> + support::endian::write32le(&Out[ByteNo], NewWord);
>
> Does the fast path work for anything that's 8-bit aligned? (I'm just
> guessing based on the change you made to the caller.)
>
>> + } else {
>> + // First patch in the lower 32-(BitNo&32) bits of NewWord, by
>> + // shifting and masking into bits BitNo&32 to 32 of the first byte.
>> + unsigned CurWord = support::endian::read32le(&Out[ByteNo]);
>> + unsigned StartBit = BitNo & 31;
>> + // Currently expect to backpatch 0-value placeholders.
>> + assert((CurWord >> StartBit) == 0);
>> + CurWord |= NewWord << StartBit;
>> + support::endian::write32le(&Out[ByteNo], CurWord);
>> +
>> + // Next patch in the upper BitNo&32 bits of NewWord, by
>> + // shifting and masking into bits 0 to (BitNo&32)-1 of the second byte.
>> + ByteNo += 4;
>> + CurWord = support::endian::read32le(&Out[ByteNo]);
>> + // Currently expect to backpatch 0-value placeholders.
>> + assert((CurWord << (32 - StartBit)) == 0);
>> + CurWord |= NewWord >> (32 - StartBit);
>> + support::endian::write32le(&Out[ByteNo], CurWord);
>
> It's seems weird to have this logic here for unaligned writes. Is there
> no functionality for this in llvm/Support? If not, can you add it in a
> separate commit (with unit tests)? (If so, just use it.)
>
>> + }
>> + }
>> +
>> void Emit(uint32_t Val, unsigned NumBits) {
>> assert(NumBits && NumBits <= 32 && "Invalid value size!");
>> assert((Val & ~(~0U >> (32-NumBits))) == 0 && "High bits set!");
>> @@ -119,7 +141,7 @@
>> CurValue = Val >> (32-CurBit);
>> else
>> CurValue = 0;
>> - CurBit = (CurBit+NumBits) & 31;
>> + CurBit = (CurBit + NumBits) & 31;
>
> If you commit this, please do it in a separate NFC commit (it's noise
> for this patch).
>
>> }
>>
>> void Emit64(uint64_t Val, unsigned NumBits) {
>> @@ -232,10 +254,10 @@
>>
>> // Compute the size of the block, in words, not counting the size field.
>> unsigned SizeInWords = GetWordIndex() - B.StartSizeWord - 1;
>> - unsigned ByteNo = B.StartSizeWord*4;
>> + uint64_t BitNo = B.StartSizeWord * 32;
>>
>> // Update the block size field in the header of this sub-block.
>> - BackpatchWord(ByteNo, SizeInWords);
>> + BackpatchWord(BitNo, SizeInWords);
>>
>> // Restore the inner block's code size and abbrev table.
>> CurCodeSize = B.PrevCodeSize;
>>
>
--
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
More information about the llvm-commits
mailing list