[PATCH] D12536: Function bitcode index in Value Symbol Table and lazy reading support
Teresa Johnson via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 15 18:34:35 PDT 2015
+cc llvm-commits (which somehow got dropped despite being a subscriber).
Teresa
On Wed, Sep 9, 2015 at 9:14 AM, Teresa Johnson <tejohnson at google.com> wrote:
> On Tue, Sep 8, 2015 at 11:40 AM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
>>
>>> On 2015-Sep-08, at 06:43, Teresa Johnson <tejohnson at google.com> wrote:
>>>
>>> tejohnson added a subscriber: tejohnson.
>>> tejohnson added a comment.
>>>
>>> Ping.
>>>
>>> Please let me know if you have any other comments or if it looks ready to go in.
>>>
>>> Thanks,
>>> Teresa
>>>
>>>
>>> http://reviews.llvm.org/D12536
>>
>> This looks good at a high-level; just a few low-level comments. Besides
>> a few nitpicks, there are a few things that I think should be split out
>> into separate commits (most of them NFC (no functionality change)).
>
> Thanks, will do.
>
> Responses below.
>
>>
>> LGTM after that.
>>
>>> Index: test/Bitcode/vst-forward-declaration.ll
>>> ===================================================================
>>> --- /dev/null
>>> +++ test/Bitcode/vst-forward-declaration.ll
>>> @@ -0,0 +1,38 @@
>>> +; RUN: llvm-as < %s | llvm-bcanalyzer -dump | FileCheck %s -check-prefix=BC
>>> +; Check for VST forward declaration record and VST function index records.
>>> +
>>> +; BC: <VSTOFFSET
>>> +; BC: <FNENTRY
>>> +; BC: <FNENTRY
>>> +
>>> +; RUN: llvm-as < %s | llvm-dis | FileCheck %s
>>> +; Check that this round-trips correctly.
>>> +
>>> +; ModuleID = '<stdin>'
>>> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>>> +target triple = "x86_64-unknown-linux-gnu"
>>> +
>>> +; CHECK: define i32 @foo()
>>> +
>>> +; Function Attrs: nounwind uwtable
>>> +define i32 @foo() #0 {
>>> +entry:
>>> + ret i32 1
>>> +}
>>> +
>>> +; CHECK: define i32 @bar(i32 %x)
>>> +
>>> +; Function Attrs: nounwind uwtable
>>> +define i32 @bar(i32 %x) #0 {
>>> +entry:
>>> + %x.addr = alloca i32, align 4
>>> + store i32 %x, i32* %x.addr, align 4
>>> + %0 = load i32, i32* %x.addr, align 4
>>> + ret i32 %0
>>> +}
>>> +
>>> +attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
>>
>> Can you reduce this file a little more? It seems like you're not
>> checking for attributes or anything to do with function bodies, so you
>> might be able to reduce to single `unreachable` instructions and strip
>> all the attributes.
>>
>>> +
>>> +!llvm.ident = !{!0}
>>> +
>>> +!0 = !{!"clang version 3.8.0 (trunk 246546) (llvm/trunk 246542)"}
>>
>> Probably you don't need this either?
>
> I've reduced this test by stripping the attributes and metadata, and
> making the functions simply ret.
>
>>
>>> Index: lib/Bitcode/Writer/BitcodeWriter.cpp
>>> ===================================================================
>>> --- lib/Bitcode/Writer/BitcodeWriter.cpp
>>> +++ lib/Bitcode/Writer/BitcodeWriter.cpp
>>> @@ -574,10 +574,40 @@
>>> }
>>> }
>>>
>>> +// Write a record that will eventually hold the word offset of the
>>> +// module-level VST. For now the offset is 0, which will be backpatched
>>> +// after the real VST is written. Returns the bit offset to backpatch.
>>> +static uint64_t WriteValueSymbolTableForwardDecl(const ValueSymbolTable &VST,
>>> + BitstreamWriter &Stream) {
>>> + if (VST.empty()) return 0;
>>> +
>>> + // Write a placeholder value in for the offset of the real VST,
>>> + // which is written after the function blocks so that it can include
>>> + // the offset of each function. The placeholder offset will be
>>> + // updated when the real VST is written.
>>> + BitCodeAbbrev *Abbv = new BitCodeAbbrev();
>>> + Abbv->Add(BitCodeAbbrevOp(bitc::MODULE_CODE_VSTOFFSET));
>>> + // Blocks are 32-bit aligned, so we can use a 32-bit word offset to
>>> + // hold the real VST offset. Must use fixed instead of VBR as we don't
>>> + // know how many VBR chunks to reserve ahead of time.
>>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32));
>>> + unsigned VSTOffsetAbbrev = Stream.EmitAbbrev(Abbv);
>>> +
>>> + // Emit the placeholder
>>> + SmallVector<unsigned, 2> Vals{llvm::bitc::MODULE_CODE_VSTOFFSET, 0};
>>
>> I don't remember seeing brace initializers much in our codebase; I can't
>> remember whether that's a style thing, or an MSVC limitation, but I'm
>> wary of them.
>>
>> But in this case, it feels like you should be able to use:
>>
>> unsigned Vals[] = {llvm::bitc::MODULE_CODE_VSTOFFSET, 0};
>>
>> except...
>>
>>> + Stream.EmitRecordWithAbbrev(VSTOffsetAbbrev, Vals);
>>
>> ... this seems to take a SmallVectorImpl :(. Not for a good reason,
>> though, AFAICT.
>>
>> Can you try (as a separate NFC commit or series thereof) converting
>> `EmitRecordWithAbbrevImpl()` and `EmitRecordWith*()` to take an
>> `ArrayRef<uintty>` instead of `SmallVectorImpl<uintty>&`?
>>
>> (If I've read the code wrong and ArrayRef won't work there (and can't be
>> munged to work there), what you have LGTM, ideally after removing the
>> brace initialization.)
>>
>>> +
>>> + // Compute and return the bit offset to the placeholder, which will be
>>> + // patched when the real VST is written. We can simply subtract the 32-bit
>>> + // fixed size from the current bit number to get the location to backpatch.
>>> + return Stream.GetCurrentBitNo() - 32;
>>> +}
>>> +
>>> // Emit top-level description of module, including target triple, inline asm,
>>> // descriptors for global variables, and function prototype info.
>>> -static void WriteModuleInfo(const Module *M, const ValueEnumerator &VE,
>>> - BitstreamWriter &Stream) {
>>> +// Returns the bit offset to backpatch with the location of the real VST.
>>> +static uint64_t WriteModuleInfo(const Module *M, const ValueEnumerator &VE,
>>> + BitstreamWriter &Stream) {
>>> // Emit various pieces of data attached to a module.
>>> if (!M->getTargetTriple().empty())
>>> WriteStringRecord(bitc::MODULE_CODE_TRIPLE, M->getTargetTriple(),
>>> @@ -2065,50 +2099,148 @@
>>> Vals.clear();
>>> }
>>>
>>> -// Emit names for globals/functions etc.
>>> -static void WriteValueSymbolTable(const ValueSymbolTable &VST,
>>> - const ValueEnumerator &VE,
>>> - BitstreamWriter &Stream) {
>>> - if (VST.empty()) return;
>>> +// Figure out the encoding to use for the given string name and length.
>>
>> Please use "///" to expose this in doxygen.
>
> I assume this is for StringEncodingBits (now getStringEncoding)? Done.
>
>>
>>> +unsigned StringEncodingBits(const char *Str, unsigned StrLen) {
>>
>> Shouldn't this be `static`?
>
> Yes, done.
>
>>
>> Can you split this (and its use) out as a separate NFC commit?
>>
>> Also, please change the function name to a verb (and start with a
>> lowercase); something like "getNumBitsForStringEncoding()". That's
>> probably a terrible name, feel free to improve it.
>>
>>> + bool is7Bit = true;
>>> + bool isChar6 = true;
>>> + for (const char *C = Str, *E = C+StrLen; C != E; ++C) {
>>> + if (isChar6)
>>> + isChar6 = BitCodeAbbrevOp::isChar6(*C);
>>> + if ((unsigned char)*C & 128) {
>>> + is7Bit = false;
>>> + break; // don't bother scanning the rest.
>>> + }
>>> + }
>>> + if (isChar6)
>>> + return 6;
>>> + else if (is7Bit)
>>> + return 7;
>>> + else
>>> + return 8;
>>
>> For that matter, should this be returning an `enum`? It seems like it
>> might be cleaner. Then you could call this "getStringEncoding()" or
>> something?
>
> Ok, I am changing this to use an enum and getStringEncoding name. Will
> commit this separately as NFC.
>
>>
>>> +}
>>> +
>>> +// Emit names for globals/functions etc. The VSTOffsetPlaceholder,
>>> +// BitcodeStartBit and FunctionIndex are only passed for the module-level
>>> +// VST, where we are including a function bitcode index and need to
>>> +// backpatch the VST forward declaration record.
>>
>> Please expose this in doxygen too ("///").
>
> Done.
>
>>
>>> +static void WriteValueSymbolTable(
>>> + const ValueSymbolTable &VST, const ValueEnumerator &VE,
>>> + BitstreamWriter &Stream, uint64_t VSTOffsetPlaceholder = 0,
>>> + uint64_t BitcodeStartBit = 0,
>>> + DenseMap<const Function *, uint64_t> *FunctionIndex = nullptr) {
>>> + if (VST.empty()) {
>>> + // WriteValueSymbolTableForwardDecl should have returned early as
>>> + // well. Ensure this handling remains in sync by asserting that
>>> + // the placeholder offset is not set.
>>> + assert(VSTOffsetPlaceholder == 0);
>>> + return;
>>> + }
>>> +
>>> + if (VSTOffsetPlaceholder > 0) {
>>> + // Get the offset of the VST we are writing, and backpatch it into
>>> + // the VST forward declaration record.
>>> + uint64_t VSTOffset = Stream.GetCurrentBitNo();
>>> + // The BitcodeStartBit was the stream offset of the actual bitcode
>>> + // (e.g. excluding any initial darwin header).
>>> + VSTOffset -= BitcodeStartBit;
>>> + assert((VSTOffset & 31) == 0 && "VST block not 32-bit aligned");
>>> + Stream.BackpatchWord(VSTOffsetPlaceholder, VSTOffset / 32);
>>> + }
>>> +
>>> Stream.EnterSubblock(bitc::VALUE_SYMTAB_BLOCK_ID, 4);
>>>
>>> + // For the module-level VST, add abbrev Ids for the VST_CODE_FNENTRY
>>> + // records, which are not used in the per-function VSTs.
>>> + unsigned FnEntry8BitAbbrev;
>>> + unsigned FnEntry7BitAbbrev;
>>> + unsigned FnEntry6BitAbbrev;
>>> + if (VSTOffsetPlaceholder > 0) {
>>> + // 8-bit fixed-width VST_FNENTRY function strings.
>>> + BitCodeAbbrev *Abbv = new BitCodeAbbrev();
>>> + Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY));
>>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id
>>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset
>>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
>>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 8));
>>> + FnEntry8BitAbbrev = Stream.EmitAbbrev(Abbv);
>>> +
>>> + // 7-bit fixed width VST_FNENTRY function strings.
>>> + Abbv = new BitCodeAbbrev();
>>> + Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY));
>>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id
>>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset
>>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
>>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 7));
>>> + FnEntry7BitAbbrev = Stream.EmitAbbrev(Abbv);
>>> +
>>> + // 6-bit char6 VST_FNENTRY function strings.
>>> + Abbv = new BitCodeAbbrev();
>>> + Abbv->Add(BitCodeAbbrevOp(bitc::VST_CODE_FNENTRY));
>>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // value id
>>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // funcoffset
>>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
>>> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Char6));
>>> + FnEntry6BitAbbrev = Stream.EmitAbbrev(Abbv);
>>> + }
>>> +
>>> // FIXME: Set up the abbrev, we know how many values there are!
>>> // FIXME: We know if the type names can use 7-bit ascii.
>>> SmallVector<unsigned, 64> NameVals;
>>>
>>> for (const ValueName &Name : VST) {
>>> -
>>> // Figure out the encoding to use for the name.
>>> - bool is7Bit = true;
>>> - bool isChar6 = true;
>>> - for (const char *C = Name.getKeyData(), *E = C+Name.getKeyLength();
>>> - C != E; ++C) {
>>> - if (isChar6)
>>> - isChar6 = BitCodeAbbrevOp::isChar6(*C);
>>> - if ((unsigned char)*C & 128) {
>>> - is7Bit = false;
>>> - break; // don't bother scanning the rest.
>>> - }
>>> - }
>>> + unsigned Bits = StringEncodingBits(Name.getKeyData(), Name.getKeyLength());
>>> + bool is7Bit = Bits == 7;
>>> + bool isChar6 = Bits == 6;
>>
>> If `Bits` gets changed to an `enum`, then it might be cleanest to remove
>> these `bool`s entirely and directly compare against enum values?
>
> Yep, will do as part of the enum change.
>
>>
>>>
>>> unsigned AbbrevToUse = VST_ENTRY_8_ABBREV;
>>> + NameVals.push_back(VE.getValueID(Name.getValue()));
>>> +
>>> + Function *F = dyn_cast<Function>(Name.getValue());
>>> + if (!F) {
>>> + // If value is an alias, need to get the aliased base object to
>>> + // see if it is a function.
>>> + auto *GA = dyn_cast<GlobalAlias>(Name.getValue());
>>> + if (GA && GA->getBaseObject())
>>> + F = dyn_cast<Function>(GA->getBaseObject());
>>> + }
>>>
>>> // VST_ENTRY: [valueid, namechar x N]
>>> + // VST_FNENTRY: [valueid, funcoffset, namechar x N]
>>> // VST_BBENTRY: [bbid, namechar x N]
>>> unsigned Code;
>>> if (isa<BasicBlock>(Name.getValue())) {
>>> Code = bitc::VST_CODE_BBENTRY;
>>> if (isChar6)
>>> AbbrevToUse = VST_BBENTRY_6_ABBREV;
>>> + } else if (F && !F->isDeclaration()) {
>>> + // Must be the module-level VST, where we pass in the Index and
>>> + // have a VSTOffsetPlaceholder. The function-level VST should not
>>> + // contain any Function symbols.
>>> + assert(FunctionIndex);
>>> + assert(VSTOffsetPlaceholder > 0);
>>> +
>>> + // Save the word offset of the function (from the start of the
>>> + // actual bitcode written to the stream).
>>> + assert(FunctionIndex->count(F) == 1);
>>> + uint64_t BitcodeIndex = (*FunctionIndex)[F] - BitcodeStartBit;
>>> + assert((BitcodeIndex & 31) == 0 && "function block not 32-bit aligned");
>>> + NameVals.push_back(BitcodeIndex / 32);
>>> +
>>> + Code = bitc::VST_CODE_FNENTRY;
>>> + AbbrevToUse = FnEntry8BitAbbrev;
>>> + if (isChar6)
>>> + AbbrevToUse = FnEntry6BitAbbrev;
>>> + else if (is7Bit)
>>> + AbbrevToUse = FnEntry7BitAbbrev;
>>> } else {
>>> Code = bitc::VST_CODE_ENTRY;
>>> if (isChar6)
>>> AbbrevToUse = VST_ENTRY_6_ABBREV;
>>> else if (is7Bit)
>>> AbbrevToUse = VST_ENTRY_7_ABBREV;
>>> }
>>>
>>> - NameVals.push_back(VE.getValueID(Name.getValue()));
>>> for (const char *P = Name.getKeyData(),
>>> *E = Name.getKeyData()+Name.getKeyLength(); P != E; ++P)
>>> NameVals.push_back((unsigned char)*P);
>>> Index: lib/Bitcode/Reader/BitcodeReader.cpp
>>> ===================================================================
>>> --- lib/Bitcode/Reader/BitcodeReader.cpp
>>> +++ lib/Bitcode/Reader/BitcodeReader.cpp
>>> @@ -1601,6 +1656,9 @@
>>> case BitstreamEntry::Error:
>>> return error("Malformed block");
>>> case BitstreamEntry::EndBlock:
>>> + if (Offset > 0) {
>>> + Stream.JumpToBit(CurrentBit);
>>> + }
>>
>> (We usually skip braces on single-line `if`s.)
>
> done.
>
>>
>>> return std::error_code();
>>> case BitstreamEntry::Record:
>>> // The interesting case.
>>> Index: include/llvm/Bitcode/BitstreamWriter.h
>>> ===================================================================
>>> --- include/llvm/Bitcode/BitstreamWriter.h
>>> +++ include/llvm/Bitcode/BitstreamWriter.h
>>> @@ -61,12 +61,6 @@
>>> };
>>> std::vector<BlockInfo> BlockInfoRecords;
>>>
>>> - // BackpatchWord - Backpatch a 32-bit word in the output with the specified
>>> - // value.
>>> - void BackpatchWord(unsigned ByteNo, unsigned NewWord) {
>>> - support::endian::write32le(&Out[ByteNo], NewWord);
>>> - }
>>> -
>>> void WriteByte(unsigned char Value) {
>>> Out.push_back(Value);
>>> }
>>> @@ -103,6 +97,34 @@
>>> // Basic Primitives for emitting bits to the stream.
>>> //===--------------------------------------------------------------------===//
>>>
>>> + // BackpatchWord - Backpatch a 32-bit word in the output with the specified
>>> + // value.
>>> + void BackpatchWord(uint64_t BitNo, unsigned NewWord) {
>>
>> Why did you move this down the file? If it really belongs down here,
>> please move it in a separate NFC commit (ideally, also change fix the
>> comment (remove the "RepeatedName -" and expose to doxygen)).
>
> Moved it down to the public section since it is now called from
> BitcodeWriter.cpp. Comment fixed.
>
>>
>>> + unsigned ByteNo = BitNo / 32 * 4;
>>> + if ((BitNo & 31) == 0) {
>>> + // Already 32-bit aligned
>>> + support::endian::write32le(&Out[ByteNo], NewWord);
>>
>> Does the fast path work for anything that's 8-bit aligned? (I'm just
>> guessing based on the change you made to the caller.)
>
> I assume you are asking whether the write32le works if it is 8-bit
> aligned if we changed the guard here. Checking...
>
>>
>>> + } else {
>>> + // First patch in the lower 32-(BitNo&32) bits of NewWord, by
>>> + // shifting and masking into bits BitNo&32 to 32 of the first byte.
>>> + unsigned CurWord = support::endian::read32le(&Out[ByteNo]);
>>> + unsigned StartBit = BitNo & 31;
>>> + // Currently expect to backpatch 0-value placeholders.
>>> + assert((CurWord >> StartBit) == 0);
>>> + CurWord |= NewWord << StartBit;
>>> + support::endian::write32le(&Out[ByteNo], CurWord);
>>> +
>>> + // Next patch in the upper BitNo&32 bits of NewWord, by
>>> + // shifting and masking into bits 0 to (BitNo&32)-1 of the second byte.
>>> + ByteNo += 4;
>>> + CurWord = support::endian::read32le(&Out[ByteNo]);
>>> + // Currently expect to backpatch 0-value placeholders.
>>> + assert((CurWord << (32 - StartBit)) == 0);
>>> + CurWord |= NewWord >> (32 - StartBit);
>>> + support::endian::write32le(&Out[ByteNo], CurWord);
>>
>> It's seems weird to have this logic here for unaligned writes. Is there
>> no functionality for this in llvm/Support? If not, can you add it in a
>> separate commit (with unit tests)? (If so, just use it.)
>
> Checking if this is needed...
>
>>
>>> + }
>>> + }
>>> +
>>> void Emit(uint32_t Val, unsigned NumBits) {
>>> assert(NumBits && NumBits <= 32 && "Invalid value size!");
>>> assert((Val & ~(~0U >> (32-NumBits))) == 0 && "High bits set!");
>>> @@ -119,7 +141,7 @@
>>> CurValue = Val >> (32-CurBit);
>>> else
>>> CurValue = 0;
>>> - CurBit = (CurBit+NumBits) & 31;
>>> + CurBit = (CurBit + NumBits) & 31;
>>
>> If you commit this, please do it in a separate NFC commit (it's noise
>> for this patch).
>
> Ok, looks like this was a change to existing code by clang-format that
> snuck in. Will go ahead and commit it NFC.
>
> This brings up a related question - when I used clang-format on my
> patch it made tons of changes to unrelated code. In those situations
> is it good to do an NFC commit of the clang-format changes on the
> current file first? Otherwise it was a pain to use clang-format for my
> patch since I didn't want any of the changes to existing code.
>
>>
>>> }
>>>
>>> void Emit64(uint64_t Val, unsigned NumBits) {
>>> @@ -232,10 +254,10 @@
>>>
>>> // Compute the size of the block, in words, not counting the size field.
>>> unsigned SizeInWords = GetWordIndex() - B.StartSizeWord - 1;
>>> - unsigned ByteNo = B.StartSizeWord*4;
>>> + uint64_t BitNo = B.StartSizeWord * 32;
>>>
>>> // Update the block size field in the header of this sub-block.
>>> - BackpatchWord(ByteNo, SizeInWords);
>>> + BackpatchWord(BitNo, SizeInWords);
>>>
>>> // Restore the inner block's code size and abbrev table.
>>> CurCodeSize = B.PrevCodeSize;
>>>
>>
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
--
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
More information about the llvm-commits
mailing list