[llvm] r264409 - Bitcode: Collect all MDString records into a single blob
Duncan P. N. Exon Smith via llvm-commits
llvm-commits at lists.llvm.org
Fri Mar 25 08:31:25 PDT 2016
I reverted this in r264410 due to a bootstrap failure:
http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build/8302/
> On 2016-Mar-25, at 07:40, Duncan P. N. Exon Smith via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>
> Author: dexonsmith
> Date: Fri Mar 25 09:40:18 2016
> New Revision: 264409
>
> URL: http://llvm.org/viewvc/llvm-project?rev=264409&view=rev
> Log:
> Bitcode: Collect all MDString records into a single blob
>
> Optimize output of MDStrings in bitcode. This emits them in big blocks
> (currently 1024) in a pair of records:
> - BULK_STRING_SIZES: the sizes of the strings in the block, and
> - BULK_STRING_DATA: a single blob, which is the concatenation of all
> the strings.
>
> Inspired by Mehdi's similar patch, http://reviews.llvm.org/D18342, this
> should (a) slightly reduce bitcode size, since there is less record
> overhead, and (b) greatly improve reading speed, since blobs are super
> cheap to deserialize.
>
> I needed to add support for blobs to streaming input to get the test
> suite passing.
> - StreamingMemoryObject::getPointer reads ahead and returns the
> address of the blob.
> - To avoid a possible reallocation of StreamingMemoryObject::Bytes,
> BitstreamCursor::readRecord needs to move the call to JumpToEnd
> forward so that getPointer is the last bitstream operation.
>
> Removed:
> llvm/trunk/test/Bitcode/Inputs/invalid-fixme-streaming-blob.bc
> Modified:
> llvm/trunk/include/llvm/Bitcode/LLVMBitCodes.h
> llvm/trunk/include/llvm/Support/StreamingMemoryObject.h
> llvm/trunk/lib/Bitcode/Reader/BitcodeReader.cpp
> llvm/trunk/lib/Bitcode/Reader/BitstreamReader.cpp
> llvm/trunk/lib/Bitcode/Writer/BitcodeWriter.cpp
> llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.cpp
> llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.h
> llvm/trunk/lib/Support/StreamingMemoryObject.cpp
> llvm/trunk/test/Bitcode/invalid.test
> llvm/trunk/tools/llvm-bcanalyzer/llvm-bcanalyzer.cpp
>
> Modified: llvm/trunk/include/llvm/Bitcode/LLVMBitCodes.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Bitcode/LLVMBitCodes.h?rev=264409&r1=264408&r2=264409&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/Bitcode/LLVMBitCodes.h (original)
> +++ llvm/trunk/include/llvm/Bitcode/LLVMBitCodes.h Fri Mar 25 09:40:18 2016
> @@ -209,7 +209,7 @@ enum { BITCODE_CURRENT_EPOCH = 0 };
> };
>
> enum MetadataCodes {
> - METADATA_STRING = 1, // MDSTRING: [values]
> + METADATA_STRING_OLD = 1, // MDSTRING: [values]
> METADATA_VALUE = 2, // VALUE: [type num, value num]
> METADATA_NODE = 3, // NODE: [n x md num]
> METADATA_NAME = 4, // STRING: [values]
> @@ -243,6 +243,8 @@ enum { BITCODE_CURRENT_EPOCH = 0 };
> METADATA_MODULE = 32, // [distinct, scope, name, ...]
> METADATA_MACRO = 33, // [distinct, macinfo, line, name, value]
> METADATA_MACRO_FILE = 34, // [distinct, macinfo, line, file, ...]
> + METADATA_BULK_STRING_SIZES = 35, // [m x (size num)]
> + METADATA_BULK_STRING_DATA = 36, // [blob]
> };
>
> // The constants block (CONSTANTS_BLOCK_ID) describes emission for each
>
> Modified: llvm/trunk/include/llvm/Support/StreamingMemoryObject.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Support/StreamingMemoryObject.h?rev=264409&r1=264408&r2=264409&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/Support/StreamingMemoryObject.h (original)
> +++ llvm/trunk/include/llvm/Support/StreamingMemoryObject.h Fri Mar 25 09:40:18 2016
> @@ -28,15 +28,7 @@ public:
> uint64_t getExtent() const override;
> uint64_t readBytes(uint8_t *Buf, uint64_t Size,
> uint64_t Address) const override;
> - const uint8_t *getPointer(uint64_t address, uint64_t size) const override {
> - // FIXME: This could be fixed by ensuring the bytes are fetched and
> - // making a copy, requiring that the bitcode size be known, or
> - // otherwise ensuring that the memory doesn't go away/get reallocated,
> - // but it's not currently necessary. Users that need the pointer (any
> - // that need Blobs) don't stream.
> - report_fatal_error("getPointer in streaming memory objects not allowed");
> - return nullptr;
> - }
> + const uint8_t *getPointer(uint64_t Address, uint64_t Size) const override;
> bool isValidAddress(uint64_t address) const override;
>
> /// Drop s bytes from the front of the stream, pushing the positions of the
>
> Modified: llvm/trunk/lib/Bitcode/Reader/BitcodeReader.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Bitcode/Reader/BitcodeReader.cpp?rev=264409&r1=264408&r2=264409&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Bitcode/Reader/BitcodeReader.cpp (original)
> +++ llvm/trunk/lib/Bitcode/Reader/BitcodeReader.cpp Fri Mar 25 09:40:18 2016
> @@ -2363,7 +2363,7 @@ std::error_code BitcodeReader::parseMeta
> NextMetadataNo++);
> break;
> }
> - case bitc::METADATA_STRING: {
> + case bitc::METADATA_STRING_OLD: {
> std::string String(Record.begin(), Record.end());
>
> // Test for upgrading !llvm.loop.
> @@ -2373,6 +2373,38 @@ std::error_code BitcodeReader::parseMeta
> MetadataList.assignValue(MD, NextMetadataNo++);
> break;
> }
> + case bitc::METADATA_BULK_STRING_SIZES: {
> + // This is a pair of records for an MDString block: SIZES, which is a
> + // list of string lengths; and DATA, which is a blob with all the strings
> + // concatenated together.
> + //
> + // Note: since this record type was introduced after the upgrade for
> + // !llvm.loop, we don't need to change HasSeenOldLoopTags.
> + if (Record.empty())
> + return error("Invalid record: missing bulk metadata string sizes");
> +
> + StringRef Blob;
> + SmallVector<uint64_t, 1> BlobRecord;
> + Code = Stream.ReadCode();
> + unsigned BlobCode = Stream.readRecord(Code, BlobRecord, &Blob);
> + if (BlobCode != bitc::METADATA_BULK_STRING_DATA)
> + return error("Invalid record: missing bulk metadata string data");
> + if (!BlobRecord.empty())
> + return error("Invalid record: unexpected bulk metadata arguments");
> +
> + for (uint64_t Size : Record) {
> + if (Blob.size() < Size)
> + return error("Invalid record: not enough bulk metadata string bytes");
> +
> + // Extract the current string.
> + MetadataList.assignValue(MDString::get(Context, Blob.slice(0, Size)),
> + NextMetadataNo++);
> + Blob = Blob.drop_front(Size);
> + }
> + if (!Blob.empty())
> + return error("Invalid record: too many bulk metadata string bytes");
> + break;
> + }
> case bitc::METADATA_KIND: {
> // Support older bitcode files that had METADATA_KIND records in a
> // block with METADATA_BLOCK_ID.
>
> Modified: llvm/trunk/lib/Bitcode/Reader/BitstreamReader.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Bitcode/Reader/BitstreamReader.cpp?rev=264409&r1=264408&r2=264409&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Bitcode/Reader/BitstreamReader.cpp (original)
> +++ llvm/trunk/lib/Bitcode/Reader/BitstreamReader.cpp Fri Mar 25 09:40:18 2016
> @@ -261,6 +261,10 @@ unsigned BitstreamCursor::readRecord(uns
> }
>
> // Otherwise, inform the streamer that we need these bytes in memory.
> + // Skip over tail padding first. We can't do it later if this is a
> + // streaming memory object, since that could reallocate the storage that
> + // the blob pointer references.
> + JumpToBit(NewEnd);
> const char *Ptr = (const char*)
> BitStream->getBitcodeBytes().getPointer(CurBitPos/8, NumElts);
>
> @@ -272,8 +276,6 @@ unsigned BitstreamCursor::readRecord(uns
> for (; NumElts; --NumElts)
> Vals.push_back((unsigned char)*Ptr++);
> }
> - // Skip over tail padding.
> - JumpToBit(NewEnd);
> }
>
> return Code;
>
> Modified: llvm/trunk/lib/Bitcode/Writer/BitcodeWriter.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Bitcode/Writer/BitcodeWriter.cpp?rev=264409&r1=264408&r2=264409&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Bitcode/Writer/BitcodeWriter.cpp (original)
> +++ llvm/trunk/lib/Bitcode/Writer/BitcodeWriter.cpp Fri Mar 25 09:40:18 2016
> @@ -1347,31 +1347,78 @@ static void writeNamedMetadata(const Mod
> }
> }
>
> +static unsigned createMDStringDataAbbrev(BitstreamWriter &Stream) {
> + BitCodeAbbrev *Abbv = new BitCodeAbbrev();
> + Abbv->Add(BitCodeAbbrevOp(bitc::METADATA_BULK_STRING_DATA));
> + Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob));
> + return Stream.EmitAbbrev(Abbv);
> +}
> +
> +static void emitMDStringBlob(unsigned DataAbbrev,
> + ArrayRef<const Metadata *> Strings,
> + BitstreamWriter &Stream,
> + SmallVectorImpl<uint64_t> &Record,
> + SmallString<4096> &Blob) {
> + for (const Metadata *MD : Strings) {
> + StringRef S = cast<MDString>(MD)->getString();
> + Record.push_back(S.size());
> + Blob.append(S.begin(), S.end());
> + }
> +
> + Stream.EmitRecord(bitc::METADATA_BULK_STRING_SIZES, Record);
> + Record.clear();
> +
> + Record.push_back(bitc::METADATA_BULK_STRING_DATA);
> + Stream.EmitRecordWithBlob(DataAbbrev, Record, Blob);
> + Record.clear();
> +}
> +
> +/// Write out a section of records for MDString.
> +///
> +/// All the MDString elements in a metadata block are emitted in bulk. They're
> +/// grouped into blocks, and each block is emitted with pair of records:
> +///
> +/// - SIZES: a list of the sizes of the strings in the block.
> +/// - DATA: the blob itself.
> +static void writeMetadataStrings(ArrayRef<const Metadata *> Strings,
> + BitstreamWriter &Stream,
> + SmallVectorImpl<uint64_t> &Record) {
> + if (Strings.empty())
> + return;
> +
> + // Emit strings in large blocks to reduce record overhead. Somewhat
> + // arbitrarily, limit this to 512 strings per blob:
> + // - big enough to eliminate overhead;
> + // - small enough that the reader's SIZES record will stay within a page.
> + const size_t NumStringsPerBlob = 512;
> + Record.reserve(std::min(NumStringsPerBlob, Strings.size()));
> +
> + unsigned DataAbbrev = createMDStringDataAbbrev(Stream);
> + SmallString<4096> Blob;
> + while (Strings.size() > NumStringsPerBlob) {
> + emitMDStringBlob(DataAbbrev, Strings.slice(0, NumStringsPerBlob), Stream,
> + Record, Blob);
> + Strings = Strings.slice(NumStringsPerBlob);
> + }
> + if (!Strings.empty())
> + emitMDStringBlob(DataAbbrev, Strings, Stream, Record, Blob);
> +}
> +
> static void WriteModuleMetadata(const Module &M,
> const ValueEnumerator &VE,
> BitstreamWriter &Stream) {
> - const auto &MDs = VE.getMDs();
> - if (MDs.empty() && M.named_metadata_empty())
> + if (VE.getMDs().empty() && M.named_metadata_empty())
> return;
>
> Stream.EnterSubblock(bitc::METADATA_BLOCK_ID, 3);
>
> - unsigned MDSAbbrev = 0;
> - if (VE.hasMDString()) {
> - // Abbrev for METADATA_STRING.
> - BitCodeAbbrev *Abbv = new BitCodeAbbrev();
> - Abbv->Add(BitCodeAbbrevOp(bitc::METADATA_STRING));
> - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
> - Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 8));
> - MDSAbbrev = Stream.EmitAbbrev(Abbv);
> - }
> -
> // Initialize MDNode abbreviations.
> #define HANDLE_MDNODE_LEAF(CLASS) unsigned CLASS##Abbrev = 0;
> #include "llvm/IR/Metadata.def"
>
> SmallVector<uint64_t, 64> Record;
> - for (const Metadata *MD : MDs) {
> + writeMetadataStrings(VE.getMDStrings(), Stream, Record);
> + for (const Metadata *MD : VE.getNonMDStrings()) {
> if (const MDNode *N = dyn_cast<MDNode>(MD)) {
> assert(N->isResolved() && "Expected forward references to be resolved");
>
> @@ -1385,17 +1432,7 @@ static void WriteModuleMetadata(const Mo
> #include "llvm/IR/Metadata.def"
> }
> }
> - if (const auto *MDC = dyn_cast<ConstantAsMetadata>(MD)) {
> - WriteValueAsMetadata(MDC, VE, Stream, Record);
> - continue;
> - }
> - const MDString *MDS = cast<MDString>(MD);
> - // Code: [strchar x N]
> - Record.append(MDS->bytes_begin(), MDS->bytes_end());
> -
> - // Emit the finished record.
> - Stream.EmitRecord(bitc::METADATA_STRING, Record, MDSAbbrev);
> - Record.clear();
> + WriteValueAsMetadata(cast<ConstantAsMetadata>(MD), VE, Stream, Record);
> }
>
> writeNamedMetadata(M, VE, Stream, Record);
>
> Modified: llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.cpp?rev=264409&r1=264408&r2=264409&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.cpp (original)
> +++ llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.cpp Fri Mar 25 09:40:18 2016
> @@ -280,8 +280,7 @@ static bool isIntOrIntVectorValue(const
>
> ValueEnumerator::ValueEnumerator(const Module &M,
> bool ShouldPreserveUseListOrder)
> - : HasMDString(false),
> - ShouldPreserveUseListOrder(ShouldPreserveUseListOrder) {
> + : ShouldPreserveUseListOrder(ShouldPreserveUseListOrder) {
> if (ShouldPreserveUseListOrder)
> UseListOrders = predictUseListOrder(M);
>
> @@ -375,6 +374,9 @@ ValueEnumerator::ValueEnumerator(const M
>
> // Optimize constant ordering.
> OptimizeConstants(FirstConstant, Values.size());
> +
> + // Organize metadata ordering.
> + organizeMetadata();
> }
>
> unsigned ValueEnumerator::getInstructionID(const Instruction *Inst) const {
> @@ -530,8 +532,8 @@ void ValueEnumerator::EnumerateMetadata(
> EnumerateMDNodeOperands(N);
> else if (auto *C = dyn_cast<ConstantAsMetadata>(MD))
> EnumerateValue(C->getValue());
> -
> - HasMDString |= isa<MDString>(MD);
> + else
> + ++NumMDStrings;
>
> // Replace the dummy ID inserted above with the correct one. MetadataMap may
> // have changed by inserting operands, so we need a fresh lookup here.
> @@ -557,6 +559,19 @@ void ValueEnumerator::EnumerateFunctionL
> FunctionLocalMDs.push_back(Local);
> }
>
> +void ValueEnumerator::organizeMetadata() {
> + if (!NumMDStrings)
> + return;
> +
> + // Put the strings first.
> + std::stable_partition(MDs.begin(), MDs.end(),
> + [](const Metadata *MD) { return isa<MDString>(MD); });
> +
> + // Renumber.
> + for (unsigned I = 0, E = MDs.size(); I != E; ++I)
> + MetadataMap[MDs[I]] = I + 1;
> +}
> +
> void ValueEnumerator::EnumerateValue(const Value *V) {
> assert(!V->getType()->isVoidTy() && "Can't insert void values!");
> assert(!isa<MetadataAsValue>(V) && "EnumerateValue doesn't handle Metadata!");
>
> Modified: llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.h?rev=264409&r1=264408&r2=264409&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.h (original)
> +++ llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.h Fri Mar 25 09:40:18 2016
> @@ -66,7 +66,7 @@ private:
> SmallVector<const LocalAsMetadata *, 8> FunctionLocalMDs;
> typedef DenseMap<const Metadata *, unsigned> MetadataMapType;
> MetadataMapType MetadataMap;
> - bool HasMDString;
> + unsigned NumMDStrings = 0;
> bool ShouldPreserveUseListOrder;
>
> typedef DenseMap<AttributeSet, unsigned> AttributeGroupMapType;
> @@ -121,8 +121,6 @@ public:
> }
> unsigned numMDs() const { return MDs.size(); }
>
> - bool hasMDString() const { return HasMDString; }
> -
> bool shouldPreserveUseListOrder() const { return ShouldPreserveUseListOrder; }
>
> unsigned getTypeID(Type *T) const {
> @@ -157,9 +155,16 @@ public:
>
> const ValueList &getValues() const { return Values; }
> const std::vector<const Metadata *> &getMDs() const { return MDs; }
> + ArrayRef<const Metadata *> getMDStrings() const {
> + return makeArrayRef(MDs).slice(0, NumMDStrings);
> + }
> + ArrayRef<const Metadata *> getNonMDStrings() const {
> + return makeArrayRef(MDs).slice(NumMDStrings);
> + }
> const SmallVectorImpl<const LocalAsMetadata *> &getFunctionLocalMDs() const {
> return FunctionLocalMDs;
> }
> +
> const TypeList &getTypes() const { return Types; }
> const std::vector<const BasicBlock*> &getBasicBlocks() const {
> return BasicBlocks;
> @@ -189,6 +194,10 @@ public:
> private:
> void OptimizeConstants(unsigned CstStart, unsigned CstEnd);
>
> + // Reorder the reachable metadata. This is not just an optimization, but is
> + // mandatory for emitting MDString correctly.
> + void organizeMetadata();
> +
> void EnumerateMDNodeOperands(const MDNode *N);
> void EnumerateMetadata(const Metadata *MD);
> void EnumerateFunctionLocalMetadata(const LocalAsMetadata *Local);
>
> Modified: llvm/trunk/lib/Support/StreamingMemoryObject.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Support/StreamingMemoryObject.cpp?rev=264409&r1=264408&r2=264409&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Support/StreamingMemoryObject.cpp (original)
> +++ llvm/trunk/lib/Support/StreamingMemoryObject.cpp Fri Mar 25 09:40:18 2016
> @@ -104,6 +104,12 @@ uint64_t StreamingMemoryObject::readByte
> return Size;
> }
>
> +const uint8_t *StreamingMemoryObject::getPointer(uint64_t Address,
> + uint64_t Size) const {
> + fetchToPos(Address + Size - 1);
> + return &Bytes[Address + BytesSkipped];
> +}
> +
> bool StreamingMemoryObject::dropLeadingBytes(size_t s) {
> if (BytesRead < s) return true;
> BytesSkipped = s;
>
> Removed: llvm/trunk/test/Bitcode/Inputs/invalid-fixme-streaming-blob.bc
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Bitcode/Inputs/invalid-fixme-streaming-blob.bc?rev=264408&view=auto
> ==============================================================================
> Binary files llvm/trunk/test/Bitcode/Inputs/invalid-fixme-streaming-blob.bc (original) and llvm/trunk/test/Bitcode/Inputs/invalid-fixme-streaming-blob.bc (removed) differ
>
> Modified: llvm/trunk/test/Bitcode/invalid.test
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Bitcode/invalid.test?rev=264409&r1=264408&r2=264409&view=diff
> ==============================================================================
> --- llvm/trunk/test/Bitcode/invalid.test (original)
> +++ llvm/trunk/test/Bitcode/invalid.test Fri Mar 25 09:40:18 2016
> @@ -168,11 +168,6 @@ RUN: FileCheck --check-prefix=INVALID-
>
> INVALID-ARGUMENT-TYPE: Invalid function argument type
>
> -RUN: not llvm-dis -disable-output %p/Inputs/invalid-fixme-streaming-blob.bc 2>&1 | \
> -RUN: FileCheck --check-prefix=STREAMING-BLOB %s
> -
> -STREAMING-BLOB: getPointer in streaming memory objects not allowed
> -
> RUN: not llvm-dis -disable-output %p/Inputs/invalid-function-comdat-id.bc 2>&1 | \
> RUN: FileCheck --check-prefix=INVALID-FCOMDAT-ID %s
>
>
> Modified: llvm/trunk/tools/llvm-bcanalyzer/llvm-bcanalyzer.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-bcanalyzer/llvm-bcanalyzer.cpp?rev=264409&r1=264408&r2=264409&view=diff
> ==============================================================================
> --- llvm/trunk/tools/llvm-bcanalyzer/llvm-bcanalyzer.cpp (original)
> +++ llvm/trunk/tools/llvm-bcanalyzer/llvm-bcanalyzer.cpp Fri Mar 25 09:40:18 2016
> @@ -312,7 +312,9 @@ static const char *GetCodeName(unsigned
> case bitc::METADATA_BLOCK_ID:
> switch(CodeID) {
> default:return nullptr;
> - STRINGIFY_CODE(METADATA, STRING)
> + STRINGIFY_CODE(METADATA, STRING_OLD)
> + STRINGIFY_CODE(METADATA, BULK_STRING_SIZES)
> + STRINGIFY_CODE(METADATA, BULK_STRING_DATA)
> STRINGIFY_CODE(METADATA, NAME)
> STRINGIFY_CODE(METADATA, KIND) // Older bitcode has it in a MODULE_BLOCK
> STRINGIFY_CODE(METADATA, NODE)
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
More information about the llvm-commits
mailing list