[PATCH] D126029: [Bitcode] Add abbreviation for STRUCT_NAME when the name is not char6
Sam McCall via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu May 19 16:05:09 PDT 2022
sammccall created this revision.
sammccall added a reviewer: ilya-biryukov.
Herald added subscribers: usaxena95, kadircet, hiraditya.
Herald added a project: All.
sammccall requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.
When emitting bitcode for a C++ file, TYPE.STRUCT_NAME entries account for a
significant part of the size. A typical name is "struct.std::_Vector_base.618",
and the record contents are the sequence of its characters.
These records are efficiently encoded as arrays of 6-bit chars if every
char is representable in char6 encoding: [A-Za-z0-9._]
This does not include ":", so very few C++ names qualify: 0.4% in
the file I checked. ("<", ">" and space are also common and not encodable.)
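As an illustration of the char6 alphabet described above, here is a minimal sketch of the mapping from a character to its 6-bit value. The function name `encodeChar6OrNegative` and the `-1` return for non-encodable characters are assumptions made for this example; they are not part of the patch.

```cpp
// Hypothetical helper: maps a character to its char6 value (0..63),
// or returns -1 when the character is outside [A-Za-z0-9._].
int encodeChar6OrNegative(char C) {
  if (C >= 'a' && C <= 'z')
    return C - 'a';            // 0..25
  if (C >= 'A' && C <= 'Z')
    return C - 'A' + 26;       // 26..51
  if (C >= '0' && C <= '9')
    return C - '0' + 52;       // 52..61
  if (C == '.')
    return 62;
  if (C == '_')
    return 63;
  return -1;                   // e.g. ':', '<', '>', ' '
}
```

Every value fits in 6 bits, which is why a single ":" in a name is enough to rule out the char6 abbreviation for the whole record.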
Before this patch, the fallback is the unabbreviated encoding: each
character is a vbr6. A vbr6 chunk carries only 5 payload bits, so ~all
characters (ASCII >= 0x20) need two chunks, i.e. 12 bits per character.
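The 12-bit figure can be checked with a small sketch that counts the bits a variable-bit-rate value occupies. `vbrBits` is a name invented for this example; it only models the chunking (Width-1 payload bits plus one continuation bit per chunk) and does not write any bitstream.

```cpp
#include <cstdint>

// Hypothetical helper: number of bits needed to emit Value as a VBR
// with chunks of Width bits. Each chunk holds Width-1 payload bits;
// the top bit of a chunk signals that another chunk follows.
unsigned vbrBits(std::uint64_t Value, unsigned Width) {
  const std::uint64_t Threshold = std::uint64_t(1) << (Width - 1);
  unsigned Bits = Width;        // the final chunk is always emitted
  while (Value >= Threshold) {  // value does not fit in the remaining chunk
    Value >>= (Width - 1);
    Bits += Width;
  }
  return Bits;
}
```

With `Width == 6`, values below 32 take one chunk (6 bits), and every printable ASCII character takes two (12 bits).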
After this patch, the fallback is to encode the characters as fixed8
arrays. This saves 4 bits per character (and also 6 bits per
unabbreviated record).
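To make the per-character saving concrete, the sketch below compares the character payload of a name under the two fallbacks. The function names are invented for this example, and the count covers only the characters themselves, not the record header or array length.

```cpp
#include <cstddef>
#include <string>

// Payload bits when each character is emitted as a vbr6 (the old,
// unabbreviated fallback). Assumes 8-bit character values: those below
// 32 fit in one 6-bit chunk, all others need two chunks.
std::size_t vbr6PayloadBits(const std::string &Name) {
  std::size_t Bits = 0;
  for (unsigned char C : Name)
    Bits += (C < 32) ? 6 : 12;
  return Bits;
}

// Payload bits when the characters are emitted as an array of fixed8
// elements (the new fallback).
std::size_t fixed8PayloadBits(const std::string &Name) {
  return 8 * Name.size();
}
```

For the 28-character name "struct.std::_Vector_base.618" this gives 336 bits versus 224 bits, a saving of 4 bits per character.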
On my test file (bitcode from clang-tools-extra/clangd/ParsedAST.cpp):
  overall size -18% (113 => 93kB)
  STRUCT_NAME fraction 47% => 37%
  STRUCT_NAME average size -33% (451 => 301)
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D126029
Files:
llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
Index: llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
===================================================================
--- llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -593,14 +593,17 @@
   llvm_unreachable("Invalid ordering");
 }
+// Abbrev6 is used if all characters fit in char6, else Abbrev8.
 static void writeStringRecord(BitstreamWriter &Stream, unsigned Code,
-                              StringRef Str, unsigned AbbrevToUse) {
+                              StringRef Str, unsigned Abbrev6 = 0,
+                              unsigned Abbrev8 = 0) {
   SmallVector<unsigned, 64> Vals;
+  unsigned AbbrevToUse = Abbrev6 ? Abbrev6 : Abbrev8;
   // Code: [strchar x N]
   for (char C : Str) {
     if (AbbrevToUse && !BitCodeAbbrevOp::isChar6(C))
-      AbbrevToUse = 0;
+      AbbrevToUse = Abbrev8;
     Vals.push_back(C);
   }
@@ -898,7 +901,12 @@
   Abbv->Add(BitCodeAbbrevOp(bitc::TYPE_CODE_STRUCT_NAME));
   Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
   Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Char6));
-  unsigned StructNameAbbrev = Stream.EmitAbbrev(std::move(Abbv));
+  unsigned StructNameAbbrev6 = Stream.EmitAbbrev(std::move(Abbv));
+  Abbv = std::make_shared<BitCodeAbbrev>();
+  Abbv->Add(BitCodeAbbrevOp(bitc::TYPE_CODE_STRUCT_NAME));
+  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
+  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 8));
+  unsigned StructNameAbbrev8 = Stream.EmitAbbrev(std::move(Abbv));
   // Abbrev for TYPE_CODE_STRUCT_NAMED.
   Abbv = std::make_shared<BitCodeAbbrev>();
@@ -996,7 +1004,7 @@
         // Emit the name if it is present.
         if (!ST->getName().empty())
           writeStringRecord(Stream, bitc::TYPE_CODE_STRUCT_NAME, ST->getName(),
-                            StructNameAbbrev);
+                            StructNameAbbrev6, StructNameAbbrev8);
       }
       break;
     }
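The abbreviation choice made in writeStringRecord can be modelled on its own, outside of LLVM. The sketch below is a standalone approximation: `isChar6Like` and `pickAbbrev` are names invented here, and `isChar6Like` stands in for BitCodeAbbrevOp::isChar6. An abbreviation ID of 0 means "no abbreviation", as in the patch.

```cpp
#include <string>

// Stand-in for BitCodeAbbrevOp::isChar6 (assumption: [A-Za-z0-9._]).
bool isChar6Like(char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') ||
         (C >= '0' && C <= '9') || C == '.' || C == '_';
}

// Returns the abbreviation ID the patched logic would select:
// Abbrev6 if every character fits in char6, otherwise Abbrev8
// (which may be 0, meaning the record is emitted unabbreviated).
unsigned pickAbbrev(const std::string &Str, unsigned Abbrev6,
                    unsigned Abbrev8) {
  unsigned AbbrevToUse = Abbrev6 ? Abbrev6 : Abbrev8;
  for (char C : Str)
    if (AbbrevToUse && !isChar6Like(C))
      AbbrevToUse = Abbrev8;
  return AbbrevToUse;
}
```

Callers that pass only Abbrev6 keep the old behaviour: Abbrev8 defaults to 0, so a non-char6 character still falls back to the unabbreviated encoding.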
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D126029.430840.patch
Type: text/x-patch
Size: 1917 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220519/b2cb04ed/attachment.bin>