[PATCH] D126029: [Bitcode] Add abbreviation for STRUCT_NAME when the name is not char6

Sam McCall via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 19 16:05:09 PDT 2022


sammccall created this revision.
sammccall added a reviewer: ilya-biryukov.
Herald added subscribers: usaxena95, kadircet, hiraditya.
Herald added a project: All.
sammccall requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

When emitting bitcode for a C++ file, TYPE.STRUCT_NAME entries are a significant
part of the size. A typical name is "struct.std::_Vector_base.618", and
the record contents are the sequence of characters.

These records are encoded efficiently as arrays of 6-bit chars if every
character is representable in the char6 encoding: [A-Za-z0-9._].
This set does not include ':', so very few C++ names are encoded this way:
0.4% in the file I checked. ('<', '>' and space are also common and not
encodable.)
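A minimal standalone sketch of the char6 check, assuming the alphabet above and the code ordering used by LLVM's BitCodeAbbrevOp::EncodeChar6 (a-z, A-Z, 0-9, '.', '_'). The helper names are hypothetical, not LLVM API. A single ':' is enough to disqualify a whole name such as "struct.std::_Vector_base.618".

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: true if C is in the char6 alphabet [a-zA-Z0-9._].
// Explicit ranges are used instead of <cctype> so the result does not
// depend on the current locale.
bool isChar6(char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') ||
         (C >= '0' && C <= '9') || C == '.' || C == '_';
}

// Hypothetical helper: maps a char6 character to its 6-bit code, following
// the ordering a-z (0..25), A-Z (26..51), 0-9 (52..61), '.' (62), '_' (63).
unsigned encodeChar6(char C) {
  assert(isChar6(C) && "not representable in char6");
  if (C >= 'a' && C <= 'z') return C - 'a';
  if (C >= 'A' && C <= 'Z') return C - 'A' + 26;
  if (C >= '0' && C <= '9') return C - '0' + 52;
  return C == '.' ? 62 : 63;
}

// A name can use the char6 array abbreviation only if every character fits.
bool nameFitsChar6(std::string_view Name) {
  for (char C : Name)
    if (!isChar6(C))
      return false;
  return true;
}
```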

Before this patch, the fallback is unabbreviated encoding: each
character is a vbr6. For nearly all characters (ASCII >= 0x20), this means
12 bits per character.
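To see where the 12 bits come from, here is a sketch of VBR6 chunking (the helper names are hypothetical). Each 6-bit chunk carries 5 payload bits plus a continuation bit, low bits first, so any value of 32 (0x20) or more needs two chunks. ':' (0x3A), for example, becomes the chunks 0x3A and 0x01.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Splits a value into VBR6 chunks, least significant first: each 6-bit
// chunk holds 5 payload bits, and its high bit (0x20) is set when more
// chunks follow.
std::vector<std::uint8_t> encodeVbr6(std::uint64_t Value) {
  std::vector<std::uint8_t> Chunks;
  do {
    std::uint8_t Chunk = static_cast<std::uint8_t>(Value & 0x1F); // payload
    Value >>= 5;
    if (Value != 0)
      Chunk |= 0x20; // continuation bit: more chunks follow
    Chunks.push_back(Chunk);
  } while (Value != 0);
  return Chunks;
}

// Bits used by one character written as an unabbreviated vbr6 operand.
unsigned vbr6BitsForChar(char C) {
  return 6 * static_cast<unsigned>(
                 encodeVbr6(static_cast<unsigned char>(C)).size());
}
```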

After this patch, the fallback is to encode the characters as fixed8
arrays. This saves 4 bits per character, plus 6 bits per record that was
previously unabbreviated (the vbr6 record code, which the abbreviation
encodes as a literal).
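A rough size model for one STRUCT_NAME record under the three encodings, assuming the bitstream layout described above: an unabbreviated record spells out the record code, the operand count and every operand as vbr6, while an array abbreviation fixes the code as a literal and writes a vbr6 length followed by fixed-width elements. The names are hypothetical, and the abbreviation ID that precedes every record is left out because its width is the same in all three cases.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Bits taken by a VBR field with the given chunk width: each chunk carries
// (ChunkBits - 1) payload bits, and at least one chunk is always emitted.
unsigned vbrBits(std::uint64_t Value, unsigned ChunkBits) {
  assert(ChunkBits >= 2 && "VBR needs at least one payload bit per chunk");
  unsigned Chunks = 1;
  while (Value >>= (ChunkBits - 1))
    ++Chunks;
  return Chunks * ChunkBits;
}

enum class NameEncoding { Unabbreviated, Char6Array, Fixed8Array };

// Approximate size in bits of one string record body (abbrev ID excluded).
unsigned recordBits(std::string_view Name, unsigned RecordCode,
                    NameEncoding Enc) {
  const std::uint64_t Len = Name.size();
  switch (Enc) {
  case NameEncoding::Unabbreviated: {
    // Record code, operand count, then every character, each as vbr6.
    unsigned Bits = vbrBits(RecordCode, 6) + vbrBits(Len, 6);
    for (char C : Name)
      Bits += vbrBits(static_cast<unsigned char>(C), 6);
    return Bits;
  }
  case NameEncoding::Char6Array:
    // Code is a literal in the abbreviation (0 bits); vbr6 length, 6-bit elements.
    return vbrBits(Len, 6) + 6 * static_cast<unsigned>(Len);
  case NameEncoding::Fixed8Array:
    // Same layout with 8-bit elements.
    return vbrBits(Len, 6) + 8 * static_cast<unsigned>(Len);
  }
  return 0; // not reached for valid enum values
}
```

For the 28-character example name "struct.std::_Vector_base.618" and any record code below 32 (one vbr6 chunk), the model gives 348 bits unabbreviated and 230 bits as a fixed8 array, a saving of 28 × 4 + 6 = 118 bits.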

On my test file (bitcode from clang-tools-extra/clangd/ParsedAST.cpp):

  overall size               -18% (113 => 93kB)
  STRUCT_NAME fraction             47% => 37%
  STRUCT_NAME average size   -33% (451 => 301)


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D126029

Files:
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp


Index: llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
===================================================================
--- llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -593,14 +593,17 @@
   llvm_unreachable("Invalid ordering");
 }
 
+// Abbrev6 is used if all characters fit in char6, else Abbrev8.
 static void writeStringRecord(BitstreamWriter &Stream, unsigned Code,
-                              StringRef Str, unsigned AbbrevToUse) {
+                              StringRef Str, unsigned Abbrev6 = 0,
+                              unsigned Abbrev8 = 0) {
   SmallVector<unsigned, 64> Vals;
 
+  unsigned AbbrevToUse = Abbrev6 ? Abbrev6 : Abbrev8;
   // Code: [strchar x N]
   for (char C : Str) {
     if (AbbrevToUse && !BitCodeAbbrevOp::isChar6(C))
-      AbbrevToUse = 0;
+      AbbrevToUse = Abbrev8;
     Vals.push_back(C);
   }
 
@@ -898,7 +901,12 @@
   Abbv->Add(BitCodeAbbrevOp(bitc::TYPE_CODE_STRUCT_NAME));
   Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
   Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Char6));
-  unsigned StructNameAbbrev = Stream.EmitAbbrev(std::move(Abbv));
+  unsigned StructNameAbbrev6 = Stream.EmitAbbrev(std::move(Abbv));
+  Abbv = std::make_shared<BitCodeAbbrev>();
+  Abbv->Add(BitCodeAbbrevOp(bitc::TYPE_CODE_STRUCT_NAME));
+  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Array));
+  Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 8));
+  unsigned StructNameAbbrev8 = Stream.EmitAbbrev(std::move(Abbv));
 
   // Abbrev for TYPE_CODE_STRUCT_NAMED.
   Abbv = std::make_shared<BitCodeAbbrev>();
@@ -996,7 +1004,7 @@
         // Emit the name if it is present.
         if (!ST->getName().empty())
           writeStringRecord(Stream, bitc::TYPE_CODE_STRUCT_NAME, ST->getName(),
-                            StructNameAbbrev);
+                            StructNameAbbrev6, StructNameAbbrev8);
       }
       break;
     }

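The selection rule in the patched writeStringRecord can be exercised outside LLVM with a sketch like the following. The names are hypothetical, and the real function emits the record through BitstreamWriter::EmitRecord rather than returning a struct. One deliberate difference from the patch: each byte is widened through unsigned char, because on targets where plain char is signed, a byte >= 0x80 would otherwise become a large unsigned value that does not fit a Fixed(8) field.

```cpp
#include <cassert>
#include <string_view>
#include <vector>

// Same alphabet as char6: [a-zA-Z0-9._].
bool isChar6(char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') ||
         (C >= '0' && C <= '9') || C == '.' || C == '_';
}

// Hypothetical result: the abbreviation ID to emit with (0 means
// unabbreviated) and the record operands, one per character.
struct StringRecord {
  unsigned AbbrevToUse = 0;
  std::vector<unsigned> Vals;
};

// Mirrors the patched rule: start with Abbrev6 if available (else Abbrev8),
// and switch to Abbrev8 as soon as a character is not char6.
StringRecord buildStringRecord(std::string_view Str, unsigned Abbrev6 = 0,
                               unsigned Abbrev8 = 0) {
  StringRecord R;
  R.AbbrevToUse = Abbrev6 ? Abbrev6 : Abbrev8;
  for (char C : Str) {
    if (R.AbbrevToUse && !isChar6(C))
      R.AbbrevToUse = Abbrev8;
    // Keep every operand in 0..255 so it fits a Fixed(8) element.
    R.Vals.push_back(static_cast<unsigned char>(C));
  }
  return R;
}
```

Callers that pass only one abbreviation get Abbrev8 = 0 from the default argument, so for them the rule reduces to the pre-patch behaviour: any non-char6 character yields an unabbreviated record.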
