[LLVMdev] [PATCH / PROPOSAL] bitcode encoding that is ~15% smaller for large bitcode files...

Jan Voung jvoung at chromium.org
Tue Sep 25 17:08:32 PDT 2012


Hi all,

I've been looking into how to make LLVM bitcode files smaller.  There is
one simple change that appears to shrink linked bitcode files by about 15%.
See this spreadsheet for some rough data:

https://docs.google.com/spreadsheet/ccc?key=0AjRrJHQc4_bddEtJdjdIek5fMDdIdFFIZldZXzdWa0E


The change is in how operand ids are encoded in bitcode files.  Rather than
use an "absolute number" given by the ValueEnumerator, which counts up
within a function, we can have the id be relative to the current
instruction.

I.e., instead of having:

... = icmp eq i32 n-1, n-2
br i1 ..., label %bb1, label %bb2

you have

... = icmp eq i32 1, 2
br i1 1, label %bb1, label %bb2


Thus the ids remain relatively small and can be encoded in fewer bits.
This counters how ValueEnumerator starts assigning ids within functions at
some large N (where N is the number of module-level values?).  It also
makes repeated sequences of bytes more likely, since many instructions in a
function may now have similar operands.
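To make the size argument concrete, here is a rough sketch of the bit cost
of an operand id under the bitstream's variable-bit-rate (VBR) scheme, where
each chunk carries width-1 payload bits plus one continuation bit.  The
chunk width of 6 and the helper names vbr_bits and relative_id are
assumptions for illustration only; they are not taken from the patch.

```python
def vbr_bits(value, width=6):
    """Bits needed to emit a non-negative value as a VBR field.

    Each chunk is `width` bits: width-1 payload bits plus one
    continuation bit. The width of 6 is an assumed default.
    """
    if value < 0:
        raise ValueError("VBR fields hold non-negative values")
    payload = width - 1
    # Ceiling division; zero still takes one chunk.
    chunks = max(1, -(-value.bit_length() // payload))
    return chunks * width


def relative_id(inst_id, operand_id):
    """Distance from the current instruction back to its operand."""
    return inst_id - operand_id


# Hypothetical function whose value ids start after ~5000 module-level values.
inst_id = 5000
operands = [4999, 4998]

absolute_cost = sum(vbr_bits(op) for op in operands)
relative_cost = sum(vbr_bits(relative_id(inst_id, op)) for op in operands)
print(absolute_cost, relative_cost)
```

With these made-up numbers each absolute id needs three chunks while each
relative id fits in one, which is the effect described above.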

The format of the ".ll" files from llvm-dis is not changed, just the binary
format (run llvm-bcanalyzer -dump to see the difference).

Caveats:

- Forward references will create negative-valued ids (which end up being
written out as large 32-bit integers, as far as I could tell).  The common
case for this is PHI nodes, but in larger tests fewer bits *overall* are
used for INST_PHI.

- Doesn't help with constant operands.  Their ids will now change from one
instruction to the next...

- To retain backward compatibility with old bitcode files, I ended up using
a new bitc value "bitc::FUNCTION_BLOCK_REL_ID" vs the existing
"bitc::FUNCTION_BLOCK_ID".
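The forward-reference caveat can be sketched the same way.  This assumes
the relative id is simply truncated to an unsigned 32-bit value, which
matches the "as far as I could tell" observation above but is not confirmed
against the patch.  The helper names and the VBR chunk width of 6 are
hypothetical.

```python
MASK32 = 0xFFFFFFFF


def vbr_bits(value, width=6):
    """Bits for a non-negative value as VBR chunks (assumed width of 6)."""
    payload = width - 1
    chunks = max(1, -(-value.bit_length() // payload))
    return chunks * width


def encode_relative(inst_id, operand_id):
    """Relative id, wrapped to unsigned 32 bits (assumed behaviour)."""
    return (inst_id - operand_id) & MASK32


def decode_relative(inst_id, encoded):
    """Recover the absolute operand id from a wrapped relative id."""
    return (inst_id - encoded) & MASK32


# Backward reference: the operand was defined two values earlier.
back = encode_relative(100, 98)
# Forward reference (e.g. a PHI operand defined three values later).
fwd = encode_relative(100, 103)

print(back, vbr_bits(back))
print(fwd, vbr_bits(fwd))
assert decode_relative(100, fwd) == 103
```

Under these assumptions a forward reference costs seven chunks instead of
one, so the scheme only wins if backward references dominate.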

Are there known problems with this scheme?
Are there other ideas that have been floated around for reducing the size
of bitcode files?
In any case, the patch is attached if there is interest...  If you want to
try out the patch, you can toggle between the new and old encoding by using
the flag "-enable-old-style-functions" vs no flags, with llvm-as.


- Jan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: relative_ids.patch
Type: application/octet-stream
Size: 38652 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120925/81159bd3/attachment.obj>

