<div>Hi all,</div><div><br></div><div>I've been looking into how to make LLVM bitcode files smaller. There is one simple change that appears to shrink linked bitcode files by about 15%. See this spreadsheet for some rough data:</div>
<div><br></div><div><a href="https://docs.google.com/spreadsheet/ccc?key=0AjRrJHQc4_bddEtJdjdIek5fMDdIdFFIZldZXzdWa0E">https://docs.google.com/spreadsheet/ccc?key=0AjRrJHQc4_bddEtJdjdIek5fMDdIdFFIZldZXzdWa0E</a><br></div>
<div><br></div><div><br></div><div>The change is in how operand ids are encoded in bitcode files. Rather than use an "absolute number" given by the ValueEnumerator, which counts up within a function, we can have the id be relative to the current instruction.</div>
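<div><br></div><div>A minimal sketch of the relative numbering in Python (not the patch itself; the function names and the starting id of 1000 are made up for illustration):</div>

```python
def to_relative(inst_id, operand_ids):
    """Rewrite absolute operand ids as distances back from the current instruction."""
    return [inst_id - op for op in operand_ids]


def to_absolute(inst_id, relative_ids):
    """Invert to_relative: recover the absolute ids the reader needs."""
    return [inst_id - rel for rel in relative_ids]


# Hypothetical numbering: 1000 module-level values come first, so the
# function's own values start at id 1000.
icmp_id = 1002                  # id the icmp result would receive
icmp_operands = [1001, 1000]    # the two values defined just before it

relative = to_relative(icmp_id, icmp_operands)
print(relative)                         # [1, 2]
print(to_absolute(icmp_id, relative))   # [1001, 1000]
```

<div>The reader only needs to know the id of the instruction it is currently decoding to undo the transformation.</div>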
<div><br></div><div>I.e., instead of having:</div><div><br></div><div><div>... = icmp eq i32 n-1, n-2</div><div>br i1 ..., label %bb1, label %bb2</div><div><br></div><div>you have</div><div><br></div><div>... = icmp eq i32 1, 2</div>
<div>br i1 1, label %bb1, label %bb2<br></div></div><div><br></div><div><br></div><div>Thus the ids remain relatively small and can be encoded in fewer bits. This counters how ValueEnumerator starts assigning ids within functions at some large N (where N is the number of module-level values?). It also makes repeated byte sequences more likely, since many instructions in a function may now have similar operand ids.</div>
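<div><br></div><div>To see where the savings come from, here is a rough sketch of the bit cost of a variable-bit-rate (VBR) field, assuming 6-bit chunks (5 data bits plus a continuation bit) as used for operands of unabbreviated records. The operand values are hypothetical and chosen to match the example above with N = 1000:</div>

```python
def vbr_bits(value, width=6):
    """Bits needed to write a non-negative value as a VBR field of the given width."""
    payload = width - 1  # data bits per chunk; the remaining bit flags continuation
    chunks = max(1, -(-value.bit_length() // payload))  # ceiling division, at least one chunk
    return chunks * width


# Hypothetical operand ids for the icmp above, with N = 1000 module-level values.
absolute_operands = [1001, 1000]
relative_operands = [1, 2]

absolute_cost = sum(vbr_bits(v) for v in absolute_operands)
relative_cost = sum(vbr_bits(v) for v in relative_operands)
print(absolute_cost, relative_cost)  # 24 12
```

<div>Any relative id below 32 fits in a single 6-bit chunk, whereas an absolute id of 32 or more needs at least two.</div>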
<div><br></div><div>The format of the ".ll" files from llvm-dis is not changed, just the binary format (run llvm-bcanalyzer -dump to see the difference).</div><div><br></div><div>Caveats:</div><div><br></div><div>
- Forward references will create negative-valued ids (which end up being written out as large 32-bit integers, as far as I could tell). The common case for this is PHI nodes, but in the larger tests INST_PHI records still use fewer bits *overall*.</div>
<div><br></div><div>- This doesn't help with constant operands. The relative id of a given constant now changes from one instruction to the next...</div><div><br></div><div>- To retain backward compatibility with old bitcode files, I ended up using a new bitc value, "bitc::FUNCTION_BLOCK_REL_ID", instead of the existing "bitc::FUNCTION_BLOCK_ID".</div>
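<div><br></div><div>The forward-reference caveat can be sketched as follows. This assumes the negative difference simply wraps to an unsigned 32-bit value, which is my reading of the behaviour described above rather than something checked against the patch. The helper names and ids are hypothetical, and the cost again assumes 6-bit VBR chunks:</div>

```python
def vbr_bits(value, width=6):
    """Bits needed to write a non-negative value as a VBR field of the given width."""
    payload = width - 1
    chunks = max(1, -(-value.bit_length() // payload))
    return chunks * width


def encode_relative_u32(inst_id, operand_id):
    """Relative id, wrapped to unsigned 32 bits when the operand is defined later."""
    return (inst_id - operand_id) % 2**32


def decode_relative_u32(inst_id, encoded):
    """Recover the absolute operand id from the wrapped relative id."""
    return (inst_id - encoded) % 2**32


phi_id = 1002
backward = encode_relative_u32(phi_id, 1001)  # operand defined earlier
forward = encode_relative_u32(phi_id, 1005)   # operand defined later

print(backward, vbr_bits(backward))  # 1 6
print(forward, vbr_bits(forward))    # 4294967293 42
print(decode_relative_u32(phi_id, forward))  # 1005
```

<div>Under these assumptions a single forward reference costs 42 bits, so the overall win for INST_PHI depends on backward references outnumbering forward ones.</div>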
<div><br></div><div>Are there known problems with this scheme?</div><div>Are there other ideas that have been floated around for reducing the size of bitcode files?</div><div>In any case, the patch is attached if there is interest... If you want to try out the patch, you can toggle between the new and old encoding by using the flag "-enable-old-style-functions" vs no flags, with llvm-as.</div>
<div><br></div><div><br></div><div>- Jan</div><div><br></div>