[llvm] r214135 - IR: Optimize size of use-list order shuffle vectors

Duncan P. N. Exon Smith dexonsmith at apple.com
Wed Aug 6 11:20:30 PDT 2014


> On 2014-Aug-05, at 17:47, Sean Silva <chisophugis at gmail.com> wrote:
> 
>> On Mon, Aug 4, 2014 at 4:36 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>> 
>> The difference between these three versions is pretty noisy, but the
>> `std::vector<>` version looks slightly slower.  I think it can be left
>> as-is in the tree, but let me know if you think differently and/or want
>> data from a bigger bitcode file.
> 
> This difference is definitely in the noise. Seems like optimizing this code path is pointless. Just leave it with whichever one is simplest (std::vector?). 

Agreed.  r214979.

>> Calculating the use-lists takes extra memory -- but none of these data
>> structures has much effect on how much.
> 
> Do you have stats about the size distributions of use lists? E.g. a histogram? (might want to use a log scale)
> Also comparing said distributions across many different bitcode modules of different sorts of code (e.g. chromium, firefox, clang, various cases in the test-suite, etc.).

No, I didn't collect those, although it would be interesting to know.

Without having looked deeply, my impression is that most of the expense
is in *calculating* the use-list order -- i.e., in the
`DenseMap<Value *, unsigned>` that saves the relative order in which
values will be read by the `BitcodeReader`.  This map is built and then
used to generate the use-list order shuffles, and it has an entry for
*every* `Value` the `BitcodeReader` will read -- even those that are
already "sorted" (including those with 0 or 1 uses).

Using this global map was supposed to be a first step, just for the
prototype -- I'd planned to reuse the existing `ValueEnumerator` tables
as much as possible, and predict use-list orders on-the-fly with a
rather more complicated scheme.  However, the memory overhead is small
enough that I'm no longer motivated to add the complexity, or to look
into other designs (like a sorted `std::vector`).
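
If it ever becomes worth revisiting, the sorted-`std::vector` idea
would look roughly like this -- purely hypothetical, never implemented:
build the (value, index) pairs up front, sort once by pointer, and
binary search on lookup instead of hashing.

    #include <algorithm>
    #include <utility>
    #include <vector>

    namespace llvm { class Value; }

    // Hypothetical alternative to the DenseMap: pairs sorted by
    // pointer, looked up with std::lower_bound.
    typedef std::pair<const llvm::Value *, unsigned> ValueIndex;

    static unsigned lookupIndex(const std::vector<ValueIndex> &Sorted,
                                const llvm::Value *V) {
      // Assumes Sorted is sorted by the pointer key and V is present.
      auto I = std::lower_bound(
          Sorted.begin(), Sorted.end(), V,
          [](const ValueIndex &P, const llvm::Value *RHS) {
            return P.first < RHS;
          });
      return I->second;
    }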

I was anticipating needing to optimize the `UseListOrder` data structure
once I'd dealt with the global map.

Given that it doesn't seem to require optimization, I'm not particularly
motivated to collect these stats right now.  I could file a PR to look
into it later if you think it's worthwhile though -- that seems like a
useful output of `verify-uselistorder`, for example.

>> File-size of the input bitcode for same (in the same order):
>> 
>>     7.2M nopreserve.bc
>>     7.6M preserve.bc
>>     8.3M shuffled.bc
> 
> Why does shuffling increase the filesize? Is there some "default" order that is cheaper to store? How does the choice of permutation affect the amount stored?

The use-lists aren't stored directly -- some users can't be referenced,
so there isn't a practical way to do that.

Instead, we predict the order each use-list will be in after reading,
and store how to reorder it.  If the use-list is already in the
predicted order, there's no need to store an ordering at all.
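
A rough sketch of that check (hypothetical names, not the in-tree
code): map each user to its predicted read index and only emit a record
when the sequence is out of order.

    #include <algorithm>
    #include <vector>

    // PredictedIDs[I] is the predicted post-read index of the I-th use.
    // If the sequence is already sorted, the reader will reconstruct
    // the same order on its own and nothing needs to be stored.
    static bool useListNeedsShuffle(const std::vector<unsigned> &PredictedIDs) {
      return !std::is_sorted(PredictedIDs.begin(), PredictedIDs.end());
    }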

Shuffling makes it so that most of the use-lists will *not* be in the
predicted order, causing most of the use-list orders to be written out:
a uniformly random shuffle of n uses happens to match the prediction
with probability 1/n!, so 1/2 of the size-2 lists need a record, 5/6 of
the size-3 lists, and so on.

A given use-list order is either stored or it isn't -- I didn't
optimize based on the "distance" from sorted.  I had planned some
storage optimizations based on the known, limited domain of the
indices, but it turns out the bitcode format already stores record
contents in VBR, so I skipped them.
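
For reference, the VBR encoding works roughly like this (illustration
only, not the `BitstreamWriter` API): each N-bit chunk carries N-1 data
bits plus a continuation bit, so small indices are already cheap to
store.

    #include <cstdint>
    #include <vector>

    // Emit Val as a sequence of N-bit VBR chunks: the low N-1 bits of
    // each chunk carry data, and the high bit says whether another
    // chunk follows.
    static void emitVBR(uint64_t Val, unsigned N,
                        std::vector<unsigned> &Chunks) {
      const uint64_t DataMask = (uint64_t(1) << (N - 1)) - 1;
      do {
        unsigned Chunk = unsigned(Val & DataMask);
        Val >>= (N - 1);
        if (Val)
          Chunk |= 1u << (N - 1); // continuation bit
        Chunks.push_back(Chunk);
      } while (Val);
    }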


