[llvm] r214135 - IR: Optimize size of use-list order shuffle vectors

Duncan P. N. Exon Smith dexonsmith at apple.com
Mon Aug 4 16:36:57 PDT 2014


> On 2014-Jul-29, at 10:36, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> 
>> 
>> On 2014-Jul-29, at 10:19, Sean Silva <chisophugis at gmail.com> wrote:
>> 
>> First, I agree with Chandler about not worrying unless this is in the profile.
>> 
>> However, if this really does need to be optimized....
>> 
>> Crazy idea: would it be possible to store just a single int's worth of RNG seed for each use list?
>> 
>> A less crazy idea: a vector of indices is an extremely memory-inefficient way to store permutations. For example, there are 12! permutations of 12 elements, and 12! is less than 2^32. Similarly, there are 20! permutations of 20 elements and 20! < 2^64. Therefore your "small" case could theoretically be just a single `unsigned` from a storage perspective.
>> A slightly memory-suboptimal but simple and cpu-friendly way to store the permutations in an integer would be to bit-pack the indices, using just as many bits for the indices as necessary. For example, suppose you were just allowed a single uint64. You could use the following arrangement to store permutations of up to 15 elements:
>> Low 4 bits: number of elements (the "size")
>> Each 4 bits after that: an index. Since we use 4 bits to store it, size() is at most 15, thus each index fits in 4 bits. 4 * 15 = 60, so that is just enough room for up to 15 elements.
>> (there is actually room for quite a bit of out-of-band data; if size() < 15, then you have entire unused indices at the top and so you have 4*(15 - size()) bits available) 
>> 
> 
> This makes a lot of sense to me.  I'm still trying to shake out some test
> failures, but once I get to looking at memory overhead I think this is a good
> direction.
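
For reference, the 4-bit packing Sean describes could look roughly like
the sketch below.  The helper names (`packPermutation`, `packedSize`,
`packedIndex`, `unpackPermutation`) are made up for illustration and are
not LLVM API; the layout is the one from his mail (low 4 bits hold the
size, each following 4 bits hold one index).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sanity check of the counting argument: 12! fits in 32 bits, 13! does not.
constexpr std::uint64_t factorial(unsigned N) {
  return N <= 1 ? 1 : N * factorial(N - 1);
}
static_assert(factorial(12) <= UINT32_MAX, "12! fits in 32 bits");
static_assert(factorial(13) > UINT32_MAX, "13! needs more than 32 bits");

// Hypothetical helper: pack a permutation of at most 15 elements.
//   bits 0..3             : number of elements
//   bits 4*(I+1) .. +3    : index stored at position I
inline std::uint64_t packPermutation(const std::vector<unsigned> &Perm) {
  assert(Perm.size() <= 15 && "size must fit in 4 bits");
  std::uint64_t Packed = Perm.size();
  for (std::size_t I = 0; I != Perm.size(); ++I) {
    assert(Perm[I] < Perm.size() && "index out of range");
    Packed |= static_cast<std::uint64_t>(Perm[I]) << (4 * (I + 1));
  }
  return Packed;
}

// Number of elements stored in the packed word.
inline unsigned packedSize(std::uint64_t Packed) { return Packed & 0xF; }

// Index stored at position I.
inline unsigned packedIndex(std::uint64_t Packed, unsigned I) {
  assert(I < packedSize(Packed) && "position out of range");
  return (Packed >> (4 * (I + 1))) & 0xF;
}

// Expand the packed word back into a vector of indices.
inline std::vector<unsigned> unpackPermutation(std::uint64_t Packed) {
  std::vector<unsigned> Perm(packedSize(Packed));
  for (unsigned I = 0; I != Perm.size(); ++I)
    Perm[I] = packedIndex(Packed, I);
  return Perm;
}
```

With 15 elements the last index lands in bits 60..63, so the word is
exactly full; anything larger would still need a fallback representation.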

I took some time today to run `llvm-as` and `llvm-dis` on the LTO'ed IR
for tablegen.  The bitcode is about 7.2MB on-disk.  This isn't really
big, but at least it's not tiny.  For all of these, I have the
`global_ctors` patch applied.

First, I collected stats on three different data structures for
`UseListShuffleVector`.

 1. `times`: Currently committed "small vector", with a 6-element small
    array of `unsigned`.

 2. `times-packed`: Modified version of `times` that uses a 24-element
    small array of `unsigned char` (the big array is still `unsigned`).
    Same `sizeof()` as the 6-element array above, but more often small,
    and also slightly more complex.

 3. `times-stdvec`: `std::vector<unsigned>`.
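
To make the size argument for the first two variants concrete, here is a
rough sketch of just their inline buffers.  The struct and function names
are hypothetical stand-ins, not the committed `UseListShuffleVector`, and
the sketch assumes a 4-byte `unsigned`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the inline ("small") storage only.
struct InlineUnsigned6 { unsigned Slots[6]; };      // `times`
struct InlineChar24 { unsigned char Slots[24]; };   // `times-packed`

// Assumption: `unsigned` is 4 bytes on the host.
static_assert(sizeof(unsigned) == 4, "sketch assumes 4-byte unsigned");
static_assert(sizeof(InlineUnsigned6) == sizeof(InlineChar24),
              "both inline buffers occupy 24 bytes");

// Whether a shuffle with NumUses entries avoids the separately
// allocated big array in each variant.
constexpr bool staysSmall6(std::size_t NumUses) { return NumUses <= 6; }
constexpr bool staysSmallPacked24(std::size_t NumUses) { return NumUses <= 24; }

// Narrow an index for the packed buffer; every index of an inline-sized
// shuffle is below 24, so it always fits in an unsigned char.
inline unsigned char narrowIndex(unsigned Index) {
  assert(Index < 24 && "only inline-sized shuffles are narrowed");
  return static_cast<unsigned char>(Index);
}
```

The extra complexity mentioned for `times-packed` comes from this
narrowing: the small and big arrays no longer share an element type.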

Here are the user time and resident memory for each of these (averaged
over 10 runs) for `llvm-as -preserve-bc-use-list-order`:

    user    resident   profile
    1.6211  151357849  times/preserve-as-*.profile
    1.6278  151314841  times-packed/preserve-as-*.profile
    1.6347  151341465  times-stdvec/preserve-as-*.profile

The difference between these three versions is pretty noisy, but the
`std::vector<>` version looks slightly slower.  I think the currently
committed small vector can be left as-is in the tree, but let me know if
you think differently and/or want data from a bigger bitcode file.

----

Note that the difference in memory overhead between these data
structures doesn't seem important.  I ran `opt` on a bitcode file that
had preserved a "shuffled" use-list order, running no passes but using
`-preserve-bc-use-list-order`.

    0.1189  14545715 shuffled.profile
    0.1191  14537523 shuffled-packed.profile
    0.1197  14701363 shuffled-stdvec.profile

For reference, here are two more versions.  `packed8` has an 8-element
array instead of 24-element, so `sizeof(UseListShuffleVector)` actually
drops.  `nopreserve` is the current data structure but without
`-preserve-bc-use-list-order`.

    0.1184  14544486 shuffled-packed8.profile
    0.0941  12574310 shuffled-nopreserve.profile
    
Calculating the use-lists takes extra memory -- but none of these data
structures has much effect on how much.

----

Since I was collecting data anyway, I also have some stats on the
overhead of `-preserve-bc-use-list-order` itself.

`llvm-as`, but without `-preserve-bc-use-list-order`:

    1.4869 151392256 times/nopreserve-as-*.profile
    1.4979 151357440 times-packed/nopreserve-as-*.profile
    1.4945 151396761 times-stdvec/nopreserve-as-*.profile

For `llvm-dis`, here are the numbers for reading a file generated by
`llvm-as` without `-preserve-bc-use-list-order`, then one generated with
it, and then one generated by `verify-uselistorder -save-temps` after
shuffling use-lists.

    0.7472 119365222 times/nopreserve-dis-*.profile
    0.7958 121296896 times/preserve-dis-*.profile
    0.8792 123158118 times-shuffled/preserve-dis-*.profile

File sizes of the input bitcode for those three runs (in the same order):

    7.2M nopreserve.bc
    7.6M preserve.bc
    8.3M shuffled.bc
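
For a rough sense of scale, the relative overhead can be read off the
numbers above with a one-line helper.  `overheadPercent` is a hypothetical
name used only for this sketch; it is plain arithmetic on the reported
averages and adds no new measurements.

```cpp
#include <cassert>
#include <cmath>

// Relative increase of With over Without, in percent.
inline double overheadPercent(double Without, double With) {
  assert(Without > 0 && "baseline must be positive");
  return 100.0 * (With - Without) / Without;
}
```

Plugging in the averages: `llvm-as` user time goes from 1.4869 to 1.6211
(about 9%), `llvm-dis` user time from 0.7472 to 0.7958 (about 6.5%) or to
0.8792 for the shuffled file (about 18%), and the bitcode grows from 7.2M
to 7.6M (about 5.6%) or to 8.3M (about 15%).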



