[llvm] r214135 - IR: Optimize size of use-list order shuffle vectors

Wed Aug 6 13:38:43 PDT 2014

On Wed, Aug 6, 2014 at 11:20 AM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

>
> > On 2014-Aug-05, at 17:47, Sean Silva <chisophugis at gmail.com> wrote:
> >
> >> On Mon, Aug 4, 2014 at 4:36 PM, Duncan P. N. Exon Smith <
> >> dexonsmith at apple.com> wrote:
> >>
> >> The difference between these three versions is pretty noisy, but the
> >> `std::vector<>` version looks slightly slower.  I think it can be left
> >> as-is in the tree, but let me know if you think differently and/or want
> >> data from a bigger bitcode file.
> >
> > This difference is definitely in the noise. Seems like optimizing this
> > code path is pointless. Just leave it with whichever one is simplest
> > (std::vector?).
>
> Agreed.  r214979.
>
> >> Calculating the use-lists takes extra memory -- but none of these data
> >> structures has much effect on how much.
> >
> > Do you have stats about the size distributions of use lists? E.g. a
> > histogram? (might want to use a log scale)
> > Also comparing said distributions across many different bitcode modules
> > of different sorts of code (e.g. chromium, firefox, clang, various cases in
> > the test-suite, etc.).
>
> No, I didn't collect those, although it would be interesting to know.
>
> Without having looked deeply, my impression is that most of the expense
> is in *calculating* the use-list order

Let's wait for some data to back this up before doing anything.

> -- i.e., in the
> `DenseMap<Value *, unsigned>` that saves the relative order that values
> will be read in the `BitcodeReader`.  This is calculated and used to
> generate the use-list order shuffles, and has *every* `Value` accounted
> for that will get read by the `BitcodeReader` -- even those that are
> already "sorted" (including those with 0 or 1 uses).
>
> Using this global map was supposed to be a first step, just for the
> prototype -- I'd planned to reuse the similar `ValueEnumerator`
> tables as much as possible, and predict use-list orders on-the-fly with
> a rather more complicated scheme.  However, the memory overhead is small
> enough that I'm no longer motivated to add the complexity, or to look
> into other designs (like a sorted std::vector).
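
As a rough illustration of the global map described above, here is a minimal sketch. It is not the LLVM code: `Value` is a stand-in struct, `std::unordered_map` stands in for `DenseMap<Value *, unsigned>`, and `buildOrderMap` / `countShuffleCandidates` are hypothetical names. The read order is assumed to be handed in as a plain list.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Stand-in for llvm::Value; only identity and a use count matter here.
struct Value {
  unsigned numUses;
};

// Assumption: std::unordered_map stands in for DenseMap<Value *, unsigned>.
using OrderMap = std::unordered_map<const Value *, unsigned>;

// Assigns every value its position in the (assumed) read order. Every value
// gets an entry, including those with 0 or 1 uses.
OrderMap buildOrderMap(const std::vector<const Value *> &readOrder) {
  OrderMap order;
  order.reserve(readOrder.size());
  for (const Value *v : readOrder)
    order.emplace(v, static_cast<unsigned>(order.size()));
  return order;
}

// Counts the entries whose own use-list could ever need a stored order
// (two or more uses); lists with 0 or 1 uses are trivially sorted.
std::size_t countShuffleCandidates(const OrderMap &order) {
  std::size_t n = 0;
  for (const auto &entry : order)
    if (entry.first->numUses >= 2)
      ++n;
  return n;
}
```

The map grows with the total number of values, while only the values counted by `countShuffleCandidates` can have a use-list that needs reordering.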
>
> I was anticipating needing to optimize the `UseListOrder` data structure
> once I'd dealt with the global map.
>
> Given that it doesn't seem to require optimization, I'm not particularly
> motivated to collect these stats right now.  I could file a PR to look
> into it later if you think it's worthwhile though -- that seems like a
> useful output of `verify-uselistorder`, for example.
>
> >> File-size of the input bitcode for same (in the same order):
> >>
> >>     7.2M nopreserve.bc
> >>     7.6M preserve.bc
> >>     8.3M shuffled.bc
> >
> > Why does shuffling increase the filesize? Is there some "default" order
> > that is cheaper to store? How does the choice of permutation affect the
> > amount stored?
>
> The use-lists aren't stored directly -- some users can't be referenced,
> so there isn't a practical way to do that.
>
> Instead, we predict the order the use-lists will be after reading, and
> store how to reorder them.

That seems really fragile. However, I doubt there's a better way to do it.

> If the use-list is already sorted in the
> predicted order, there's no need to store an ordering at all.
>

That makes sense.
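
To make the "predict, then store how to reorder" idea concrete, here is a minimal sketch under stated assumptions; it is not the code from the commit. Each use is identified by a small integer, `predicted` is the order the reader is expected to produce, `actual` is the in-memory order to preserve, and `computeShuffle` / `applyShuffle` are hypothetical names. An empty result stands for "no record written".

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical writer side: for each slot of the desired (actual) order,
// record the index in the predicted order of the use that belongs there.
// Returns an empty vector when the orders already agree, so nothing is stored.
std::vector<unsigned> computeShuffle(const std::vector<int> &predicted,
                                     const std::vector<int> &actual) {
  assert(predicted.size() == actual.size() && "orders must cover the same uses");
  if (predicted == actual)
    return {};
  std::vector<unsigned> shuffle;
  shuffle.reserve(actual.size());
  for (int use : actual) {
    auto it = std::find(predicted.begin(), predicted.end(), use);
    assert(it != predicted.end() && "use missing from predicted order");
    shuffle.push_back(static_cast<unsigned>(it - predicted.begin()));
  }
  return shuffle;
}

// Hypothetical reader side: rebuild the desired order from the predicted
// order plus the stored shuffle (an empty shuffle means "leave as is").
std::vector<int> applyShuffle(const std::vector<int> &predicted,
                              const std::vector<unsigned> &shuffle) {
  if (shuffle.empty())
    return predicted;
  std::vector<int> result;
  result.reserve(shuffle.size());
  for (unsigned idx : shuffle)
    result.push_back(predicted[idx]);
  return result;
}
```

In this sketch, lists with 0 or 1 uses always compare equal to their prediction, so they never produce a record.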

-- Sean Silva

>
> Shuffling makes it so that most of the use-lists will *not* be in the
> predicted order, causing most of the use-list-orders to be written out
> (1/2 of those of size 2, 5/6 of those of size 3, etc.).
>
> A given use-list is either stored or it isn't -- I didn't optimize
> based on the "distance" from sorted.  I had planned some storage
> optimizations based on the known-limited domain, but it turns out the
> bitcode format stores record contents in VBR so I skipped them.
>
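
The 1/2 and 5/6 figures follow from counting. Assuming the shuffle picks each of the n! orderings of a size-n use-list with equal probability (an assumption for this sketch), exactly one ordering matches the prediction, so (n! - 1)/n! of those lists get written. A brute-force check, with the hypothetical name `countStoredOrders`:

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Enumerates all n! orderings of a use-list with n distinct uses and counts
// how many differ from the predicted (identity) order.
// Returns {stored, total}: stored/total of the lists would need a record.
std::pair<unsigned long, unsigned long> countStoredOrders(unsigned n) {
  std::vector<unsigned> predicted(n);
  std::iota(predicted.begin(), predicted.end(), 0u);
  std::vector<unsigned> order = predicted;  // starts sorted, as next_permutation needs
  unsigned long stored = 0, total = 0;
  do {
    ++total;
    if (order != predicted)
      ++stored;
  } while (std::next_permutation(order.begin(), order.end()));
  return {stored, total};
}
```

For n = 2 this gives 1 of 2 and for n = 3 it gives 5 of 6, matching the fractions quoted above. Keep n small, since the enumeration is factorial in n.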