[PATCH] D38433: Introduce a specialized data structure to be used in a subsequent change

Thu Oct 12 22:29:04 PDT 2017

sanjoy planned changes to this revision.
sanjoy added a comment.

In https://reviews.llvm.org/D38433#893745, @chandlerc wrote:

> In https://reviews.llvm.org/D38433#893658, @sanjoy wrote:
>
> > (Haven't addressed the code comments yet since the design isn't settled)
> >
> > In https://reviews.llvm.org/D38433#893426, @chandlerc wrote:
> >
> > > Have you considered building a `ChunkedVector` instead of a `ChunkedList`? Specifically, there is a great trick where you use a single index with the low bits being an index into the chunk and the high bits being an index into a vector of pointers. It has many of the benefits you list and is a bit simpler I think. It also supports essentially the entire vector API if desired. Both bi-directional and even random access are reasonably efficient. Good locality, etc.
> >
> >
> > With a vector-of-buffers implementation, I'm a bit worried about the space overhead on the smaller cases.  For instance, this is the histogram of how this data structure is populated from a clang-bootstrap (also in https://reviews.llvm.org/D38434):
> >
> >        Count: 731310
> >          Min: 1
> >         Mean: 8.555150
> >   50th %tile: 4
> >   95th %tile: 25
> >   99th %tile: 53
> >          Max: 433
> >
> >
> > If I used a vector-of-buffers, I would either have to recompute the capacity and end (of the last buffer) on every insert (which would require an additional deref and some computation) or have to keep two words in the data structure on top of the three that `SmallVector` keeps anyway.  This adds a lot of relative overhead in the median case (4 elements).  In fact, the current situation of two extra words also qualifies as a "lot" of relative overhead IMO, and I want to think of an SSO to improve the situation.
>
>
> Hold on, the objects here are just pointers? Then none of this really makes sense to me...
>
> Chunked data structures seem to make the most sense if moving the objects is really expensive and/or the objects are really large.
>
> For pointers, why not just a vector?

I wanted to use a `ChunkedList` to avoid "slack" in the allocated memory, since a vector can waste up to 1/2 of the memory it currently has allocated.  But I'll keep this complexity out of LLVM for now (as I've said above, I was already somewhat on the fence); we can pull this patch in later if needed.
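
To put rough numbers on the slack concern, here is a minimal sketch that assumes a vector which starts empty and doubles its capacity whenever it is full.  That growth policy and the helper names `capacityAfterPushes` and `slackAfterPushes` are assumptions made for illustration; real vectors (including `SmallVector`, with its inline storage) behave differently in the details.

```cpp
#include <cassert>
#include <cstddef>

// Assumed growth policy: start empty, double the capacity whenever it is
// full.  Hypothetical helper, not part of LLVM or the standard library.
std::size_t capacityAfterPushes(std::size_t N) {
  std::size_t Cap = 0;
  for (std::size_t Size = 0; Size < N; ++Size)
    if (Size == Cap)
      Cap = (Cap == 0) ? 1 : Cap * 2;
  assert(Cap >= N && "capacity must cover every pushed element");
  return Cap;
}

// Number of allocated but unused element slots after N push_backs.
std::size_t slackAfterPushes(std::size_t N) {
  return capacityAfterPushes(N) - N;
}
```

Under this assumed policy, the median case of 4 elements lands exactly on a capacity of 4, while 25, 53 and 433 elements (the 95th percentile, 99th percentile and max above) leave 7, 11 and 79 slots unused.  The worst case is right after a reallocation, e.g. 33 elements in 64 slots.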

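For reference, the single-index trick @chandlerc suggested earlier in the thread can be sketched roughly as follows.  This is only an illustration under stated assumptions: the class name `ChunkedVectorSketch`, the `ChunkBits` parameter and the `numChunks` accessor are hypothetical, and this is not the code from this patch or an existing LLVM ADT.  It also assumes `T` is default-constructible and copy-assignable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical sketch of a chunked vector addressed by a single index.
template <typename T, unsigned ChunkBits = 3>
class ChunkedVectorSketch {
  static constexpr std::size_t ChunkSize = std::size_t(1) << ChunkBits;
  static constexpr std::size_t ChunkMask = ChunkSize - 1;

  // Vector of pointers to fixed-size chunks.
  std::vector<std::unique_ptr<T[]>> Chunks;
  std::size_t Size = 0;

public:
  void push_back(const T &V) {
    // A new chunk is needed whenever the low bits of Size wrap to zero.
    if ((Size & ChunkMask) == 0)
      Chunks.push_back(std::make_unique<T[]>(ChunkSize));
    // High bits select the chunk, low bits select the slot within it.
    Chunks[Size >> ChunkBits][Size & ChunkMask] = V;
    ++Size;
  }

  T &operator[](std::size_t I) {
    assert(I < Size && "index out of range");
    return Chunks[I >> ChunkBits][I & ChunkMask];
  }

  std::size_t size() const { return Size; }
  std::size_t numChunks() const { return Chunks.size(); }
};
```

Growing this structure only reallocates the vector of chunk pointers; the elements themselves never move, which matters most when the elements are large or expensive to move.
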
https://reviews.llvm.org/D38433