[llvm-dev] RFC: Should SmallVectors be smaller?

Thu Jun 21 19:30:23 PDT 2018

To Dean,

I think Duncan’s approach prohibit any usage of Small after the capacity grow over SmallCapacity.
So when the capacity exceed SmallCapacity, one should:
1. Allocate memory on heap
2. Copy data from  Small to that chunk
3. Assign pointer of that chunk to Large

As long as you always access Large after growth, there would be no data lose.

Bekket
> Dean Michael Berris via llvm-dev <llvm-dev at lists.llvm.org> 於 2018年6月21日 下午10:01 寫道：
> 
> 
> 
>> On 22 Jun 2018, at 02:52, Duncan P. N. Exon Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 
>> I've been curious for a while whether SmallVectors have the right speed/memory tradeoff.  It would be straightforward to shave off a couple of pointers (1 pointer/4B on 32-bit; 2 pointers/16B on 64-bit) if users could afford to test for small-mode vs. large-mode.
>> 
>> The current scheme works out to something like this:
>> ```
>> template <class T, size_t SmallCapacity>
>> struct SmallVector {
>> T *BeginX, *EndX, *CapacityX;
>> T Small[SmallCapacity];
>> 
>> bool isSmall() const { return BeginX == Small; }
>> T *begin() { return BeginX; }
>> T *end() { return EndX; }
>> size_t size() const { return EndX - BeginX; }
>> size_t capacity() const { return CapacityX - BeginX; }
>> };
>> ```
>> 
>> In the past I used something more like:
>> ```
>> template <class T, size_t SmallCapacity>
>> struct SmallVector2 {
>> unsigned Size;
>> unsigned Capacity;
>> union {
>>   T Small[SmallCapacity];
>>   T *Large;
>> };
>> 
>> bool isSmall() const { return Capacity == SmallCapacity; } // Or a bit shaved off of Capacity.
>> T *begin() { return isSmall() ? Small : Large; }
>> T *end() { return begin() + Size; }
>> size_t size() const { return Size; }
>> size_t capacity() const { return Capacity; }
>> };
>> ```
>> 
>> I'm curious whether this scheme would be really be slower in practice (as a complete replacement for `SmallVector` in ADT).  I wonder, has anyone profiled something like this before?  If so, in what context?  on what workloads?
>> 
> 
> Doesn’t this scheme have a problem with undefined behaviour, since you may be changing the active member of the union when capacity grows larger than SmallCapacity?
> 
> -- Dean
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev