[llvm-dev] RFC: Should SmallVectors be smaller?

Thu Jun 21 19:01:45 PDT 2018

> On 22 Jun 2018, at 02:52, Duncan P. N. Exon Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> I've been curious for a while whether SmallVectors have the right speed/memory tradeoff.  It would be straightforward to shave off a couple of pointers (1 pointer/4B on 32-bit; 2 pointers/16B on 64-bit) if users could afford to test for small-mode vs. large-mode.
> 
> The current scheme works out to something like this:
> ```
> template <class T, size_t SmallCapacity>
> struct SmallVector {
>  T *BeginX, *EndX, *CapacityX;
>  T Small[SmallCapacity];
> 
>  bool isSmall() const { return BeginX == Small; }
>  T *begin() { return BeginX; }
>  T *end() { return EndX; }
>  size_t size() const { return EndX - BeginX; }
>  size_t capacity() const { return CapacityX - BeginX; }
> };
> ```
> 
> In the past I used something more like:
> ```
> template <class T, size_t SmallCapacity>
> struct SmallVector2 {
>  unsigned Size;
>  unsigned Capacity;
>  union {
>    T Small[SmallCapacity];
>    T *Large;
>  };
> 
>  bool isSmall() const { return Capacity == SmallCapacity; } // Or a bit shaved off of Capacity.
>  T *begin() { return isSmall() ? Small : Large; }
>  T *end() { return begin() + Size; }
>  size_t size() const { return Size; }
>  size_t capacity() const { return Capacity; }
> };
> ```
> 
> I'm curious whether this scheme would be really be slower in practice (as a complete replacement for `SmallVector` in ADT).  I wonder, has anyone profiled something like this before?  If so, in what context?  on what workloads?
> 

Doesn’t this scheme have a problem with undefined behaviour, since you may be changing the active member of the union when capacity grows larger than SmallCapacity?

-- Dean