[llvm-dev] RFC: Should SmallVectors be smaller?

Thu Jun 21 09:52:14 PDT 2018

I've been curious for a while whether SmallVectors have the right speed/memory tradeoff.  It would be straightforward to shave off a pointer or two (1 pointer/4B on 32-bit; 2 pointers/16B on 64-bit) if users could afford the cost of testing for small-mode vs. large-mode on each access.

The current scheme works out to something like this:
```
template <class T, size_t SmallCapacity>
struct SmallVector {
  T *BeginX, *EndX, *CapacityX;
  T Small[SmallCapacity];

  bool isSmall() const { return BeginX == Small; }
  T *begin() { return BeginX; }
  T *end() { return EndX; }
  size_t size() const { return EndX - BeginX; }
  size_t capacity() const { return CapacityX - BeginX; }
};
```

In the past I used something more like:
```
template <class T, size_t SmallCapacity>
struct SmallVector2 {
  unsigned Size;
  unsigned Capacity;
  union {
    T Small[SmallCapacity];
    T *Large;
  };

  bool isSmall() const { return Capacity == SmallCapacity; } // Or a bit shaved off of Capacity.
  T *begin() { return isSmall() ? Small : Large; }
  T *end() { return begin() + Size; }
  size_t size() const { return Size; }
  size_t capacity() const { return Capacity; }
};
```
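
To make the size difference concrete, here is a rough layout-only sketch that compares the two schemes with `sizeof`.  The names `ThreePointerLayout`, `SizeCapacityLayout`, and `bytesSaved` are made up for this illustration and are not part of ADT; the structs mirror only the data members of the sketches above, and the union assumes a trivially constructible `T`.
```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mock of the current scheme's data members:
// three pointers followed by the inline storage.
template <class T, std::size_t SmallCapacity>
struct ThreePointerLayout {
  T *BeginX, *EndX, *CapacityX;
  T Small[SmallCapacity];
};

// Hypothetical mock of the alternative's data members:
// two 'unsigned' fields, with the heap pointer overlapping the inline storage.
// Assumes T is trivially constructible so the anonymous union is well-formed.
template <class T, std::size_t SmallCapacity>
struct SizeCapacityLayout {
  unsigned Size;
  unsigned Capacity;
  union {
    T Small[SmallCapacity];
    T *Large;
  };
};

// Bytes saved per vector object by switching layouts, for a given T and N.
template <class T, std::size_t N>
constexpr std::size_t bytesSaved() {
  return sizeof(ThreePointerLayout<T, N>) - sizeof(SizeCapacityLayout<T, N>);
}

// Sanity check: the alternative layout should never be larger for this case.
inline void checkPointerVector() {
  assert(sizeof(SizeCapacityLayout<void *, 4>) <
         sizeof(ThreePointerLayout<void *, 4>));
}
```

On a typical 64-bit target with a 4-byte `unsigned`, I'd expect `bytesSaved<void *, 4>()` to come out to 16, matching the two-pointer estimate above; the exact number depends on the ABI's padding and alignment rules.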

I'm curious whether this scheme would really be slower in practice (as a complete replacement for `SmallVector` in ADT).  Has anyone profiled something like this before?  If so, in what context, and on what workloads?

Duncan