[llvm-dev] RFC: Should SmallVectors be smaller?
Duncan P. N. Exon Smith via llvm-dev
llvm-dev at lists.llvm.org
Thu Jun 21 21:16:19 PDT 2018
>> On Jun 21, 2018, at 18:38, Chris Lattner <clattner at nondot.org> wrote:
>>
>>
>>
>> On Jun 21, 2018, at 9:52 AM, Duncan P. N. Exon Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> I've been curious for a while whether SmallVectors have the right speed/memory tradeoff. It would be straightforward to shave off a couple of pointers (1 pointer/4B on 32-bit; 2 pointers/16B on 64-bit) if users could afford to test for small-mode vs. large-mode.
>
> Something like this could definitely work, but most smallvectors are on the stack. They are intentionally used when sizeof(smallvector) isn’t important, so I don’t think this optimization will pay off.
For better or worse (mostly worse), there are a ton of SmallVector fields in data structures, including some even nested inside other SmallVectors (e.g., see the cleanup in r235229). Often these data structures are heap-allocated.
> Out of curiosity, what brings this up?
I've noticed that Clang is using more stack recently (we're seeing more crashes from template recursion; it seems the template recursion limit needs to shrink), and somehow that train of thought led to this.
I share your skepticism that it will help stack usage much, but SmallVector/SmallVectorImpl is so ubiquitous, it could help the heap a bit. And if it doesn’t hurt runtime performance in practice, there’s no reason to fork the data structure.
If no one has measured before I might try it some time.
> -Chris
>
>
>>
>> The current scheme works out to something like this:
>> ```
>> template <class T, size_t SmallCapacity>
>> struct SmallVector {
>> T *BeginX, *EndX, *CapacityX;
>> T Small[SmallCapacity];
>>
>> bool isSmall() const { return BeginX == Small; }
>> T *begin() { return BeginX; }
>> T *end() { return EndX; }
>> size_t size() const { return EndX - BeginX; }
>> size_t capacity() const { return CapacityX - BeginX; }
>> };
>> ```
>>
>> In the past I used something more like:
>> ```
>> template <class T, size_t SmallCapacity>
>> struct SmallVector2 {
>> unsigned Size;
>> unsigned Capacity;
>> union {
>> T Small[SmallCapacity];
>> T *Large;
>> };
>>
>> bool isSmall() const { return Capacity == SmallCapacity; } // Or a bit shaved off of Capacity.
>> T *begin() { return isSmall() ? Small : Large; }
>> T *end() { return begin() + Size; }
>> size_t size() const { return Size; }
>> size_t capacity() const { return Capacity; }
>> };
>> ```
>>
>> I'm curious whether this scheme would be really be slower in practice (as a complete replacement for `SmallVector` in ADT). I wonder, has anyone profiled something like this before? If so, in what context? on what workloads?
>>
>> Duncan
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180621/9253450e/attachment.html>
More information about the llvm-dev
mailing list