[llvm] r214135 - IR: Optimize size of use-list order shuffle vectors
David Blaikie
dblaikie at gmail.com
Tue Jul 29 10:21:02 PDT 2014
On Tue, Jul 29, 2014 at 10:15 AM, Duncan P. N. Exon Smith
<dexonsmith at apple.com> wrote:
>
>> On 2014-Jul-29, at 09:23, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> On Mon, Jul 28, 2014 at 3:41 PM, Duncan P. N. Exon Smith
>> <dexonsmith at apple.com> wrote:
>>> Author: dexonsmith
>>> Date: Mon Jul 28 17:41:50 2014
>>> New Revision: 214135
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=214135&view=rev
>>> Log:
>>> IR: Optimize size of use-list order shuffle vectors
>>>
>>> Since we're storing lots of these, save two-pointers per vector
>>
>> It saves two pointers but costs one integer, right? At the cost of
>> never having a capacity that differs from the size, which means more
>> reallocation overhead? And I see there is no reallocation.. so that
>> explains it then.
>
> Yup. These are calculated once and then processed later.
>
> Also, it saves 3 pointers but costs one integer (2 pointers overall).
>
> - sizeof(ShuffleVector) == 32
> - sizeof(SmallVector<unsigned, 6>) == 48
Yep yep - eventually I saw the union you were using. At the cost of a
conditional for every access (to decide which side of the union to
look at), which might not be great - but again, I have no intuition
about the relative tradeoff there.
>
>> +Chandler: This is where I want std::dynamic_array.
>>
>>> with a
>>> custom type rather than using the relatively heavy `SmallVector`.
>>>
>>> Part of PR5680.
>>>
>>> Modified:
>>> llvm/trunk/include/llvm/IR/UseListOrder.h
>>> llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.cpp
>>>
>>> Modified: llvm/trunk/include/llvm/IR/UseListOrder.h
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/IR/UseListOrder.h?rev=214135&r1=214134&r2=214135&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/include/llvm/IR/UseListOrder.h (original)
>>> +++ llvm/trunk/include/llvm/IR/UseListOrder.h Mon Jul 28 17:41:50 2014
>>> @@ -25,11 +25,58 @@ class Module;
>>> class Function;
>>> class Value;
>>>
>>> +/// \brief Structure to hold a use-list shuffle vector.
>>> +///
>>> +/// Stores most use-lists locally, but large use-lists use an extra heap entry.
>>> +/// Costs two fewer pointers than the equivalent \a SmallVector.
>>> +class UseListShuffleVector {
>>> + unsigned Size;
>>> + union {
>>> + unsigned *Ptr;
>>> + unsigned Array[6];
>>> + } Storage;
>>> +
>>> + bool isSmall() const { return Size <= 6; }
>>> + unsigned *data() { return isSmall() ? Storage.Array : Storage.Ptr; }
>>> + const unsigned *data() const {
>>> + return isSmall() ? Storage.Array : Storage.Ptr;
>>> + }
>>> +
>>> +public:
>>> + UseListShuffleVector() : Size(0) {}
>>> + UseListShuffleVector(UseListShuffleVector &&X) {
>>> + std::memcpy(this, &X, sizeof(UseListShuffleVector));
>>> + X.Size = 0;
>>> + }
>>> + explicit UseListShuffleVector(size_t Size) : Size(Size) {
>>> + if (!isSmall())
>>> + Storage.Ptr = new unsigned[Size];
>>> + }
>>> + ~UseListShuffleVector() {
>>> + if (!isSmall())
>>> + delete Storage.Ptr;
>>> + }
>>> +
>>> + typedef unsigned *iterator;
>>> + typedef const unsigned *const_iterator;
>>> +
>>> + size_t size() const { return Size; }
>>> + iterator begin() { return data(); }
>>> + iterator end() { return begin() + size(); }
>>> + const_iterator begin() const { return data(); }
>>> + const_iterator end() const { return begin() + size(); }
>>> + unsigned &operator[](size_t I) { return data()[I]; }
>>> + unsigned operator[](size_t I) const { return data()[I]; }
>>> +};
>>> +
>>> /// \brief Structure to hold a use-list order.
>>> struct UseListOrder {
>>> - const Function *F;
>>> const Value *V;
>>> - SmallVector<unsigned, 8> Shuffle;
>>> + const Function *F;
>>> + UseListShuffleVector Shuffle;
>>> +
>>> + UseListOrder(const Value *V, const Function *F, size_t ShuffleSize)
>>> + : V(V), F(F), Shuffle(ShuffleSize) {}
>>> };
>>>
>>> typedef std::vector<UseListOrder> UseListOrderStack;
>>>
>>> Modified: llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.cpp?rev=214135&r1=214134&r2=214135&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.cpp (original)
>>> +++ llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.cpp Mon Jul 28 17:41:50 2014
>>> @@ -133,12 +133,11 @@ static void predictValueUseListOrderImpl
>>> return;
>>>
>>> // Store the shuffle.
>>> - UseListOrder O;
>>> - O.V = V;
>>> - O.F = F;
>>> - for (auto &I : List)
>>> - O.Shuffle.push_back(I.second);
>>> - Stack.push_back(O);
>>> + UseListOrder O(V, F, List.size());
>>> + assert(List.size() == O.Shuffle.size() && "Wrong size");
>>> + for (size_t I = 0, E = List.size(); I != E; ++I)
>>> + O.Shuffle[I] = List[I].second;
>>> + Stack.emplace_back(std::move(O));
>>
>> Generally I'd suggest skipping emplace_back where push_back will have
>> the same performance characteristics (since you don't need to call
>> anything other than the move ctor, which push_back can do just fine)
>>
>> But given the use of the small-size optimization, pushing back the
>> result isn't ideal either, it's probably better to allocate the
>> element in the Stack then initialize it there so you don't have to
>> copy the small portion over:
>>
>> // Is there an easy way to make a sequence one larger and return a
>> reference to that new element? It seems such a common operation...
>> Stack.resize(Stack.size() + 1);
>> UseListOrder &O = Stack.back();
>> ...
>>
>
> Fixed in r214184.
>
>> In addition/alternatively, I'm not sure how valuable the small-size
>> optimization is if all of them are going in another container anyway.
>> Usually the small optimization is most valuable when it avoids malloc
>> allocation entirely (by using stack space) but here each element in
>> Stack will have the small footprint in use - so wasted small buffers
>> will be costing heap space...
>
> Every use list shuffle will be at least size 2 (otherwise, there's
> nothing to shuffle). If mallocs had 0B memory overhead,
> `std::vector<unsigned>` of size 2 would cost 32B. Since mallocs
> actually do have overhead (usually at least 16B), the common case (2-6
> items) will take less memory.
Right - but for every element that's bigger than the small space, you
still pay the small in the Stack, so it does become a bit of a matter
of size statistics as to where the sweet spot is.
>
>> (Chandler, again - thoughts on this tradeoff? I'm perhaps too readily
>> open to avoiding premature optimization and thus avoid small-size
>> optimizations until there's a really good reason to add them, but I
>> haven't done as much profiling/performance work to have an intuition
>> about when they're not premature and just good practice)
>>
>>
>>> }
>>>
>>> static void predictValueUseListOrder(const Value *V, const Function *F,
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
More information about the llvm-commits
mailing list