[llvm] r214135 - IR: Optimize size of use-list order shuffle vectors

Tue Jul 29 10:15:45 PDT 2014

> On 2014-Jul-29, at 09:23, David Blaikie <dblaikie at gmail.com> wrote:
> 
> On Mon, Jul 28, 2014 at 3:41 PM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
>> Author: dexonsmith
>> Date: Mon Jul 28 17:41:50 2014
>> New Revision: 214135
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=214135&view=rev
>> Log:
>> IR: Optimize size of use-list order shuffle vectors
>> 
>> Since we're storing lots of these, save two-pointers per vector
> 
> It saves two pointers but costs one integer, right? At the cost of
> never having a capacity that differs from the size, which means more
> reallocation overhead? And I see there is no reallocation.. so that
> explains it then.

Yup.  These are calculated once and then processed later.

Also, it saves 3 pointers but costs one integer (2 pointers overall).

  - sizeof(ShuffleVector) == 32
  - sizeof(SmallVector<unsigned, 6>) == 48

> +Chandler: This is where I want std::dynamic_array.
> 
>> with a
>> custom type rather than using the relatively heavy `SmallVector`.
>> 
>> Part of PR5680.
>> 
>> Modified:
>>    llvm/trunk/include/llvm/IR/UseListOrder.h
>>    llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.cpp
>> 
>> Modified: llvm/trunk/include/llvm/IR/UseListOrder.h
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/IR/UseListOrder.h?rev=214135&r1=214134&r2=214135&view=diff
>> ==============================================================================
>> --- llvm/trunk/include/llvm/IR/UseListOrder.h (original)
>> +++ llvm/trunk/include/llvm/IR/UseListOrder.h Mon Jul 28 17:41:50 2014
>> @@ -25,11 +25,58 @@ class Module;
>> class Function;
>> class Value;
>> 
>> +/// \brief Structure to hold a use-list shuffle vector.
>> +///
>> +/// Stores most use-lists locally, but large use-lists use an extra heap entry.
>> +/// Costs two fewer pointers than the equivalent \a SmallVector.
>> +class UseListShuffleVector {
>> +  unsigned Size;
>> +  union {
>> +    unsigned *Ptr;
>> +    unsigned Array[6];
>> +  } Storage;
>> +
>> +  bool isSmall() const { return Size <= 6; }
>> +  unsigned *data() { return isSmall() ? Storage.Array : Storage.Ptr; }
>> +  const unsigned *data() const {
>> +    return isSmall() ? Storage.Array : Storage.Ptr;
>> +  }
>> +
>> +public:
>> +  UseListShuffleVector() : Size(0) {}
>> +  UseListShuffleVector(UseListShuffleVector &&X) {
>> +    std::memcpy(this, &X, sizeof(UseListShuffleVector));
>> +    X.Size = 0;
>> +  }
>> +  explicit UseListShuffleVector(size_t Size) : Size(Size) {
>> +    if (!isSmall())
>> +      Storage.Ptr = new unsigned[Size];
>> +  }
>> +  ~UseListShuffleVector() {
>> +    if (!isSmall())
>> +      delete Storage.Ptr;
>> +  }
>> +
>> +  typedef unsigned *iterator;
>> +  typedef const unsigned *const_iterator;
>> +
>> +  size_t size() const { return Size; }
>> +  iterator begin() { return data(); }
>> +  iterator end() { return begin() + size(); }
>> +  const_iterator begin() const { return data(); }
>> +  const_iterator end() const { return begin() + size(); }
>> +  unsigned &operator[](size_t I) { return data()[I]; }
>> +  unsigned operator[](size_t I) const { return data()[I]; }
>> +};
>> +
>> /// \brief Structure to hold a use-list order.
>> struct UseListOrder {
>> -  const Function *F;
>>   const Value *V;
>> -  SmallVector<unsigned, 8> Shuffle;
>> +  const Function *F;
>> +  UseListShuffleVector Shuffle;
>> +
>> +  UseListOrder(const Value *V, const Function *F, size_t ShuffleSize)
>> +      : V(V), F(F), Shuffle(ShuffleSize) {}
>> };
>> 
>> typedef std::vector<UseListOrder> UseListOrderStack;
>> 
>> Modified: llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.cpp?rev=214135&r1=214134&r2=214135&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.cpp (original)
>> +++ llvm/trunk/lib/Bitcode/Writer/ValueEnumerator.cpp Mon Jul 28 17:41:50 2014
>> @@ -133,12 +133,11 @@ static void predictValueUseListOrderImpl
>>     return;
>> 
>>   // Store the shuffle.
>> -  UseListOrder O;
>> -  O.V = V;
>> -  O.F = F;
>> -  for (auto &I : List)
>> -    O.Shuffle.push_back(I.second);
>> -  Stack.push_back(O);
>> +  UseListOrder O(V, F, List.size());
>> +  assert(List.size() == O.Shuffle.size() && "Wrong size");
>> +  for (size_t I = 0, E = List.size(); I != E; ++I)
>> +    O.Shuffle[I] = List[I].second;
>> +  Stack.emplace_back(std::move(O));
> 
> Generally I'd suggest skipping emplace_back where push_back will have
> the same performance characteristics (since you don't need to call
> anything other than the move ctor, which push_back can do just fine)
> 
> But given the use of the small-size optimization, pushing back the
> result isn't ideal either, it's probably better to allocate the
> element in the Stack then initialize it there so you don't have to
> copy the small portion over:
> 
>  // Is there an easy way to make a sequence one larger and return a
> reference to that new element? It seems such a common operation...
>  Stack.resize(Stack.size() + 1);
>  UseListOrder &O = Stack.back();
>  ...
> 

Fixed in r214184.

> In addition/alternatively, I'm not sure how valuable the small-size
> optimization is if all of them are going in another container anyway.
> Usually the small optimization is most valuable when it avoids malloc
> allocation entirely (by using stack space) but here each element in
> Stack will have the small footprint in use - so wasted small buffers
> will be costing heap space...

Every use list shuffle will be at least size 2 (otherwise, there's
nothing to shuffle).  If mallocs had 0B memory overhead,
`std::vector<unsigned>` of size 2 would cost 32B.  Since mallocs
actually do have overhead (usually at least 16B), the common case (2-6
items) will take less memory.

> (Chandler, again - thoughts on this tradeoff? I'm perhaps too readily
> open to avoiding premature optimization and thus avoid small-size
> optimizations until there's a really good reason to add them, but I
> haven't done as much profiling/performance work to have an intuition
> about when they're not premature and just good practice)
> 
> 
>> }
>> 
>> static void predictValueUseListOrder(const Value *V, const Function *F,
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits