[llvm] b8d08e9 - ADT: SmallVector size/capacity use word-size integers when elements are small

Duncan P. N. Exon Smith via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 17 16:31:17 PDT 2020


Author: Andrew Browne
Date: 2020-04-17T16:11:13-07:00
New Revision: b8d08e961df1d229872c785ebdbc8367432e9752

URL: https://github.com/llvm/llvm-project/commit/b8d08e961df1d229872c785ebdbc8367432e9752
DIFF: https://github.com/llvm/llvm-project/commit/b8d08e961df1d229872c785ebdbc8367432e9752.diff

LOG: ADT: SmallVector size/capacity use word-size integers when elements are small

SmallVector currently uses 32bit integers for size and capacity to reduce
sizeof(SmallVector). This limits the number of elements to UINT32_MAX.

For a SmallVector<char>, this limits the SmallVector size to only 4GB.
Buffering bitcode output uses SmallVector<char>, but needs >4GB output.

This changes SmallVector size and capacity to conditionally use word-size
integers if the element type is small (<4 bytes). For larger elements types,
the vector size can reach ~16GB with 32bit size.

Making this conditional on the element type provides both the smaller
sizeof(SmallVector) for larger types which are unlikely to grow so large,
and supports larger capacities for smaller element types.

This change also includes a fix for the bug where a SmallVector with 32bit
size has reached UINT32_MAX elements, and cannot provide guaranteed growth.

Context:

    // Double the size of the allocated memory, guaranteeing space for at
    // least one more element or MinSize if specified.
    void grow(size_t MinSize = 0) { this->grow_pod(MinSize, sizeof(T)); }

    void push_back(const T &Elt) {
      if (LLVM_UNLIKELY(this->size() >= this->capacity()))
        this->grow();
      memcpy(reinterpret_cast<void *>(this->end()), &Elt, sizeof(T));
      this->set_size(this->size() + 1);
    }

When grow is called in push_back() without a MinSize specified, this is
relying on the guarantee of space for at least one more element.

There is an edge case bug where the SmallVector is already at its maximum size
and push_back() calls grow() with default MinSize of zero. Grow is unable to
provide space for one more element, but push_back() assumes the additional
element it will be available. This can result in silent memory corruption, as
this->end() will be an invalid pointer and the program may continue executing.

An alternative to this fix would be to remove the default argument from
grow(), which would mean several changing grow() to grow(this->size()+1)
in several places.

No test case added because it would require allocating a large ammount.

Differential Revision: https://reviews.llvm.org/D77621

Added: 
    

Modified: 
    llvm/include/llvm/ADT/SmallVector.h
    llvm/lib/Support/SmallVector.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/include/llvm/ADT/SmallVector.h b/llvm/include/llvm/ADT/SmallVector.h
index 28b514d530dc..0e5a56f085f0 100644
--- a/llvm/include/llvm/ADT/SmallVector.h
+++ b/llvm/include/llvm/ADT/SmallVector.h
@@ -16,10 +16,10 @@
 #include "llvm/ADT/iterator_range.h"
 #include "llvm/Support/AlignOf.h"
 #include "llvm/Support/Compiler.h"
+#include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/MemAlloc.h"
 #include "llvm/Support/type_traits.h"
-#include "llvm/Support/ErrorHandling.h"
 #include <algorithm>
 #include <cassert>
 #include <cstddef>
@@ -27,6 +27,7 @@
 #include <cstring>
 #include <initializer_list>
 #include <iterator>
+#include <limits>
 #include <memory>
 #include <new>
 #include <type_traits>
@@ -34,11 +35,23 @@
 
 namespace llvm {
 
-/// This is all the non-templated stuff common to all SmallVectors.
-class SmallVectorBase {
+/// This is all the stuff common to all SmallVectors.
+///
+/// The template parameter specifies the type which should be used to hold the
+/// Size and Capacity of the SmallVector, so it can be adjusted.
+/// Using 32 bit size is desirable to shink the size of the SmallVector.
+/// Using 64 bit size is desirable for cases like SmallVector<char>, where a
+/// 32 bit size would limit the vector to ~4GB. SmallVectors are used for
+/// buffering bitcode output - which can exceed 4GB.
+template <class Size_T> class SmallVectorBase {
 protected:
   void *BeginX;
-  unsigned Size = 0, Capacity;
+  Size_T Size = 0, Capacity;
+
+  /// The maximum value of the Size_T used.
+  static constexpr size_t SizeTypeMax() {
+    return std::numeric_limits<Size_T>::max();
+  }
 
   SmallVectorBase() = delete;
   SmallVectorBase(void *FirstEl, size_t TotalCapacity)
@@ -46,7 +59,39 @@ class SmallVectorBase {
 
   /// This is an implementation of the grow() method which only works
   /// on POD-like data types and is out of line to reduce code duplication.
-  void grow_pod(void *FirstEl, size_t MinCapacity, size_t TSize);
+  /// This function will report a fatal error if it cannot increase capacity.
+  void grow_pod(void *FirstEl, size_t MinCapacity, size_t TSize) {
+    // Ensure we can fit the new capacity.
+    // This is only going to be applicable if the when the capacity is 32 bit.
+    if (MinCapacity > SizeTypeMax())
+      report_bad_alloc_error("SmallVector capacity overflow during allocation");
+
+    // Ensure we can meet the guarantee of space for at least one more element.
+    // The above check alone will not catch the case where grow is called with a
+    // default MinCapacity of 0, but the current capacity cannot be increased.
+    // This is only going to be applicable if the when the capacity is 32 bit.
+    if (capacity() == SizeTypeMax())
+      report_bad_alloc_error("SmallVector capacity unable to grow");
+
+    // In theory 2*capacity can overflow if the capacity is 64 bit, but the
+    // original capacity would never be large enough for this to be a problem.
+    size_t NewCapacity = 2 * capacity() + 1; // Always grow.
+    NewCapacity = std::min(std::max(NewCapacity, MinCapacity), SizeTypeMax());
+
+    void *NewElts;
+    if (BeginX == FirstEl) {
+      NewElts = safe_malloc(NewCapacity * TSize);
+
+      // Copy the elements over.  No need to run dtors on PODs.
+      memcpy(NewElts, this->BeginX, size() * TSize);
+    } else {
+      // If this wasn't grown from the inline copy, grow the allocated space.
+      NewElts = safe_realloc(this->BeginX, NewCapacity * TSize);
+    }
+
+    this->BeginX = NewElts;
+    this->Capacity = NewCapacity;
+  }
 
 public:
   size_t size() const { return Size; }
@@ -69,9 +114,13 @@ class SmallVectorBase {
   }
 };
 
+template <class T>
+using SmallVectorSizeType =
+    typename std::conditional<sizeof(T) < 4, uintptr_t, uint32_t>::type;
+
 /// Figure out the offset of the first element.
 template <class T, typename = void> struct SmallVectorAlignmentAndSize {
-  AlignedCharArrayUnion<SmallVectorBase> Base;
+  AlignedCharArrayUnion<SmallVectorBase<SmallVectorSizeType<T>>> Base;
   AlignedCharArrayUnion<T> FirstEl;
 };
 
@@ -79,7 +128,10 @@ template <class T, typename = void> struct SmallVectorAlignmentAndSize {
 /// the type T is a POD. The extra dummy template argument is used by ArrayRef
 /// to avoid unnecessarily requiring T to be complete.
 template <typename T, typename = void>
-class SmallVectorTemplateCommon : public SmallVectorBase {
+class SmallVectorTemplateCommon
+    : public SmallVectorBase<SmallVectorSizeType<T>> {
+  using Base = SmallVectorBase<SmallVectorSizeType<T>>;
+
   /// Find the address of the first element.  For this pointer math to be valid
   /// with small-size of 0 for T with lots of alignment, it's important that
   /// SmallVectorStorage is properly-aligned even for small-size of 0.
@@ -91,21 +143,20 @@ class SmallVectorTemplateCommon : public SmallVectorBase {
   // Space after 'FirstEl' is clobbered, do not add any instance vars after it.
 
 protected:
-  SmallVectorTemplateCommon(size_t Size)
-      : SmallVectorBase(getFirstEl(), Size) {}
+  SmallVectorTemplateCommon(size_t Size) : Base(getFirstEl(), Size) {}
 
   void grow_pod(size_t MinCapacity, size_t TSize) {
-    SmallVectorBase::grow_pod(getFirstEl(), MinCapacity, TSize);
+    Base::grow_pod(getFirstEl(), MinCapacity, TSize);
   }
 
   /// Return true if this is a smallvector which has not had dynamic
   /// memory allocated for it.
-  bool isSmall() const { return BeginX == getFirstEl(); }
+  bool isSmall() const { return this->BeginX == getFirstEl(); }
 
   /// Put this vector in a state of being small.
   void resetToSmall() {
-    BeginX = getFirstEl();
-    Size = Capacity = 0; // FIXME: Setting Capacity to 0 is suspect.
+    this->BeginX = getFirstEl();
+    this->Size = this->Capacity = 0; // FIXME: Setting Capacity to 0 is suspect.
   }
 
 public:
@@ -123,6 +174,10 @@ class SmallVectorTemplateCommon : public SmallVectorBase {
   using pointer = T *;
   using const_pointer = const T *;
 
+  using Base::capacity;
+  using Base::empty;
+  using Base::size;
+
   // forward iterator creation methods.
   iterator begin() { return (iterator)this->BeginX; }
   const_iterator begin() const { return (const_iterator)this->BeginX; }
@@ -136,7 +191,9 @@ class SmallVectorTemplateCommon : public SmallVectorBase {
   const_reverse_iterator rend() const { return const_reverse_iterator(begin());}
 
   size_type size_in_bytes() const { return size() * sizeof(T); }
-  size_type max_size() const { return size_type(-1) / sizeof(T); }
+  size_type max_size() const {
+    return std::min(this->SizeTypeMax(), size_type(-1) / sizeof(T));
+  }
 
   size_t capacity_in_bytes() const { return capacity() * sizeof(T); }
 
@@ -231,12 +288,21 @@ class SmallVectorTemplateBase : public SmallVectorTemplateCommon<T> {
 // Define this out-of-line to dissuade the C++ compiler from inlining it.
 template <typename T, bool TriviallyCopyable>
 void SmallVectorTemplateBase<T, TriviallyCopyable>::grow(size_t MinSize) {
-  if (MinSize > UINT32_MAX)
+  // Ensure we can fit the new capacity.
+  // This is only going to be applicable if the when the capacity is 32 bit.
+  if (MinSize > this->SizeTypeMax())
     report_bad_alloc_error("SmallVector capacity overflow during allocation");
 
+  // Ensure we can meet the guarantee of space for at least one more element.
+  // The above check alone will not catch the case where grow is called with a
+  // default MinCapacity of 0, but the current capacity cannot be increased.
+  // This is only going to be applicable if the when the capacity is 32 bit.
+  if (this->capacity() == this->SizeTypeMax())
+    report_bad_alloc_error("SmallVector capacity unable to grow");
+
   // Always grow, even from zero.
   size_t NewCapacity = size_t(NextPowerOf2(this->capacity() + 2));
-  NewCapacity = std::min(std::max(NewCapacity, MinSize), size_t(UINT32_MAX));
+  NewCapacity = std::min(std::max(NewCapacity, MinSize), this->SizeTypeMax());
   T *NewElts = static_cast<T*>(llvm::safe_malloc(NewCapacity*sizeof(T)));
 
   // Move the elements over.

diff  --git a/llvm/lib/Support/SmallVector.cpp b/llvm/lib/Support/SmallVector.cpp
index 36f0a81f6b00..0d1765ab3e5c 100644
--- a/llvm/lib/Support/SmallVector.cpp
+++ b/llvm/lib/Support/SmallVector.cpp
@@ -36,30 +36,6 @@ static_assert(sizeof(SmallVector<Struct32B, 0>) >= alignof(Struct32B),
 static_assert(sizeof(SmallVector<void *, 1>) ==
                   sizeof(unsigned) * 2 + sizeof(void *) * 2,
               "wasted space in SmallVector size 1");
-
-/// grow_pod - This is an implementation of the grow() method which only works
-/// on POD-like datatypes and is out of line to reduce code duplication.
-void SmallVectorBase::grow_pod(void *FirstEl, size_t MinCapacity,
-                               size_t TSize) {
-  // Ensure we can fit the new capacity in 32 bits.
-  if (MinCapacity > UINT32_MAX)
-    report_bad_alloc_error("SmallVector capacity overflow during allocation");
-
-  size_t NewCapacity = 2 * capacity() + 1; // Always grow.
-  NewCapacity =
-      std::min(std::max(NewCapacity, MinCapacity), size_t(UINT32_MAX));
-
-  void *NewElts;
-  if (BeginX == FirstEl) {
-    NewElts = safe_malloc(NewCapacity * TSize);
-
-    // Copy the elements over.  No need to run dtors on PODs.
-    memcpy(NewElts, this->BeginX, size() * TSize);
-  } else {
-    // If this wasn't grown from the inline copy, grow the allocated space.
-    NewElts = safe_realloc(this->BeginX, NewCapacity * TSize);
-  }
-
-  this->BeginX = NewElts;
-  this->Capacity = NewCapacity;
-}
+static_assert(sizeof(SmallVector<char, 0>) ==
+                  sizeof(void *) * 2 + sizeof(void *),
+              "1 byte elements have word-sized type for size and capacity");


        


More information about the llvm-commits mailing list