[RFC] SortedVector ADT

Puyan Lotfi plotfi at gmail.com
Fri Jul 31 09:26:27 PDT 2015


On Friday, July 31, 2015, Aaron Ballman <aaron at aaronballman.com> wrote:

> On Wed, Jul 29, 2015 at 8:40 PM, Puyan Lotfi <plotfi at apple.com
> <javascript:;>> wrote:
> > Justin, Aaron:
> >
> > These are the revised patches.
> > I have tried to address all of your feedback with the exception of the
> naming of the ADT.
> > I was thinking, if I did rename it I could go with SortedVectorSet or
> SortedSet.
> >
> > Anyways, I have added:
> >
> > - Better documentation changes that try to be more clear on how and when
> to use SortedVector.
> > - Better comments.
> > - Removed erase().
> > - Replaced reset with clear.
> > - Renamed insert to push_back
> > - Added unit testing.
> >
> > Hope this is enough to put my patch back in; please let me know if I
> missed anything.
>
> Comments below. Btw, I think this discussion should be on its own
> thread instead of attached to a commit revert message. I suspect there
> are others in the community who have opinions on this ADT, but are
> unaware of the discussion due to context. I've taken the liberty of
> starting a new email thread for this, and included a few usual
> suspects who review ADTs.
>
> For anyone joining in the middle, this is a discussion of an ADT Puyan
> proposed in:
>
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150720/288930.html
>
> That was commit (and eventually reverted) in:
>
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150727/289945.html
>
> > diff --git a/include/llvm/ADT/SortedVector.h
> b/include/llvm/ADT/SortedVector.h
> > new file mode 100644
> > index 0000000..ddafd6b
> > --- /dev/null
> > +++ b/include/llvm/ADT/SortedVector.h
> > @@ -0,0 +1,137 @@
> > +//===-- llvm/ADT/SortedVector.h ---------------------------------*- C++
> -*-===//
> > +//
> > +//                     The LLVM Compiler Infrastructure
> > +//
> > +// This file is distributed under the University of Illinois Open Source
> > +// License. See LICENSE.TXT for details.
> > +//
> >
> +//===----------------------------------------------------------------------===//
> > +
> > +#ifndef LLVM_ADT_SORTEDVECTOR_H
> > +#define LLVM_ADT_SORTEDVECTOR_H
> > +
> > +#include <vector>
> > +#include <cassert>
> > +#include <functional>
> > +
> > +namespace llvm {
> > +
> > +/// \brief A vector with minimal append overhead that enforces a sorted
> > +/// ordering of unique elements prior to access. The most common use
> case
> > +/// is to push_back multiple Entries and call sortUnique afterward.
>
> I am wondering why this should be an ADT instead of one-off
> functionality. I see that you're going to use it for
> MachineBasicBlock, but are there other use cases as well? The kind of
> functionality that this class provides is covered by basic STL
> mechanisms, and so I am hesitant to support it as an ADT for all of
> our users unless this is something we expect will be heavily used (or
> is heavily customized to support toolchain needs).
>
> Also, the comments do not specify what ordering it is sorted by (we
> can presume from the std::less, but it would be good to be explicit
> since this is user-facing documentation).
>
> > +template<typename T, typename CMP = std::less<T>>
> > +class SortedVector {
> > +private:
> > +  typedef T value_type;
> > +  typedef typename std::vector<value_type> vector_type;
> > +  typedef typename vector_type::const_iterator const_iterator;
> > +  typedef typename vector_type::iterator iterator;
> > +  typedef typename vector_type::size_type size_type;
>
> These typedefs should be public, aside from vector_type.
>
> Missing typedefs for difference_type, reference and const_reference.
>
> > +
> > +  vector_type Vector;
> > +  bool IsSorted = true;
> > +
> > +  void AssertIsSorted() const {
> > +    assert(IsSorted && "Unsorted SortedVector access; call sortUnique
> prior.");
> > +  }
> > +
> > +public:
> > +  /// \brief Appends Entry to the end of Vector; avoids appending
> duplicate
> > +  /// entries. If appending Entry causes Vector to be unsorted, then
> sortUnique
> > +  /// must be called prior to access.
>
> The public comments are still discussing private implementation
> details like Vector.
>
> How is the user of this function supposed to know when appending the
> entry causes the vector to be unsorted?
>
> > +  void push_back(const value_type &Entry) {
>
> Should be using the typedef for const_reference.
>
> > +    if (!Vector.size())
>
> Vector.empty() is more clear.
>
> > +      Vector.push_back(Entry);
>
> After pushing back in an empty vector, why are you relying on the
> std::equal_to to perform the return?
>
> > +
> > +    // If Entry is a duplicate of the previous Entry we skip.
> > +    if (std::equal_to<value_type>()(Vector.back(), Entry))
> > +      return;
>
> Why is this forcing use of std::equal_to, but we can specialize on
> something other than std::less? If someone is implementing a custom
> std::less implementation, it's very likely they would want a custom
> std::equal_to used as well.
>
> > +
> > +    IsSorted &= (CMP()(Vector.back(), Entry));
> > +    Vector.push_back(Entry);
> > +  }
> > +
> > +  // \brief Sorts and uniques Vector.
> > +  void sortUnique() {
> > +    if (IsSorted)
> > +      return;
>
> This still isn't correct. Consider code like:
>
> SortedVector<int> V;
> V.push_back(1);
> V.push_back(2);
> V.push_back(3);
> *V.begin() = 3;
>
> Not only is the vector no longer sorted (but it will claim that it is
> because IsSorted will be true), it's also not unique. We then fail to
> sort or unique the backing store.
>
> In order for this ADT to be viable, the underlying vector elements
> must be const so that you can never mutate their values through the
> ADT interface.
>
> > +
> > +    std::sort(Vector.begin(), Vector.end());
> > +    Vector.erase(std::unique(Vector.begin(), Vector.end()),
> Vector.end());
>
> Neither of these STL functions are using the custom comparator.
> Instead they use operator< on the object type, which is inconsistent.
> Should pass in CMP() to std::sort(), and should pass in the template
> parameter for std::equal_to for std::unique().
>
> > +    IsSorted = true;
> > +  }
> > +
> > +  /// \brief Tells if Entry is in Vector without relying on
> sorted-uniqueness.
> > +  bool has(value_type Entry) const {
>
> This should be using const_reference instead of forcing a by-value copy.
>
> > +    if (IsSorted)
> > +      return std::binary_search(Vector.begin(), Vector.end(), Entry);
>
> This function is using operator< to perform the comparisons instead of
> using CMP().
>
> > +
> > +    return std::find(Vector.begin(), Vector.end(), Entry) !=
> Vector.end();
>
> This function is using operator== to perform the comparisons instead
> of using a template parameter for std::equal_to.
>
> > +  }
>
> I find it really strange that so many member functions call
> AssertIsSorted(), but the only truly interesting part of the interface
> does not. Why is it valid for you to call has() without using a sorted
> vector? This means the complexity of calling this function is either
> O(N) or O(log N), and you have no way of knowing which you'll get
> unless you call sortUnique() first.
>
> > +
> > +  /// \brief Returns a reference to the Entry with the specified index.
> > +  const value_type &operator[](unsigned index) const {
>
> This should be returning a const_reference and taking a size_type.
>
> > +    assert(index < size() && "SortedVector index is out of range!");
> > +    AssertIsSorted();
> > +    return Vector[index];
> > +  }
> > +
> > +  /// \brief returns an iterator to the start of vector. Asserts if
> vector is
> > +  /// not sorted and uniqued.
> > +  iterator begin() {
> > +    AssertIsSorted();
> > +    return Vector.begin();
> > +  }
> > +
> > +  /// \brief Returns a const_iterator to the start of Vector. Asserts if
> > +  /// Vector is not sorted and uniqued.
> > +  const_iterator begin() const {
> > +    AssertIsSorted();
> > +    return Vector.begin();
> > +  }
> > +
> > +  /// \brief returns an iterator to the end of vector. Asserts if
> vector is
> > +  /// not sorted and uniqued.
> > +  iterator end() {
> > +    AssertIsSorted();
> > +    return Vector.end();
> > +  }
> > +
> > +  /// \brief Returns a const_iterator to the end of Vector. Asserts if
> > +  /// Vector is not sorted and uniqued.
> > +  const_iterator end() const {
> > +    AssertIsSorted();
> > +    return Vector.end();
> > +  }
> > +
> > +  /// \brief Erases Vector at position. Asserts if Vector not sorted and
> > +  /// uniqued.
> > +  iterator erase(iterator position) {
> > +    AssertIsSorted();
>
> A vector iterator is a direct index into the vector; why is this
> assertion required for erasing an arbitrary element?
>
> > +    return Vector.erase(position);
> > +  }
> > +
> > +  /// \brief Returns the total number of entries in Vector. Asserts if
> Vector
> > +  /// not sorted and uniqued.
> > +  size_type size() const {
> > +    AssertIsSorted();
>
> Why is this assertion required for testing the size?
>
> > +    return Vector.size();
> > +  }
> > +
> > +  /// \brief Returns true if Vector is empty.
> > +  bool empty() const {
> > +    return Vector.empty();
> > +  }
> > +
> > +  /// \brief Clears all Entries in Vector.
> > +  void clear() {
> > +    IsSorted = true;
> > +    Vector.clear();
> > +  }
> > +};
>
> There is a lot of functionality missing from this interface that are
> concerns. For instance, there's no rvalue references, no constructors,
> pop_back, rbegin()/rend(), etc. This seems like a very limited
> interface in some ways, while way too permissive of an interface in
> other ways.
>
> Even if you expanded the interface to cover more of the useful cases
> and fix the issues from the review, I am not certain this meets the
> bar to be an ADT. This, ultimately, is the stock STL vector that needs
> to remain ordered and unique. Why is an existing ordered, unique list
> (like std::set) insufficient?
>
>
I forgot to mention that this ADT also avoids having to sort the vector in
many of the common cases where entries are appended to Machine
BasicBlock::Lives by maintaining whether the vector is sorted at every
push_back. The vector is normally already sorted but not always. It keeps
compile time impact lower by doing this.


> ~Aaron
>
> > +
> > +} // End of namespace llvm
> > +
> > +#endif // LLVM_ADT_SORTEDVECTOR_H
> > +
> >
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu <javascript:;>
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150731/97edc856/attachment.html>


More information about the llvm-commits mailing list