[llvm] r243386 - Adding ADT SortedVector; client patch will follow.

Aaron Ballman aaron at aaronballman.com
Tue Jul 28 05:47:28 PDT 2015


On Tue, Jul 28, 2015 at 2:04 AM, Puyan Lotfi <puyan at puyan.org> wrote:
> Author: zer0
> Date: Tue Jul 28 01:04:00 2015
> New Revision: 243386
>
> URL: http://llvm.org/viewvc/llvm-project?rev=243386&view=rev
> Log:
> Adding ADT SortedVector; client patch will follow.
>
> Added:
>     llvm/trunk/include/llvm/ADT/SortedVector.h
> Modified:
>     llvm/trunk/docs/ProgrammersManual.rst

This ADT is missing unit tests. There are a fair number of concerns
with this code, I think that it should be reverted until they are
resolved, the code compiles, and there are unit tests.

>
> Modified: llvm/trunk/docs/ProgrammersManual.rst
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/ProgrammersManual.rst?rev=243386&r1=243385&r2=243386&view=diff
> ==============================================================================
> --- llvm/trunk/docs/ProgrammersManual.rst (original)
> +++ llvm/trunk/docs/ProgrammersManual.rst Tue Jul 28 01:04:00 2015
> @@ -1314,7 +1314,8 @@ never use hash_set and unordered_set bec
>  std::multiset is useful if you're not interested in elimination of duplicates,
>  but has all the drawbacks of :ref:`std::set <dss_set>`.  A sorted vector
>  (where you don't delete duplicate entries) or some other approach is almost
> -always better.
> +always better. LLVM actually offers SortedVector which does the job of a sorted
> +std::vector.
>
>  .. _ds_map:
>
>
> Added: llvm/trunk/include/llvm/ADT/SortedVector.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/ADT/SortedVector.h?rev=243386&view=auto
> ==============================================================================
> --- llvm/trunk/include/llvm/ADT/SortedVector.h (added)
> +++ llvm/trunk/include/llvm/ADT/SortedVector.h Tue Jul 28 01:04:00 2015
> @@ -0,0 +1,133 @@
> +//===-- llvm/ADT/SortedVector.h ---------------------------------*- C++ -*-===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#ifndef LLVM_ADT_SORTEDVECTOR_H
> +#define LLVM_ADT_SORTEDVECTOR_H
> +
> +#include <vector>
> +#include <cassert>
> +#include <functional>
> +#include "llvm/Support/raw_ostream.h"
> +
> +namespace llvm {
> +
> +/// \brief Lazily maintains a sorted and unique vector of elements of type T.

I agree with Justin that this is misleading, the ADT doesn't maintain
the sorting or the uniqueness.

> +template<typename T, typename CMP = std::less<T>>
> +class SortedVector {
> +public:
> +  typedef typename std::vector<T> VectorType;

Why is this typedef needed, especially as a public member?

> +  typedef typename VectorType::iterator iterator;
> +  typedef typename VectorType::const_iterator const_iterator;

What about the other vector-like typedefs such as value_type,
size_type, reference, etc?

> +
> +private:
> +  VectorType Vector;
> +  bool IsSorted = true;
> +
> +  void doCheck() const {
> +    assert(IsSorted && "Unsorted SortedVector access; call sortUnique prior.");
> +  }
> +
> +public:
> +  /// \brief Appends Entry to the sorted unique vector; sets the IsSorted flag
> +  /// to false if appending Entry puts Vector into an unsorted state.
> +  void insert(const T &Entry) {

This isn't doing an insert operation because there's no iterator
specifying where it should be inserted. I think this could have a
better name that doesn't have other semantic connotations, such as
"add".

> +    if (!Vector.size())
> +      Vector.push_back(Entry);
> +
> +    // Vector is sorted and Entry is a duplicate of the previous so skip.
> +    if (IsSorted && Entry == Vector.back())
> +      return;

I agree with Justin's comment that using == here is strange given that
we can specialize the std::less.

> +
> +    IsSorted &= (CMP()(Vector.back(), Entry));
> +    Vector.push_back(Entry);

So this only uniques if the element added is the same element as the
last element in the sorted vector, but if the vector is not sorted or
if the duplicate entry is somewhere other than the end, it isn't
unique? These semantics don't make sense to me, as you'll never really
know whether you need to sort and unique or not, forcing the user to
always call sortUnique().

> +  }
> +
> +  // \brief Sorts and uniques Vector.

Generally, the comments refer to Vector, but that isn't the name of
the datatype, but an internal implementation detail of the datatype.

> +  void sortUnique() {
> +    if (IsSorted)
> +      return;

This doesn't seem right, as it may be sorted but not unique, and the
early return wouldn't unique the elements.

> +
> +    std::sort(Vector.begin(), Vector.end());

std::sort uses operator< but we specialize on std::less. This seems incorrect.

> +    Vector.erase(std::unique(Vector.begin(), Vector.end()), Vector.end());

Same issue here with use of == as part of std::unique.

> +    IsSorted = true;
> +  }

It seems like an rvalue version of the above function would be very
useful for performance reasons.

> +
> +  /// \brief Tells if Entry is in Vector without relying on sorted-uniqueness.

Either the ADT requires sorted-uniqueness or it does not; this dual
state is confusing.

> +  bool has(T Entry) const {

Why value semantics instead of reference semantics?

> +    if (IsSorted)
> +      return std::binary_search(Vector.begin(), Vector.end(), Entry);
> +
> +    return std::find(Vector.begin(), Vector.end(), Entry) != Vector.end();

Same issues regarding operator< vs std::less as above for both
std::binary_search() and std::find().

> +  }
> +
> +  /// \brief Returns a reference to the entry with the specified index.
> +  const T &operator[](unsigned index) const {

This should be using size_type instead of unsigned.

> +    assert(index < size() && "SortedVector index is out of range!");
> +    doCheck();
> +    return Vector[index];
> +  }

Why is there not a version of this that returns a non-const value? I
originally thought it was because the elements needed to be immutable
for uniqueness and sorting guarantees, but there are non-const begin()
and end() functions.

> +
> +  /// \brief Return an iterator to the start of the vector.
> +  iterator begin() {
> +    doCheck();
> +    return Vector.begin();
> +  }
> +
> +  /// \brief Returns const iterator to the start of the vector.
> +  const_iterator begin() const {
> +    doCheck();
> +    return Vector.begin();
> +  }
> +
> +  /// \brief Returns iterator to the end of Vector.
> +  iterator end() {
> +    doCheck();
> +    return Vector.end();
> +  }
> +
> +  /// \brief Returns const iterator to the end of Vector. Assert if unsorted.

Why is this the only comment for an iterator function that says it
asserts if unsorted?

> +  const_iterator end() const {
> +    doCheck();
> +    return Vector.end();
> +  }
> +
> +  /// \brief Erases Vector at position. Asserts if Vector is unsorted.
> +  iterator erase(iterator position) {
> +    doCheck();
> +    return Vector.erase(position);
> +  }
> +
> +  /// \brief Erases Vector entirely.
> +  iterator erase() {

This is the wrong name for this function, it should be clear(), by
convention. Also, why would this return an iterator?

> +    IsSorted = true;
> +    return Vector.erase();

This code won't even compile.

> +  }
> +
> +  /// \brief Returns number of entries in Vector; asserts if it is unsorted.
> +  size_t size() const {

Should be using size_type, and is an inconsistent type with operator[]
currently.

> +    doCheck();
> +    return Vector.size();
> +  }
> +
> +  /// \brief Returns true if Vector is empty.
> +  bool empty() const {
> +    return Vector.empty();
> +  }
> +
> +  /// \brief Clears all the entries.
> +  void reset() {
> +    IsSorted = true;
> +    Vector.resize(0, 0);

resize(0, 0) is not correct, depending on T. Also, this function
doesn't just reset the entries (that's what erase() was perhaps
attempting to do), it resets them and resizes the storage. I'm not
particularly keen on the name reset() for this functionality.

> +  }
> +};
> +
> +} // End of namespace llvm
> +
> +#endif // LLVM_ADT_SORTEDVECTOR_H
> +

~Aaron



More information about the llvm-commits mailing list