[llvm] r243386 - Adding ADT SortedVector; client patch will follow.

Mehdi Amini mehdi.amini at apple.com
Tue Jul 28 13:59:14 PDT 2015


> On Jul 28, 2015, at 5:47 AM, Aaron Ballman <aaron at aaronballman.com> wrote:
> 
> On Tue, Jul 28, 2015 at 2:04 AM, Puyan Lotfi <puyan at puyan.org> wrote:
>> Author: zer0
>> Date: Tue Jul 28 01:04:00 2015
>> New Revision: 243386
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=243386&view=rev
>> Log:
>> Adding ADT SortedVector; client patch will follow.
>> 
>> Added:
>>    llvm/trunk/include/llvm/ADT/SortedVector.h
>> Modified:
>>    llvm/trunk/docs/ProgrammersManual.rst
> 
> This ADT is missing unit tests. There are a fair number of concerns
> with this code, I think that it should be reverted until they are
> resolved, the code compiles, and there are unit tests.
> 
>> 
>> Modified: llvm/trunk/docs/ProgrammersManual.rst
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/ProgrammersManual.rst?rev=243386&r1=243385&r2=243386&view=diff
>> ==============================================================================
>> --- llvm/trunk/docs/ProgrammersManual.rst (original)
>> +++ llvm/trunk/docs/ProgrammersManual.rst Tue Jul 28 01:04:00 2015
>> @@ -1314,7 +1314,8 @@ never use hash_set and unordered_set bec
>> std::multiset is useful if you're not interested in elimination of duplicates,
>> but has all the drawbacks of :ref:`std::set <dss_set>`.  A sorted vector
>> (where you don't delete duplicate entries) or some other approach is almost
>> -always better.
>> +always better. LLVM actually offers SortedVector which does the job of a sorted
>> +std::vector.
>> 
>> .. _ds_map:
>> 
>> 
>> Added: llvm/trunk/include/llvm/ADT/SortedVector.h
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/ADT/SortedVector.h?rev=243386&view=auto
>> ==============================================================================
>> --- llvm/trunk/include/llvm/ADT/SortedVector.h (added)
>> +++ llvm/trunk/include/llvm/ADT/SortedVector.h Tue Jul 28 01:04:00 2015
>> @@ -0,0 +1,133 @@
>> +//===-- llvm/ADT/SortedVector.h ---------------------------------*- C++ -*-===//
>> +//
>> +//                     The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open Source
>> +// License. See LICENSE.TXT for details.
>> +//
>> +//===----------------------------------------------------------------------===//
>> +
>> +#ifndef LLVM_ADT_SORTEDVECTOR_H
>> +#define LLVM_ADT_SORTEDVECTOR_H
>> +
>> +#include <vector>
>> +#include <cassert>
>> +#include <functional>
>> +#include "llvm/Support/raw_ostream.h"
>> +
>> +namespace llvm {
>> +
>> +/// \brief Lazily maintains a sorted and unique vector of elements of type T.
> 
> I agree with Justin that this is misleading, the ADT doesn't maintain
> the sorting or the uniqueness.
> 
>> +template<typename T, typename CMP = std::less<T>>
>> +class SortedVector {
>> +public:
>> +  typedef typename std::vector<T> VectorType;
> 
> Why is this typedef needed, especially as a public member?
> 
>> +  typedef typename VectorType::iterator iterator;
>> +  typedef typename VectorType::const_iterator const_iterator;
> 
> What about the other vector-like typedefs such as value_type,
> size_type, reference, etc?
> 
>> +
>> +private:
>> +  VectorType Vector;
>> +  bool IsSorted = true;
>> +
>> +  void doCheck() const {
>> +    assert(IsSorted && "Unsorted SortedVector access; call sortUnique prior.");
>> +  }
>> +
>> +public:
>> +  /// \brief Appends Entry to the sorted unique vector; sets the IsSorted flag
>> +  /// to false if appending Entry puts Vector into an unsorted state.
>> +  void insert(const T &Entry) {
> 
> This isn't doing an insert operation because there's no iterator
> specifying where it should be inserted. I think this could have a
> better name that doesn't have other semantic connotations, such as
> "add".
> 
>> +    if (!Vector.size())
>> +      Vector.push_back(Entry);
>> +
>> +    // Vector is sorted and Entry is a duplicate of the previous so skip.
>> +    if (IsSorted && Entry == Vector.back())
>> +      return;
> 
> I agree with Justin's comment that using == here is strange given that
> we can specialize the std::less.
> 
>> +
>> +    IsSorted &= (CMP()(Vector.back(), Entry));
>> +    Vector.push_back(Entry);
> 
> So this only uniques if the element added is the same element as the
> last element in the sorted vector, but if the vector is not sorted or
> if the duplicate entry is somewhere other than the end, it isn't
> unique? These semantics don't make sense to me, as you'll never really
> know whether you need to sort and unique or not, forcing the user to
> always call sortUnique().
> 
>> +  }
>> +
>> +  // \brief Sorts and uniques Vector.
> 
> Generally, the comments refer to Vector, but that isn't the name of
> the datatype, but an internal implementation detail of the datatype.
> 
>> +  void sortUnique() {
>> +    if (IsSorted)
>> +      return;
> 
> This doesn't seem right, as it may be sorted but not unique, and the
> early return wouldn't unique the elements.


My understanding is that the only way to end up in this state is by modifying the elements in place using the non-const begin/end right?


— 
Mehdi




> 
>> +
>> +    std::sort(Vector.begin(), Vector.end());
> 
> std::sort uses operator< but we specialize on std::less. This seems incorrect.
> 
>> +    Vector.erase(std::unique(Vector.begin(), Vector.end()), Vector.end());
> 
> Same issue here with use of == as part of std::unique.
> 
>> +    IsSorted = true;
>> +  }
> 
> It seems like an rvalue version of the above function would be very
> useful for performance reasons.
> 
>> +
>> +  /// \brief Tells if Entry is in Vector without relying on sorted-uniqueness.
> 
> Either the ADT requires sorted-uniqueness or it does not; this dual
> state is confusing.
> 
>> +  bool has(T Entry) const {
> 
> Why value semantics instead of reference semantics?
> 
>> +    if (IsSorted)
>> +      return std::binary_search(Vector.begin(), Vector.end(), Entry);
>> +
>> +    return std::find(Vector.begin(), Vector.end(), Entry) != Vector.end();
> 
> Same issues regarding operator< vs std::less as above for both
> std::binary_search() and std::find().
> 
>> +  }
>> +
>> +  /// \brief Returns a reference to the entry with the specified index.
>> +  const T &operator[](unsigned index) const {
> 
> This should be using size_type instead of unsigned.
> 
>> +    assert(index < size() && "SortedVector index is out of range!");
>> +    doCheck();
>> +    return Vector[index];
>> +  }
> 
> Why is there not a version of this that returns a non-const value? I
> originally thought it was because the elements needed to be immutable
> for uniqueness and sorting guarantees, but there are non-const begin()
> and end() functions.
> 
>> +
>> +  /// \brief Return an iterator to the start of the vector.
>> +  iterator begin() {
>> +    doCheck();
>> +    return Vector.begin();
>> +  }
>> +
>> +  /// \brief Returns const iterator to the start of the vector.
>> +  const_iterator begin() const {
>> +    doCheck();
>> +    return Vector.begin();
>> +  }
>> +
>> +  /// \brief Returns iterator to the end of Vector.
>> +  iterator end() {
>> +    doCheck();
>> +    return Vector.end();
>> +  }
>> +
>> +  /// \brief Returns const iterator to the end of Vector. Assert if unsorted.
> 
> Why is this the only comment for an iterator function that says it
> asserts if unsorted?
> 
>> +  const_iterator end() const {
>> +    doCheck();
>> +    return Vector.end();
>> +  }
>> +
>> +  /// \brief Erases Vector at position. Asserts if Vector is unsorted.
>> +  iterator erase(iterator position) {
>> +    doCheck();
>> +    return Vector.erase(position);
>> +  }
>> +
>> +  /// \brief Erases Vector entirely.
>> +  iterator erase() {
> 
> This is the wrong name for this function, it should be clear(), by
> convention. Also, why would this return an iterator?
> 
>> +    IsSorted = true;
>> +    return Vector.erase();
> 
> This code won't even compile.
> 
>> +  }
>> +
>> +  /// \brief Returns number of entries in Vector; asserts if it is unsorted.
>> +  size_t size() const {
> 
> Should be using size_type, and is an inconsistent type with operator[]
> currently.
> 
>> +    doCheck();
>> +    return Vector.size();
>> +  }
>> +
>> +  /// \brief Returns true if Vector is empty.
>> +  bool empty() const {
>> +    return Vector.empty();
>> +  }
>> +
>> +  /// \brief Clears all the entries.
>> +  void reset() {
>> +    IsSorted = true;
>> +    Vector.resize(0, 0);
> 
> resize(0, 0) is not correct, depending on T. Also, this function
> doesn't just reset the entries (that's what erase() was perhaps
> attempting to do), it resets them and resizes the storage. I'm not
> particularly keen on the name reset() for this functionality.
> 
>> +  }
>> +};
>> +
>> +} // End of namespace llvm
>> +
>> +#endif // LLVM_ADT_SORTEDVECTOR_H
>> +
> 
> ~Aaron
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits





More information about the llvm-commits mailing list