[llvm] r243386 - Adding ADT SortedVector; client patch will follow.

Puyan Lotfi plotfi at apple.com
Wed Jul 29 12:16:59 PDT 2015


I have a follow up patch I have ready, but still working on unit tests. 
I’ll follow up soon.

PL

> On Jul 29, 2015, at 9:02 AM, Aaron Ballman <aaron at aaronballman.com> wrote:
> 
> On Tue, Jul 28, 2015 at 8:47 AM, Aaron Ballman <aaron at aaronballman.com> wrote:
>> On Tue, Jul 28, 2015 at 2:04 AM, Puyan Lotfi <puyan at puyan.org> wrote:
>>> Author: zer0
>>> Date: Tue Jul 28 01:04:00 2015
>>> New Revision: 243386
>>> 
>>> URL: http://llvm.org/viewvc/llvm-project?rev=243386&view=rev
>>> Log:
>>> Adding ADT SortedVector; client patch will follow.
>>> 
>>> Added:
>>>    llvm/trunk/include/llvm/ADT/SortedVector.h
>>> Modified:
>>>    llvm/trunk/docs/ProgrammersManual.rst
>> 
>> This ADT is missing unit tests. There are a fair number of concerns
>> with this code, I think that it should be reverted until they are
>> resolved, the code compiles, and there are unit tests.
> 
> After discussion in IRC, I have reverted this (and follow-up commit
> r243389) in r243527, pending addressing post-commit review comments.
> Sorry for any troubles!
> 
> ~Aaron
> 
> 
>> 
>>> 
>>> Modified: llvm/trunk/docs/ProgrammersManual.rst
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/ProgrammersManual.rst?rev=243386&r1=243385&r2=243386&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/docs/ProgrammersManual.rst (original)
>>> +++ llvm/trunk/docs/ProgrammersManual.rst Tue Jul 28 01:04:00 2015
>>> @@ -1314,7 +1314,8 @@ never use hash_set and unordered_set bec
>>> std::multiset is useful if you're not interested in elimination of duplicates,
>>> but has all the drawbacks of :ref:`std::set <dss_set>`.  A sorted vector
>>> (where you don't delete duplicate entries) or some other approach is almost
>>> -always better.
>>> +always better. LLVM actually offers SortedVector which does the job of a sorted
>>> +std::vector.
>>> 
>>> .. _ds_map:
>>> 
>>> 
>>> Added: llvm/trunk/include/llvm/ADT/SortedVector.h
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/ADT/SortedVector.h?rev=243386&view=auto
>>> ==============================================================================
>>> --- llvm/trunk/include/llvm/ADT/SortedVector.h (added)
>>> +++ llvm/trunk/include/llvm/ADT/SortedVector.h Tue Jul 28 01:04:00 2015
>>> @@ -0,0 +1,133 @@
>>> +//===-- llvm/ADT/SortedVector.h ---------------------------------*- C++ -*-===//
>>> +//
>>> +//                     The LLVM Compiler Infrastructure
>>> +//
>>> +// This file is distributed under the University of Illinois Open Source
>>> +// License. See LICENSE.TXT for details.
>>> +//
>>> +//===----------------------------------------------------------------------===//
>>> +
>>> +#ifndef LLVM_ADT_SORTEDVECTOR_H
>>> +#define LLVM_ADT_SORTEDVECTOR_H
>>> +
>>> +#include <vector>
>>> +#include <cassert>
>>> +#include <functional>
>>> +#include "llvm/Support/raw_ostream.h"
>>> +
>>> +namespace llvm {
>>> +
>>> +/// \brief Lazily maintains a sorted and unique vector of elements of type T.
>> 
>> I agree with Justin that this is misleading, the ADT doesn't maintain
>> the sorting or the uniqueness.
>> 
>>> +template<typename T, typename CMP = std::less<T>>
>>> +class SortedVector {
>>> +public:
>>> +  typedef typename std::vector<T> VectorType;
>> 
>> Why is this typedef needed, especially as a public member?
>> 
>>> +  typedef typename VectorType::iterator iterator;
>>> +  typedef typename VectorType::const_iterator const_iterator;
>> 
>> What about the other vector-like typedefs such as value_type,
>> size_type, reference, etc?
>> 
>>> +
>>> +private:
>>> +  VectorType Vector;
>>> +  bool IsSorted = true;
>>> +
>>> +  void doCheck() const {
>>> +    assert(IsSorted && "Unsorted SortedVector access; call sortUnique prior.");
>>> +  }
>>> +
>>> +public:
>>> +  /// \brief Appends Entry to the sorted unique vector; sets the IsSorted flag
>>> +  /// to false if appending Entry puts Vector into an unsorted state.
>>> +  void insert(const T &Entry) {
>> 
>> This isn't doing an insert operation because there's no iterator
>> specifying where it should be inserted. I think this could have a
>> better name that doesn't have other semantic connotations, such as
>> "add".
>> 
>>> +    if (!Vector.size())
>>> +      Vector.push_back(Entry);
>>> +
>>> +    // Vector is sorted and Entry is a duplicate of the previous so skip.
>>> +    if (IsSorted && Entry == Vector.back())
>>> +      return;
>> 
>> I agree with Justin's comment that using == here is strange given that
>> we can specialize the std::less.
>> 
>>> +
>>> +    IsSorted &= (CMP()(Vector.back(), Entry));
>>> +    Vector.push_back(Entry);
>> 
>> So this only uniques if the element added is the same element as the
>> last element in the sorted vector, but if the vector is not sorted or
>> if the duplicate entry is somewhere other than the end, it isn't
>> unique? These semantics don't make sense to me, as you'll never really
>> know whether you need to sort and unique or not, forcing the user to
>> always call sortUnique().
>> 
>>> +  }
>>> +
>>> +  // \brief Sorts and uniques Vector.
>> 
>> Generally, the comments refer to Vector, but that isn't the name of
>> the datatype, but an internal implementation detail of the datatype.
>> 
>>> +  void sortUnique() {
>>> +    if (IsSorted)
>>> +      return;
>> 
>> This doesn't seem right, as it may be sorted but not unique, and the
>> early return wouldn't unique the elements.
>> 
>>> +
>>> +    std::sort(Vector.begin(), Vector.end());
>> 
>> std::sort uses operator< but we specialize on std::less. This seems incorrect.
>> 
>>> +    Vector.erase(std::unique(Vector.begin(), Vector.end()), Vector.end());
>> 
>> Same issue here with use of == as part of std::unique.
>> 
>>> +    IsSorted = true;
>>> +  }
>> 
>> It seems like an rvalue version of the above function would be very
>> useful for performance reasons.
>> 
>>> +
>>> +  /// \brief Tells if Entry is in Vector without relying on sorted-uniqueness.
>> 
>> Either the ADT requires sorted-uniqueness or it does not; this dual
>> state is confusing.
>> 
>>> +  bool has(T Entry) const {
>> 
>> Why value semantics instead of reference semantics?
>> 
>>> +    if (IsSorted)
>>> +      return std::binary_search(Vector.begin(), Vector.end(), Entry);
>>> +
>>> +    return std::find(Vector.begin(), Vector.end(), Entry) != Vector.end();
>> 
>> Same issues regarding operator< vs std::less as above for both
>> std::binary_search() and std::find().
>> 
>>> +  }
>>> +
>>> +  /// \brief Returns a reference to the entry with the specified index.
>>> +  const T &operator[](unsigned index) const {
>> 
>> This should be using size_type instead of unsigned.
>> 
>>> +    assert(index < size() && "SortedVector index is out of range!");
>>> +    doCheck();
>>> +    return Vector[index];
>>> +  }
>> 
>> Why is there not a version of this that returns a non-const value? I
>> originally thought it was because the elements needed to be immutable
>> for uniqueness and sorting guarantees, but there are non-const begin()
>> and end() functions.
>> 
>>> +
>>> +  /// \brief Return an iterator to the start of the vector.
>>> +  iterator begin() {
>>> +    doCheck();
>>> +    return Vector.begin();
>>> +  }
>>> +
>>> +  /// \brief Returns const iterator to the start of the vector.
>>> +  const_iterator begin() const {
>>> +    doCheck();
>>> +    return Vector.begin();
>>> +  }
>>> +
>>> +  /// \brief Returns iterator to the end of Vector.
>>> +  iterator end() {
>>> +    doCheck();
>>> +    return Vector.end();
>>> +  }
>>> +
>>> +  /// \brief Returns const iterator to the end of Vector. Assert if unsorted.
>> 
>> Why is this the only comment for an iterator function that says it
>> asserts if unsorted?
>> 
>>> +  const_iterator end() const {
>>> +    doCheck();
>>> +    return Vector.end();
>>> +  }
>>> +
>>> +  /// \brief Erases Vector at position. Asserts if Vector is unsorted.
>>> +  iterator erase(iterator position) {
>>> +    doCheck();
>>> +    return Vector.erase(position);
>>> +  }
>>> +
>>> +  /// \brief Erases Vector entirely.
>>> +  iterator erase() {
>> 
>> This is the wrong name for this function, it should be clear(), by
>> convention. Also, why would this return an iterator?
>> 
>>> +    IsSorted = true;
>>> +    return Vector.erase();
>> 
>> This code won't even compile.
>> 
>>> +  }
>>> +
>>> +  /// \brief Returns number of entries in Vector; asserts if it is unsorted.
>>> +  size_t size() const {
>> 
>> Should be using size_type, and is an inconsistent type with operator[]
>> currently.
>> 
>>> +    doCheck();
>>> +    return Vector.size();
>>> +  }
>>> +
>>> +  /// \brief Returns true if Vector is empty.
>>> +  bool empty() const {
>>> +    return Vector.empty();
>>> +  }
>>> +
>>> +  /// \brief Clears all the entries.
>>> +  void reset() {
>>> +    IsSorted = true;
>>> +    Vector.resize(0, 0);
>> 
>> resize(0, 0) is not correct, depending on T. Also, this function
>> doesn't just reset the entries (that's what erase() was perhaps
>> attempting to do), it resets them and resizes the storage. I'm not
>> particularly keen on the name reset() for this functionality.
>> 
>>> +  }
>>> +};
>>> +
>>> +} // End of namespace llvm
>>> +
>>> +#endif // LLVM_ADT_SORTEDVECTOR_H
>>> +
>> 
>> ~Aaron
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits





More information about the llvm-commits mailing list