[PATCH] D54381: [llvm-exegesis] InstructionBenchmarkClustering::dbScan(): use llvm::SetVector<> instead of ILLEGAL std::unordered_set<>

Kristina Brooks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Nov 11 08:17:54 PST 2018


kristina added a comment.

In https://reviews.llvm.org/D54381#1294614, @bobsayshilol wrote:

> I could be missing something, but I don't understand why `ToProcess` needs to be a set-like container, since we're erasing elements as we go (i.e. the erased elements won't be duplicate-checked on the next insertion). We skip any that have been previously processed in the inner loop too, which seems to be doing the same work the set would be doing.
>
> The `isNoise()` check also looks odd: if `CurrentCluster.Id` is `kNoise`, then it could push the same index into `CurrentCluster.PointIndices` an unspecified number of times, depending on the order in which elements are inserted into and removed from `ToProcess`; but if `CurrentCluster.Id` can't be `kNoise`, then that's not relevant.
>
> From the docs:
>
> > SetVector is an adapter class that defaults to using std::vector
>
> so calling erase on the first element isn't going to be terribly efficient either.


It depends on how you use the "vector". The density of the elements gives it a big edge, especially when paired with a slab allocator: if you can predict a rough upper bound on the amount of data you need to allocate, you can have an adapter class that simply emulates vector/queue semantics using a window into that array (imagine the window starting at 0; 1000 inserts later it spans 1000 elements, and removing something from the front is as simple as advancing the start of the window by 1-1000). If you're not using set semantics (and from the conversation yesterday I don't think you are), you also gain a huge advantage when copying data across such data structures.
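Roughly what I mean, as a minimal sketch (the class name, interface and sizes are made up for illustration, not taken from the patch):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: queue semantics as a "window" over a preallocated buffer.
// Instead of erasing from the front (which shifts every remaining element),
// we advance a head index; the live range is [Head, Data.size()).
template <typename T> class WindowQueue {
  std::vector<T> Data; // slab of storage, reserved up front
  size_t Head = 0;     // start of the live window

public:
  explicit WindowQueue(size_t ExpectedUpperBound) {
    Data.reserve(ExpectedUpperBound); // rough upper bound known in advance
  }
  void push(const T &V) { Data.push_back(V); }
  bool empty() const { return Head == Data.size(); }
  T pop_front() {
    assert(!empty());
    return Data[Head++]; // O(1): just move the window start forward
  }
};
```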

@lebedev.ri, `std::deque<>` gave you a huge performance increase because it's essentially a combination of vectors and a slab allocator: it grabs a fresh fixed-size slab whenever the current one fills up (IIRC), with a fancy iterator to hide crossing slab edges. Because it uses slab allocation it also provides stable references as long as you only insert and erase at the front and back of the queue (think of a tailq with an allocator/fancy iterator on top).
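For illustration, a deque-backed worklist in that spirit (a sketch only, not the actual `ToProcess` code from the patch):

```cpp
#include <cstddef>
#include <deque>

// Sketch of a deque-backed worklist: pop_front() is O(1) and, unlike a
// vector, push_back()/pop_front() never invalidate references to the
// elements that remain in the deque (only iterators are invalidated).
void drainWorklist(std::deque<size_t> &ToProcess) {
  while (!ToProcess.empty()) {
    size_t P = ToProcess.front();
    ToProcess.pop_front();
    // ... process point P, possibly pushing neighbours with push_back() ...
    (void)P;
  }
}
```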


Repository:
  rL LLVM

https://reviews.llvm.org/D54381




