[PATCH] D54381: [llvm-exegesis] InstructionBenchmarkClustering::dbScan(): use llvm::SetVector<> instead of ILLEGAL std::unordered_set<>

Sun Nov 11 06:51:52 PST 2018

lebedev.ri added a comment.

In https://reviews.llvm.org/D54381#1294614, @bobsayshilol wrote:

> I could be missing something, but I don't understand why `ToProcess` needs to be a set-like container since we're erasing elements as we go (ie the erased elements won't be duplicate checked on next insertion). We skip any that have been previously processed in the inner loop too, which seems like it's doing the same work the set would be doing.

Here i only improve the choice of the data structure, and don't do any algorithm-changing decisions.

I think the usage of `Set` is correct.
We will, at most, fail to deduplicate one element (the one we just removed) out of `N` returned by `rangeQuery(Q)`.
But if we don't do any deduplication at all, then we will add all the `N` elements. And that sounds exponential.
Maybe it should be a custom `SetVector`, where the `Set` retains the knowledge about removed elements, don't know yet.

> The `isNoise()` check also looks odd, since if `CurrentCluster.Id` has id `kNoise` then it could push the same index into `CurrentCluster.PointIndices` an unspecified number of times depending on the order that the elements are inserted and removed from `ToProcess`, but if `CurrentCluster.Id` can't be `kNoise` then that's not relevant.

Interesting question, but irrelevant for the patch at hand.

> From the docs:
> 
>> SetVector is an adapter class that defaults to using std::vector
> 
> so calling erase on the first element isn't going to be terribly efficient either.

Well, i'd call -60% pretty efficient.
Further patches [do/will] improve upon that.

Repository:
  rL LLVM

https://reviews.llvm.org/D54381