[PATCH] D53994: Fixing lower bound regression in certain situations.

Denis Yaroshevskiy via Phabricator reviews at reviews.llvm.org
Tue Dec 4 15:49:01 PST 2018

dyaroshev added a comment.

In D53994#1319307 <https://reviews.llvm.org/D53994#1319307>, @sbenza wrote:

> re resize+iota, I don't see why not. That code is not part of the measurements.
> As for why `0...n`, for integers it doesn't really what the values are. only that there are N distinct values. The initial ordering of those values is changed depending on the benchmark.
>  Note that for strings we don't use consecutive values as the value itself matters for `operator<`. That is, the first mismatched char is the one that matters and "sequential" strings would not be good for that.
> For `lower_bound` you might want a different sequence, though. You might also want to check for values missing in the list. The way we do it for std::set is to have `0,2,...,N`. We then search for `2*i` for hits and `2*i+1` for misses.

I see your point about n values but I'm not entirely sold.
There is a famous bug in quick sort implementation, for example, - not partitioning the duplicates.
BTW just ran a small benchmark, check out how much duplicates affect the performance of std::sort http://quick-bench.com/mq8UiBPBT5vmmS7FG-RmY11VriQ

I also suspect that median of 5 type of algorithm and other similar "find value to partition" techniques could  depend on the distribution - but maybe not.

Anyways, this is minor - just was confused a bit.

Why do you think for lower bound it is beneficial to look for "not found elements"? It never checks for equality (it should just call a partition_point but it gets ugly().
I can run these benchmarks but I'd suggest not to commit them if they don't show anything.

Do you have any other suggestions? Popular sizes, 32/64 bit integers? I usually just pick 1000 and smth big but I don't know, maybe there is a research.

For strings I would suggest maybe to also use URL;s of some kind.
Random string comparisons results in a first character, right? But for URLs you often hit a long common prefix.



More information about the libcxx-commits mailing list