[PATCH] D53994: Fixing lower bound regression in certain situations.

Mon Dec 10 11:21:21 PST 2018

dyaroshev added a comment.

In D53994#1325493 <https://reviews.llvm.org/D53994#1325493>, @sbenza wrote:

> In D53994#1319418 <https://reviews.llvm.org/D53994#1319418>, @dyaroshev wrote:
>
> > In D53994#1319307 <https://reviews.llvm.org/D53994#1319307>, @sbenza wrote:
> >
> > > re resize+iota, I don't see why not. That code is not part of the measurements.
> >
> >
> > Why do you think for lower bound it is beneficial to look for "not found elements"?
>
>
> lower_bound, just like std::set/map will stop the search when they find the element, but will generally continue all the way if it is not there.
>  The access pattern is a little different when it is all misses.

There is no check for found element in lower_bound. It always goes to N == 0 regardless (which is a good thing since we do just one predicate invocation per iteration of the loop).

>> It never checks for equality (it should just call a partition_point but it gets ugly().
>>  I can run these benchmarks but I'd suggest not to commit them if they don't show anything.
>> 
>> Do you have any other suggestions? Popular sizes, 32/64 bit integers? I usually just pick 1000 and smth big but I don't know, maybe there is a research.
> 
> No idea about recommended sizes. I would assume lower_bound is mostly bound by L1 <https://reviews.llvm.org/L1> cache and branch prediction.
>  It should have a behavior similar to std::set in that the higher "nodes" will be hotter than the leaves.
>  sort has interesting behavior because they change algorithms depending on size. Not sure if lower_bound does this. Maybe they switch to linear search for small sizes?
> 
>> PS:
>>  For strings I would suggest maybe to also use URL;s of some kind.
>>  Random string comparisons results in a first character, right? But for URLs you often hit a long common prefix.
> 
> For strings I just kept the logic that was there before my change.
>  I agree that strings with common prefixes are a known (common) bad case for lexicographical sorting. URLs, file names, YYYYMMDD dates, etc tend to have largish common prefixes.
>  imo, the point of the string benchmarks is to compare a cheap-to-compare-and-swap type vs an expensive-to-compare-and-swap type. Extra compares/swaps would have a larger effect in the string benchmark.

Ok. I don't know about writing benchmarks enough to have an opinion to be honest.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D53994/new/

https://reviews.llvm.org/D53994