[PATCH] D52697: Bug 39129: Speeding up partition_point/lower_bound/upper_bound/ by using unsigned division by 2 when possible.

Mon Oct 1 09:09:12 PDT 2018

dyaroshev added a comment.

In https://reviews.llvm.org/D52697#1251063, @mclow.lists wrote:

> I don't think there's a hurry to merge this.
>  The performance wins are nice - on a micro scale.
>  Do you have an example in a real app where this makes a difference?
>
> Note: I'm not disparaging the work you've done.
>  Nor do I think this isn't worth pursuing.
>  I'm just looking at other approaches.

At my current job I don't run libc++ in production, so I can't give you bigger measurements.

This is the examples I know:
Chromium actively using flat containers which rely on standard binary search: https://cs.chromium.org/chromium/src/base/containers/flat_tree.h?q=flat_tr&sq=package:chromium&g=0&l=904
FB does it too: https://github.com/facebook/folly/blob/master/folly/sorted_vector_types.h#L448
There are some implementations of hash maps that rely on std::lower_bound to find the correct bucket (I just talked to people who do that, I don't have a github link)
There was a few talks about flat_hash_map - they use the lower_bound too: https://github.com/skarupke/flat_hash_map/blob/master/flat_hash_map.hpp#L1218

I, too some extend, don't know what your concern is. It's 30 lines of code that makes code much faster at least on some benchmarks and smaller (I looked at the assembly).
These are highly used and important functions and if not to push for every bit of performance in the standard library, where to?

On the Note:
No worries, I know that you mean well. You have to maintain this staff and libc++ is your responsibility not mine.
I am completely OK with you making the decisions and appreciate you taking the time to review this stuff.

https://reviews.llvm.org/D52697