[llvm-bugs] [Bug 35499] New: Faster implementation of std::set_union

Fri Dec 1 14:04:34 PST 2017

https://bugs.llvm.org/show_bug.cgi?id=35499

            Bug ID: 35499
           Summary: Faster implementation of std::set_union
           Product: libc++
           Version: unspecified
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: All Bugs
          Assignee: unassignedclangbugs at nondot.org
          Reporter: denis.yaroshevskij at gmail.com
                CC: llvm-bugs at lists.llvm.org, mclow.lists at gmail.com

Hi!

I need a version of std::set_union that would prefer the left range over the
right one.

And my implementations turn out to be significantly faster that std::set_union
as it currently implement for all cases (not only the ones that I was
optimizing for).

I have to versions:

The fastest one I have is v7:
https://github.com/DenisYaroshevskiy/srt-library/blob/master/set_unions.h#L158

But I do some manual loop unrolling.
1 - I'm not 100% sure that this would be awesome for all types and comparisons.
2 - It looks super biased to the left (still the fastest all over the place
though).

Other ones, that I'm confident will be better for all types are: v4 and v5.
https://github.com/DenisYaroshevskiy/srt-library/blob/master/set_unions.h#L77

They don't do unrolling - so they might be suitable for all types, but they
loose up to 30% to the unrolled one.

Benchmark:
2000 uniformly distributed 64 bit integers.
They are distributed in two vectors.

>From left to right - less integers in the left vector, more integers in the
right vector. (On the very left: - 2000/0, middle - 1000/1000, the very right -
0/2000).

Compiler: Apple LLVM version 9.0.0 (clang-900.0.38) (I don't know what is the
real clang version this is).

Compiler options: clang++ --std=c++14 -O3 -Werror -Wall

Machine: https://support.apple.com/kb/SP719?viewlocale=en_US (The 2.2GHZ
version)

Here are the results: https://plot.ly/~dyaroshev/45/

The benchmark's code:
https://github.com/DenisYaroshevskiy/srt-library/blob/master/other_benchmarks/set_unions_bench.cc

My question is:

Do you want it? Which version? We can do some introspection for unrolling, like
POD only, or smth.
What benchmarks would you need?

We can probably do better, if we do not concentrate on the left side as much,
but I don't have the code yet. Maybe (huge maybe) I'll get to it in a few
weeks, but we can still do better in the mean time, right?

Couple of reasons why the current version is two-three times slower:
- two boundary checks on every step. (you only increase one iterator)
- I do way less jumps (especially in the unrolled version)
- Increasing of the second iterator only happens is not duplicated (not that
sure about this one, but seems like it).

Best,
Denis

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20171201/6b22e27c/attachment.html>