[libcxx-commits] [PATCH] D63063: Bug 42208: speeding up std::merge
Denis Yaroshevskiy via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Sun Jun 9 13:01:34 PDT 2019
dyaroshev created this revision.
Herald added subscribers: libcxx-commits, ldionne, dmgreen, mgrang.
Benchmarking results:
Before:
----------------------------------------------------------------------------------------
Benchmark                                              Time            CPU  Iterations
----------------------------------------------------------------------------------------
MergeBench_MergeAlg_TestInt32/512                   1848 ns        1846 ns      378311
MergeBench_MergeAlg_TestInt32/2048                 27005 ns       26989 ns       25576
MergeBench_MergeAlg_TestInt32/2097152           33327173 ns    33314619 ns          21
MergeBench_MergeAlg_TestInt64/512                   1702 ns        1701 ns      407486
MergeBench_MergeAlg_TestInt64/2048                 26351 ns       26340 ns       26949
MergeBench_MergeAlg_TestInt64/2097152           34492582 ns    34484900 ns          20
MergeBench_MergeAlg_TestUint32/512                  1755 ns        1755 ns      379830
MergeBench_MergeAlg_TestUint32/2048                25971 ns       25963 ns       26655
MergeBench_MergeAlg_TestUint32/2097152          37003864 ns    35490619 ns          21
MergeBench_MergeAlg_TestMediumString/512          234988 ns      234489 ns        2641
MergeBench_MergeAlg_TestMediumString/2048        1120958 ns     1062598 ns         615
MergeBench_MergeAlg_TestMediumString/2097152  2595634493 ns  2590478000 ns           1
After:
----------------------------------------------------------------------------------------
Benchmark                                              Time            CPU  Iterations
----------------------------------------------------------------------------------------
MergeBench_MergeAlg_TestInt32/512                   1396 ns        1395 ns      485652  (25% speedup)
MergeBench_MergeAlg_TestInt32/2048                 15691 ns       15682 ns       43672  (42% speedup)
MergeBench_MergeAlg_TestInt32/2097152           30340879 ns    30329130 ns          23  (9% speedup)
MergeBench_MergeAlg_TestInt64/512                   1567 ns        1566 ns      440998  (9% speedup)
MergeBench_MergeAlg_TestInt64/2048                 25090 ns       25076 ns       27286  (5% speedup)
MergeBench_MergeAlg_TestInt64/2097152           32398209 ns    32394000 ns          22  (7% speedup)
MergeBench_MergeAlg_TestUint32/512                  1366 ns        1366 ns      507957  (23% speedup)
MergeBench_MergeAlg_TestUint32/2048                15713 ns       15706 ns       43127  (40% speedup)
MergeBench_MergeAlg_TestUint32/2097152          30373730 ns    30366261 ns          23  (18% speedup)
MergeBench_MergeAlg_TestMediumString/512          213092 ns      212974 ns        3253  (10% speedup)
MergeBench_MergeAlg_TestMediumString/2048         879484 ns      879021 ns         752  (22% speedup)
MergeBench_MergeAlg_TestMediumString/2097152  2156054708 ns  2155483000 ns           1  (17% speedup)
There are two issues with the current implementation of std::merge:
https://github.com/llvm-mirror/libcxx/blob/1f60111b597e5cb80a4513ec86f79b7e137f7793/include/algorithm#L4353
1. The algorithm does two boundary checks on every iteration, even though only one of the iterators moves.
2. If one of the boundary checks (the one for the left range) is unrolled, we get a better loop structure both when range 1 is the bigger one and when range 2 is.
In some measurements I did, the speedup when range 1 dominates reaches up to 1.7x, and when range 2 dominates it is about 1.4x.
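As a rough illustration of point 1 only, a merge loop can be arranged so that the only iterator compared against its end is the one that just advanced. This is a sketch, not the code from the patch: `merge_one_check` is a hypothetical name, the code assumes C++14 (for `std::less<>`), and it does not attempt the unrolling described in point 2.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical helper for illustration; not the libc++ implementation.
// Merges two sorted ranges while doing one boundary check per iteration.
template <class In1, class In2, class Out, class Compare = std::less<>>
Out merge_one_check(In1 first1, In1 last1,
                    In2 first2, In2 last2,
                    Out out, Compare comp = Compare{}) {
    // Handle empty inputs up front so the loop can assume both ranges
    // are non-empty on entry.
    if (first1 == last1) return std::copy(first2, last2, out);
    if (first2 == last2) return std::copy(first1, last1, out);

    for (;;) {
        if (comp(*first2, *first1)) {
            // Take from range 2 only when strictly smaller (keeps stability).
            *out = *first2;
            ++out;
            ++first2;
            // Only range 2 moved, so only its boundary needs checking.
            if (first2 == last2) return std::copy(first1, last1, out);
        } else {
            *out = *first1;
            ++out;
            ++first1;
            // Only range 1 moved, so only its boundary needs checking.
            if (first1 == last1) return std::copy(first2, last2, out);
        }
    }
}
```

Called as `merge_one_check(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out))`, it should produce the same sequence as std::merge on the same sorted inputs. Whether it is actually faster depends on the compiler and on code alignment, so it needs to be measured.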
If you want to play with the algorithms or parameters, you can do that on quick-bench.
Watch out for code alignment issues! Unfortunately, including all three benchmarks
in the same binary produces incorrect results.
Link: http://quick-bench.com/kWbYdPDFnrovXWnuF6xw5wK27B8
Binary size increase (godbolt: https://godbolt.org/z/b1ZFTA):
For std::string the size grows from 394 assembly instructions to 465 instructions (18%).
For int - from 62 to 64 (3%).
Considering that other places in libcxx already specialize algorithms for specific sizes, this seems acceptable.
For example:
sort: https://github.com/llvm-mirror/libcxx/blob/1f60111b597e5cb80a4513ec86f79b7e137f7793/include/algorithm#L3703
rotate: https://github.com/llvm-mirror/libcxx/blob/1f60111b597e5cb80a4513ec86f79b7e137f7793/include/algorithm#L2388
(if I understand the rotate algorithm correctly)
Potential followups:
std::stable_sort and std::inplace_merge rely on merging, but currently reimplement it from scratch.
Improving that could be useful.
std::set_union and std::set_difference have problems similar to std::merge and could be improved in a similar manner.
Repository:
rCXX libc++
https://reviews.llvm.org/D63063
Files:
benchmarks/algorithms.merge.bench.cpp
include/algorithm
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D63063.203750.patch
Type: text/x-patch
Size: 5215 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/libcxx-commits/attachments/20190609/f5a9097f/attachment.bin>