[cfe-dev] Potential optimization for std::inplace_merge
Eric Fiselier via cfe-dev
cfe-dev at lists.llvm.org
Sun Oct 30 10:27:10 PDT 2016
Thanks for all of this hard work. It looks like a promising optimization.
Would you be interested in submitting a patch?
/Eric
On Thu, Oct 27, 2016 at 1:12 PM, Morwenn Ed via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> I have been working on a sorting library for a while, and it happens to
> contain reimplementations of many standard algorithms borrowed from libc++
> (mostly because I needed to add a few more features to standard algorithms,
> akin to those in range-v3). I was working on a top-down in-place mergesort
> implementation, and I believe that I have found a potential albeit small optimization
> for std::inplace_merge, specifically in the helper function
> half_inplace_merge. Here is the original function borrowed from libc++
> (coding style has been changed, don't pay attention to that):
>
>
> template<typename Compare, typename InputIterator1,
> typename InputIterator2, typename OutputIterator>
> auto half_inplace_merge(InputIterator1 first1, InputIterator1 last1,
> InputIterator2 first2, InputIterator2 last2,
> OutputIterator result, Compare comp)
> -> void
> {
> for (; first1 != last1 ; ++result) {
> if (first2 == last2) {
> std::move(first1, last1, result);
> return;
> }
>
> if (comp(*first2, *first1)) {
> *result = std::move(*first2);
> ++first2;
> } else {
> *result = std::move(*first1);
> ++first1;
> }
> }
> // first2 through last2 are already in the right spot.
> }
>
> I made a simple observation: at every loop iteration, we check whether
> first1 is different from last1 and whether first2 is equal to last2.
> However, we know that we will loop at least min(last1 - first1, last2 -
> first2) times before actually needing to check whether first1 == last1 or first2
> == last2, so why not take advantage of that by comparing against an
> integer counter and skipping a few iterator comparisons? Moreover, the
> value min(last1 - first1, last2 - first2) is already known when
> half_inplace_merge is called, so we can directly pass them to the
> function? The modified function looked like this:
>
>
> template<typename Compare, typename InputIterator1,
> typename InputIterator2, typename OutputIterator,
> typename Size>
> auto half_inplace_merge(InputIterator1 first1, InputIterator1 last1,
> InputIterator2 first2, InputIterator2 last2,
> OutputIterator result, Size min_len,
> Compare comp)
> -> void
> {
> for (; min_len != 0 ; --min_len) {
> if (comp(*first2, *first1)) {
> *result = std::move(*first2);
> ++first2;
> } else {
> *result = std::move(*first1);
> ++first1;
> }
> ++result;
> }
>
> for (; first1 != last1 ; ++result) {
> if (first2 == last2) {
> std::move(first1, last1, result);
> return;
> }
>
> if (comp(*first2, *first1)) {
> *result = std::move(*first2);
> ++first2;
> } else {
> *result = std::move(*first1);
> ++first1;
> }
> }
> // first2 through last2 are already in the right spot.
> }
>
>
> However, at this point, the new "optimization" made the code dramatically
> slower than it was before when sorting a vector of one million integers
> with a mergesort built on the top of std::inplace_merge. It kind of
> baffled me until I finally noticed that the second loop had some
> information that the first didn't have: the second loop knows that first1
> != last1 and first2 != last2 while it runs. However, it is apparently
> possible to give that information to the first loop too. I defined the
> following macro:
>
>
> #define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); }
> while(0)
>
>
> Then I changed the first loop (the "optimization") as follows:
>
>
> for (; min_len != 0 ; --min_len) {
>
> ASSUME(first1 != last1);
> ASSUME(first2 != last2);
>
> if (comp(*first2, *first1)) {
> *result = std::move(*first2);
> ++first2;
> } else {
> *result = std::move(*first1);
> ++first1;
> }
> ++result;
> }
>
>
> This additional information solved the "dramatically slower" problem, and
> then running the benchmarks again showed that the mergesort based on the
> modified half_inplace_merge was generally slightly faster than the old
> one (sometimes the difference is lost in the noise, but there are also
> cases where the difference is neat albeit small). I guess that the ASSUME
> macro actually solved an aliasing problem which wasn't present in the
> second loop due to its loop conditions.
>
>
> Anyway, I benchmarked that code using MinGW GCC 6.2, so the results might
> be different with Clang. I wanted to tell that small optimization story,
> and felt that I needed to share it here since I borrowed the original code
> from libc++. Now, you can use it as you want if you feel that the small
> performance gain proves to be consistent and worth it (IIRC you have
> benchmarks for some standard algorithms).
>
>
> The code used for the benchmark is a modified version of the following:
> https://github.com/Morwenn/cpp-sort/blob/master/benchmarks/bench.cpp
>
> Here is a graph with the results I got; merge_sort-new uses the modified
> half_inplace_merge (the names on the left correspond to different data
> distributions used to test the sorting algorithms):
> https://i.imgur.com/hwzJ84U. <https://i.imgur.com/hwzJ84U.png>
>
>
> With plenty of love <3
>
>
> Morwenn
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20161030/be894415/attachment.html>
More information about the cfe-dev
mailing list