[cfe-dev] Potential optimization for std::inplace_merge

Sun Oct 30 10:27:10 PDT 2016

Thanks for all of this hard work. It looks like a promising optimization.

Would you be interested in submitting a patch?

/Eric

On Thu, Oct 27, 2016 at 1:12 PM, Morwenn Ed via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> I have been working on a sorting library for a while, and it happens to
> contain reimplementations of many standard algorithms borrowed from libc++
> (mostly because I needed to add a few more features to standard algorithms,
> akin to those in range-v3). I was working on a top-down in-place mergesort
> implementation, and I believe that I have found a potential albeit small optimization
> for std::inplace_merge, specifically in the helper function
> half_inplace_merge. Here is the original function borrowed from libc++
> (coding style has been changed, don't pay attention to that):
>
>
>     template<typename Compare, typename InputIterator1,
>              typename InputIterator2, typename OutputIterator>
>     auto half_inplace_merge(InputIterator1 first1, InputIterator1 last1,
>                             InputIterator2 first2, InputIterator2 last2,
>                             OutputIterator result, Compare comp)
>         -> void
>     {
>         for (; first1 != last1 ; ++result) {
>             if (first2 == last2) {
>                 std::move(first1, last1, result);
>                 return;
>             }
>
>             if (comp(*first2, *first1)) {
>                 *result = std::move(*first2);
>                 ++first2;
>             } else {
>                 *result = std::move(*first1);
>                 ++first1;
>             }
>         }
>         // first2 through last2 are already in the right spot.
>     }
>
> I made a simple observation: at every loop iteration, we check whether
> first1 is different from last1 and whether first2 is equal to last2.
> However, we know that we will loop at least min(last1 - first1, last2 -
> first2) times before actually needing to check whether first1 == last1 or first2
> == last2, so why not take advantage of that by comparing against an
> integer counter and skipping a few iterator comparisons? Moreover, the
> value min(last1 - first1, last2 - first2) is already known when
> half_inplace_merge is called, so we can directly pass them to the
> function? The modified function looked like this:
>
>
>     template<typename Compare, typename InputIterator1,
>              typename InputIterator2, typename OutputIterator,
>              typename Size>
>     auto half_inplace_merge(InputIterator1 first1, InputIterator1 last1,
>                             InputIterator2 first2, InputIterator2 last2,
>                             OutputIterator result, Size min_len,
>                             Compare comp)
>         -> void
>     {
>         for (; min_len != 0 ; --min_len) {
>             if (comp(*first2, *first1)) {
>                 *result = std::move(*first2);
>                 ++first2;
>             } else {
>                 *result = std::move(*first1);
>                 ++first1;
>             }
>             ++result;
>         }
>
>         for (; first1 != last1 ; ++result) {
>             if (first2 == last2) {
>                 std::move(first1, last1, result);
>                 return;
>             }
>
>             if (comp(*first2, *first1)) {
>                 *result = std::move(*first2);
>                 ++first2;
>             } else {
>                 *result = std::move(*first1);
>                 ++first1;
>             }
>         }
>         // first2 through last2 are already in the right spot.
>     }
>
>
> However, at this point, the new "optimization" made the code dramatically
> slower than it was before when sorting a vector of one million integers
> with a mergesort built on the top of std::inplace_merge. It kind of
> baffled me until I finally noticed that the second loop had some
> information that the first didn't have: the second loop knows that first1
> != last1 and first2 != last2 while it runs. However, it is apparently
> possible to give that information to the first loop too. I defined the
> following macro:
>
>
>     #define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); }
> while(0)
>
>
> Then I changed the first loop (the "optimization") as follows:
>
>
>     for (; min_len != 0 ; --min_len) {
>
>         ASSUME(first1 != last1);
>         ASSUME(first2 != last2);
>
>         if (comp(*first2, *first1)) {
>             *result = std::move(*first2);
>             ++first2;
>         } else {
>             *result = std::move(*first1);
>             ++first1;
>         }
>         ++result;
>     }
>
>
> This additional information solved the "dramatically slower" problem, and
> then running the benchmarks again showed that the mergesort based on the
> modified half_inplace_merge was generally slightly faster than the old
> one (sometimes the difference is lost in the noise, but there are also
> cases where the difference is neat albeit small). I guess that the ASSUME
> macro actually solved an aliasing problem which wasn't present in the
> second loop due to its loop conditions.
>
>
> Anyway, I benchmarked that code using MinGW GCC 6.2, so the results might
> be different with Clang. I wanted to tell that small optimization story,
> and felt that I needed to share it here since I borrowed the original code
> from libc++. Now, you can use it as you want if you feel that the small
> performance gain proves to be consistent and worth it (IIRC you have
> benchmarks for some standard algorithms).
>
>
> The code used for the benchmark is a modified version of the following:
> https://github.com/Morwenn/cpp-sort/blob/master/benchmarks/bench.cpp
>
> Here is a graph with the results I got; merge_sort-new uses the modified
> half_inplace_merge (the names on the left correspond to different data
> distributions used to test the sorting algorithms):
> https://i.imgur.com/hwzJ84U. <https://i.imgur.com/hwzJ84U.png>
>
>
> With plenty of love <3
>
>
> Morwenn
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20161030/be894415/attachment.html>