<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Arial,Helvetica,sans-serif;">
<p>I have been working on a sorting library for a while, and it happens to contain reimplementations of many standard algorithms borrowed from libc++ (mostly because I needed to add a few more features to standard algorithms, akin to those in range-v3). I was
 working on a top-down in-place mergesort implementation, and I believe that I have found a potential albeit small
<span style="font-family: Calibri,Arial,Helvetica,sans-serif;">optimization for</span><span style="font-family: "Courier New",monospace;"><span style="font-family: Calibri,Arial,Helvetica,sans-serif;">
</span>std::inplace_merge</span>, specifically in the helper function <span style="font-family: "Courier New",monospace;">
half_inplace_merge</span>. Here is the original function borrowed from libc++ (coding style has been changed, don't pay attention to that):</p>
<p><br>
</p>
<p></p>
<div><span style="font-family: "Courier New",monospace;">    template<typename Compare, typename InputIterator1,</span><br>
<span style="font-family: "Courier New",monospace;">             typename InputIterator2,
</span><span style="font-family: "Courier New",monospace;">typename OutputIterator></span><br>
<span style="font-family: "Courier New",monospace;">    auto half_inplace_merge(InputIterator1 first1, InputIterator1 last1,</span><br>
<span style="font-family: "Courier New",monospace;">                            InputIterator2 first2, InputIterator2 last2,</span><br>
<span style="font-family: "Courier New",monospace;">                            OutputIterator result, Compare comp)</span><br>
<span style="font-family: "Courier New",monospace;">        -> void</span><br>
<span style="font-family: "Courier New",monospace;">    {</span><br>
<span style="font-family: "Courier New",monospace;">        for (; first1 != last1 ; ++result) {</span><br>
<span style="font-family: "Courier New",monospace;">            if (first2 == last2) {</span><br>
<span style="font-family: "Courier New",monospace;">                std::move(first1, last1, result);</span><br>
<span style="font-family: "Courier New",monospace;">                return;</span><br>
<span style="font-family: "Courier New",monospace;">            }</span><br>
<br>
<span style="font-family: "Courier New",monospace;">            if (comp(*first2, *first1)) {</span><br>
<span style="font-family: "Courier New",monospace;">                *result = std::move(*first2);</span><br>
<span style="font-family: "Courier New",monospace;">                ++first2;</span><br>
<span style="font-family: "Courier New",monospace;">            } else {</span><br>
<span style="font-family: "Courier New",monospace;">                *result = std::move(*first1);</span><br>
<span style="font-family: "Courier New",monospace;">                ++first1;</span><br>
<span style="font-family: "Courier New",monospace;">            }</span><br>
<span style="font-family: "Courier New",monospace;">        }</span><br>
<span style="font-family: "Courier New",monospace;">        // first2 through last2 are already in the right spot.</span><br>
<span style="font-family: "Courier New",monospace;">    }</span></div>
<br>
<p><span style="font-family: Calibri,Arial,Helvetica,sans-serif;">I made a simple observation: at every loop iteration, we check whether
<span style="font-family: "Courier New",monospace;">first1</span> is different from
<span style="font-family: "Courier New",monospace;">last1</span> and whether <span style="font-family: "Courier New",monospace;">
first2</span> is equal to <span style="font-family: "Courier New",monospace;">last2</span>. However, we know that we will loop at least
<span style="font-family: "Courier New",monospace;">min(last1 - first1, last2 - first2)</span> times before actually needing to check whether
<span style="font-family: "Courier New",monospace;">first1 == last1</span> or <span style="font-family: "Courier New",monospace;">
first2 == last2</span>, so why not take advantage of that by comparing against an integer counter and skipping a few iterator comparisons? Moreover, the value
<span style="font-family: "Courier New",monospace;">min(last1 - first1, last2 - first2)</span> is already known when
<span style="font-family: "Courier New",monospace;">half_inplace_merge</span> is called, so we can directly pass them to the function? The modified function looked like this:</span></p>
<p><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"><br>
</span></p>
<p><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"></span></p>
<div><span style="font-family: "Courier New",monospace;">    template<typename Compare, typename InputIterator1,</span><br>
<span style="font-family: "Courier New",monospace;">             typename InputIterator2,
</span><span style="font-family: "Courier New",monospace;">typename OutputIterator,<br>
             typename Size></span><br>
<span style="font-family: "Courier New",monospace;">    auto half_inplace_merge(InputIterator1 first1, InputIterator1 last1,</span><br>
<span style="font-family: "Courier New",monospace;">                            InputIterator2 first2, InputIterator2 last2,</span><br>
<span style="font-family: "Courier New",monospace;">                            OutputIterator result, Size min_len,<br>
                            Compare comp)</span><span style="font-family: "Courier New",monospace;"></span><br>
<span style="font-family: "Courier New",monospace;">        -> void</span><br>
<span style="font-family: "Courier New",monospace;">    {</span><br>
<span style="font-family: "Courier New",monospace;">        for (; min_len != 0 ; --min_len) {</span><span style="font-family: "Courier New",monospace;"></span><br>
<span style="font-family: "Courier New",monospace;">            if (comp(*first2, *first1)) {</span><br>
<span style="font-family: "Courier New",monospace;">                *result = std::move(*first2);</span><br>
<span style="font-family: "Courier New",monospace;">                ++first2;</span><br>
<span style="font-family: "Courier New",monospace;">            } else {</span><br>
<span style="font-family: "Courier New",monospace;">                *result = std::move(*first1);</span><br>
<span style="font-family: "Courier New",monospace;">                ++first1;</span><br>
<span style="font-family: "Courier New",monospace;">            }</span><br>
<span style="font-family: "Courier New",monospace;">            ++result;</span><br>
<span style="font-family: "Courier New",monospace;">        }</span><br>
<br>
<span style="font-family: "Courier New",monospace;">        for (; first1 != last1 ; ++result) {</span><br>
<span style="font-family: "Courier New",monospace;">            if (first2 == last2) {</span><br>
<span style="font-family: "Courier New",monospace;">                std::move(first1, last1, result);</span><br>
<span style="font-family: "Courier New",monospace;">                return;</span><br>
<span style="font-family: "Courier New",monospace;">            }</span><br>
<br>
<span style="font-family: "Courier New",monospace;">            if (comp(*first2, *first1)) {</span><br>
<span style="font-family: "Courier New",monospace;">                *result = std::move(*first2);</span><br>
<span style="font-family: "Courier New",monospace;">                ++first2;</span><br>
<span style="font-family: "Courier New",monospace;">            } else {</span><br>
<span style="font-family: "Courier New",monospace;">                *result = std::move(*first1);</span><br>
<span style="font-family: "Courier New",monospace;">                ++first1;</span><br>
<span style="font-family: "Courier New",monospace;">            }</span><br>
<span style="font-family: "Courier New",monospace;">        }</span><br>
<span style="font-family: "Courier New",monospace;">        // first2 through last2 are already in the right spot.</span><br>
<span style="font-family: "Courier New",monospace;">    }</span></div>
<p></p>
<p><br>
</p>
<p>However, at this point, the new "optimization" made the code dramatically slower than it was before when sorting a vector of one million integers with a mergesort built on the top of
<span style="font-family: "Courier New",monospace;">std::inplace_merge</span>. It kind of baffled me until I finally noticed that the second loop had some information that the first didn't have: the second loop knows that
<span style="font-family: "Courier New",monospace;">first1 != last1</span> and <span style="font-family: "Courier New",monospace;">
first2 != last2</span> while it runs. However, it is apparently possible to give that information to the first loop too. I defined the following macro:</p>
<p><br>
</p>
<p><span style="font-family: "Courier New",monospace;">    #define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while(0)</span></p>
<p><span><br>
</span></p>
<p><span>Then I changed the first loop (the "optimization") as follows:</span></p>
<p><span><br>
</span></p>
<p><span></span><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",monospace;">    fo<span style="font-family: "Courier New",monospace;">r (; min_len != 0 ; --min_len) {</span></span></span></p>
<p></p>
<div><span style="font-family: "Courier New",monospace;">        ASSUME(first1 != last1);</span><br>
<span style="font-family: "Courier New",monospace;">        ASSUME(first2 != last2);</span></div>
<span style="font-family: Calibri,Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",monospace;"></span><span style="font-family: "Courier New",monospace;"></span></span>
<p></p>
<p><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",monospace;">        if (comp(*first2, *first1)) {</span><br>
<span style="font-family: "Courier New",monospace;">            *result = std::move(*first2);</span><br>
<span style="font-family: "Courier New",monospace;">            ++first2;</span><br>
<span style="font-family: "Courier New",monospace;">        } else {</span><br>
<span style="font-family: "Courier New",monospace;"><span style="font-family: "Courier New",monospace;">            *result
</span>= std::move(*first1);</span><br>
<span style="font-family: "Courier New",monospace;">            ++first1;</span><br>
<span style="font-family: "Courier New",monospace;">        }</span><br>
<span style="font-family: "Courier New",monospace;">        ++result;</span><br>
<span style="font-family: "Courier New",monospace;">    }</span></span></p>
<p><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",monospace;"><br>
</span></span></p>
<p><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"><span style="font-family: Calibri,Arial,Helvetica,sans-serif;">This additional information solved the "dramatically slower" problem, and then running the benchmarks again showed that the mergesort
</span></span>based on the modified <span style="font-family: "Courier New",monospace;">
half_inplace_merge</span> was generally slightly faster than the old one (sometimes the difference is lost in the noise, but there are also cases where the difference is neat albeit small). I guess that the
<span style="font-family: "Courier New",monospace;">ASSUME</span> macro actually solved an aliasing problem which wasn't present in the second loop due to its loop conditions.</p>
<p><br>
</p>
<p>Anyway, I benchmarked that code using MinGW GCC 6.2, so the results might be different with Clang. I wanted to tell that small optimization story, and felt that I needed to share it here since I borrowed the original code from libc++. Now, you can use it
 as you want if you feel that the small performance gain proves to be consistent and worth it (IIRC you have benchmarks for some standard algorithms).<br>
</p>
<p><br>
</p>
<p>The code used for the benchmark is a modified version of the following: <a href="https://github.com/Morwenn/cpp-sort/blob/master/benchmarks/bench.cpp" class="OWAAutoLink" id="LPlnk994007" previewremoved="true">
https://github.com/Morwenn/cpp-sort/blob/master/benchmarks/bench.cpp</a></p>
<p>Here is a graph with the results I got; merge_sort-new uses the modified <span style="font-family: "Courier New",monospace;">
half_inplace_merge</span> (the names on the left correspond to different data distributions used to test the sorting algorithms):
<a href="https://i.imgur.com/hwzJ84U.png" class="OWAAutoLink" id="LPlnk68622" previewremoved="true">
https://i.imgur.com/hwzJ84U.</a></p>
<p><br>
</p>
<p>With plenty of love <3</p>
<p><br>
</p>
<p>Morwenn<br>
<br>
</p>
<p><img src="https://i.imgur.com/hwzJ84U.png"></p>
<p></p>
<p></p>
</div>
</body>
</html>