<div dir="ltr">Thanks for all of this hard work. It looks like a promising optimization.<div><br></div><div>Would you be interested in submitting a patch?</div><div><br></div><div>/Eric<br><div><br></div><div><br><div class="gmail_extra"><div class="gmail_quote">On Thu, Oct 27, 2016 at 1:12 PM, Morwenn Ed via cfe-dev <span dir="ltr"><<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">




<div dir="ltr">
<div id="m_2470965386196942470divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Arial,Helvetica,sans-serif">
<p>I have been working on a sorting library for a while, and it happens to contain reimplementations of many standard algorithms borrowed from libc++ (mostly because I needed to add a few more features to standard algorithms, akin to those in range-v3). I was
 working on a top-down in-place mergesort implementation, and I believe that I have found a potential albeit small
<span style="font-family:Calibri,Arial,Helvetica,sans-serif">optimization for</span><span style="font-family:"Courier New",monospace"><span style="font-family:Calibri,Arial,Helvetica,sans-serif">
</span>std::inplace_merge</span>, specifically in the helper function <span style="font-family:"Courier New",monospace">
half_inplace_merge</span>. Here is the original function borrowed from libc++ (coding style has been changed, don't pay attention to that):</p>
<p><br>
</p>
<p></p>
<div><span style="font-family:"Courier New",monospace">    template<typename Compare, typename InputIterator1,</span><br>
<span style="font-family:"Courier New",monospace">             typename InputIterator2,
</span><span style="font-family:"Courier New",monospace">typename OutputIterator></span><br>
<span style="font-family:"Courier New",monospace">    auto half_inplace_merge(<wbr>InputIterator1 first1, InputIterator1 last1,</span><br>
<span style="font-family:"Courier New",monospace">                            InputIterator2 first2, InputIterator2 last2,</span><br>
<span style="font-family:"Courier New",monospace">                            OutputIterator result, Compare comp)</span><br>
<span style="font-family:"Courier New",monospace">        -> void</span><br>
<span style="font-family:"Courier New",monospace">    {</span><br>
<span style="font-family:"Courier New",monospace">        for (; first1 != last1 ; ++result) {</span><br>
<span style="font-family:"Courier New",monospace">            if (first2 == last2) {</span><br>
<span style="font-family:"Courier New",monospace">                std::move(first1, last1, result);</span><br>
<span style="font-family:"Courier New",monospace">                return;</span><br>
<span style="font-family:"Courier New",monospace">            }</span><br>
<br>
<span style="font-family:"Courier New",monospace">            if (comp(*first2, *first1)) {</span><br>
<span style="font-family:"Courier New",monospace">                *result = std::move(*first2);</span><br>
<span style="font-family:"Courier New",monospace">                ++first2;</span><br>
<span style="font-family:"Courier New",monospace">            } else {</span><br>
<span style="font-family:"Courier New",monospace">                *result = std::move(*first1);</span><br>
<span style="font-family:"Courier New",monospace">                ++first1;</span><br>
<span style="font-family:"Courier New",monospace">            }</span><br>
<span style="font-family:"Courier New",monospace">        }</span><br>
<span style="font-family:"Courier New",monospace">        // first2 through last2 are already in the right spot.</span><br>
<span style="font-family:"Courier New",monospace">    }</span></div>
<br>
<p><span style="font-family:Calibri,Arial,Helvetica,sans-serif">I made a simple observation: at every loop iteration, we check whether
<span style="font-family:"Courier New",monospace">first1</span> is different from
<span style="font-family:"Courier New",monospace">last1</span> and whether <span style="font-family:"Courier New",monospace">
first2</span> is equal to <span style="font-family:"Courier New",monospace">last2</span>. However, we know that we will loop at least
<span style="font-family:"Courier New",monospace">min(last1 - first1, last2 - first2)</span> times before actually needing to check whether
<span style="font-family:"Courier New",monospace">first1 == last1</span> or <span style="font-family:"Courier New",monospace">
first2 == last2</span>, so why not take advantage of that by comparing against an integer counter and skipping a few iterator comparisons? Moreover, the value
<span style="font-family:"Courier New",monospace">min(last1 - first1, last2 - first2)</span> is already known when
<span style="font-family:"Courier New",monospace">half_inplace_merge</span> is called, so we can directly pass them to the function? The modified function looked like this:</span></p>
<p><span style="font-family:Calibri,Arial,Helvetica,sans-serif"><br>
</span></p>
<p><span style="font-family:Calibri,Arial,Helvetica,sans-serif"></span></p>
<div><span style="font-family:"Courier New",monospace">    template<typename Compare, typename InputIterator1,</span><br>
<span style="font-family:"Courier New",monospace">             typename InputIterator2,
</span><span style="font-family:"Courier New",monospace">typename OutputIterator,<br>
             typename Size></span><br>
<span style="font-family:"Courier New",monospace">    auto half_inplace_merge(<wbr>InputIterator1 first1, InputIterator1 last1,</span><br>
<span style="font-family:"Courier New",monospace">                            InputIterator2 first2, InputIterator2 last2,</span><br>
<span style="font-family:"Courier New",monospace">                            OutputIterator result, Size min_len,<br>
                            Compare comp)</span><span style="font-family:"Courier New",monospace"></span><br>
<span style="font-family:"Courier New",monospace">        -> void</span><br>
<span style="font-family:"Courier New",monospace">    {</span><br>
<span style="font-family:"Courier New",monospace">        for (; min_len != 0 ; --min_len) {</span><span style="font-family:"Courier New",monospace"></span><br>
<span style="font-family:"Courier New",monospace">            if (comp(*first2, *first1)) {</span><br>
<span style="font-family:"Courier New",monospace">                *result = std::move(*first2);</span><br>
<span style="font-family:"Courier New",monospace">                ++first2;</span><br>
<span style="font-family:"Courier New",monospace">            } else {</span><br>
<span style="font-family:"Courier New",monospace">                *result = std::move(*first1);</span><br>
<span style="font-family:"Courier New",monospace">                ++first1;</span><br>
<span style="font-family:"Courier New",monospace">            }</span><br>
<span style="font-family:"Courier New",monospace">            ++result;</span><br>
<span style="font-family:"Courier New",monospace">        }</span><br>
<br>
<span style="font-family:"Courier New",monospace">        for (; first1 != last1 ; ++result) {</span><br>
<span style="font-family:"Courier New",monospace">            if (first2 == last2) {</span><br>
<span style="font-family:"Courier New",monospace">                std::move(first1, last1, result);</span><br>
<span style="font-family:"Courier New",monospace">                return;</span><br>
<span style="font-family:"Courier New",monospace">            }</span><br>
<br>
<span style="font-family:"Courier New",monospace">            if (comp(*first2, *first1)) {</span><br>
<span style="font-family:"Courier New",monospace">                *result = std::move(*first2);</span><br>
<span style="font-family:"Courier New",monospace">                ++first2;</span><br>
<span style="font-family:"Courier New",monospace">            } else {</span><br>
<span style="font-family:"Courier New",monospace">                *result = std::move(*first1);</span><br>
<span style="font-family:"Courier New",monospace">                ++first1;</span><br>
<span style="font-family:"Courier New",monospace">            }</span><br>
<span style="font-family:"Courier New",monospace">        }</span><br>
<span style="font-family:"Courier New",monospace">        // first2 through last2 are already in the right spot.</span><br>
<span style="font-family:"Courier New",monospace">    }</span></div>
<p></p>
<p><br>
</p>
<p>However, at this point, the new "optimization" made the code dramatically slower than it was before when sorting a vector of one million integers with a mergesort built on the top of
<span style="font-family:"Courier New",monospace">std::inplace_merge</span>. It kind of baffled me until I finally noticed that the second loop had some information that the first didn't have: the second loop knows that
<span style="font-family:"Courier New",monospace">first1 != last1</span> and <span style="font-family:"Courier New",monospace">
first2 != last2</span> while it runs. However, it is apparently possible to give that information to the first loop too. I defined the following macro:</p>
<p><br>
</p>
<p><span style="font-family:"Courier New",monospace">    #define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while(0)</span></p>
<p><span><br>
</span></p>
<p><span>Then I changed the first loop (the "optimization") as follows:</span></p>
<p><span><br>
</span></p>
<p><span></span><span style="font-family:Calibri,Arial,Helvetica,sans-serif"><span style="font-family:"Courier New",monospace">    fo<span style="font-family:"Courier New",monospace">r (; min_len != 0 ; --min_len) {</span></span></span></p>
<p></p>
<div><span style="font-family:"Courier New",monospace">        ASSUME(first1 != last1);</span><br>
<span style="font-family:"Courier New",monospace">        ASSUME(first2 != last2);</span></div>
<span style="font-family:Calibri,Arial,Helvetica,sans-serif"><span style="font-family:"Courier New",monospace"></span><span style="font-family:"Courier New",monospace"></span></span>
<p></p>
<p><span style="font-family:Calibri,Arial,Helvetica,sans-serif"><span style="font-family:"Courier New",monospace">        if (comp(*first2, *first1)) {</span><br>
<span style="font-family:"Courier New",monospace">            *result = std::move(*first2);</span><br>
<span style="font-family:"Courier New",monospace">            ++first2;</span><br>
<span style="font-family:"Courier New",monospace">        } else {</span><br>
<span style="font-family:"Courier New",monospace"><span style="font-family:"Courier New",monospace">            *result
</span>= std::move(*first1);</span><br>
<span style="font-family:"Courier New",monospace">            ++first1;</span><br>
<span style="font-family:"Courier New",monospace">        }</span><br>
<span style="font-family:"Courier New",monospace">        ++result;</span><br>
<span style="font-family:"Courier New",monospace">    }</span></span></p>
<p><span style="font-family:Calibri,Arial,Helvetica,sans-serif"><span style="font-family:"Courier New",monospace"><br>
</span></span></p>
<p><span style="font-family:Calibri,Arial,Helvetica,sans-serif"><span style="font-family:Calibri,Arial,Helvetica,sans-serif">This additional information solved the "dramatically slower" problem, and then running the benchmarks again showed that the mergesort
</span></span>based on the modified <span style="font-family:"Courier New",monospace">
half_inplace_merge</span> was generally slightly faster than the old one (sometimes the difference is lost in the noise, but there are also cases where the difference is neat albeit small). I guess that the
<span style="font-family:"Courier New",monospace">ASSUME</span> macro actually solved an aliasing problem which wasn't present in the second loop due to its loop conditions.</p>
<p><br>
</p>
<p>Anyway, I benchmarked that code using MinGW GCC 6.2, so the results might be different with Clang. I wanted to tell that small optimization story, and felt that I needed to share it here since I borrowed the original code from libc++. Now, you can use it
 as you want if you feel that the small performance gain proves to be consistent and worth it (IIRC you have benchmarks for some standard algorithms).<br>
</p>
<p><br>
</p>
<p>The code used for the benchmark is a modified version of the following: <a href="https://github.com/Morwenn/cpp-sort/blob/master/benchmarks/bench.cpp" class="m_2470965386196942470OWAAutoLink" id="m_2470965386196942470LPlnk994007" target="_blank">
https://github.com/Morwenn/<wbr>cpp-sort/blob/master/<wbr>benchmarks/bench.cpp</a></p>
<p>Here is a graph with the results I got; merge_sort-new uses the modified <span style="font-family:"Courier New",monospace">
half_inplace_merge</span> (the names on the left correspond to different data distributions used to test the sorting algorithms):
<a href="https://i.imgur.com/hwzJ84U.png" class="m_2470965386196942470OWAAutoLink" id="m_2470965386196942470LPlnk68622" target="_blank">
https://i.imgur.com/hwzJ84U.</a></p>
<p><br>
</p>
<p>With plenty of love <3</p>
<p><br>
</p>
<p>Morwenn<br>
<br>
</p>
<p><img src="https://i.imgur.com/hwzJ84U.png"></p>
<p></p>
<p></p>
</div>
</div>

<br>______________________________<wbr>_________________<br>
cfe-dev mailing list<br>
<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/cfe-dev</a><br>
<br></blockquote></div><br></div></div></div></div>