<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Arial,Helvetica,sans-serif;">
<p>Sure, I gues<span style="font-family: Calibri,Arial,Helvetica,sans-serif;">s I can use the
</span><span class="pl-en"><span style="font-family: Calibri,Arial,Helvetica,sans-serif;">_LIBCPP_</span><span style="font-family: Calibri,Arial,Helvetica,sans-serif;">UNREACHABLE macro from <cstdlib> even though it looks like it'll be yet another incude for
 the already pretty big <algorithm>.</span></span></p>
<p><span class="pl-en"><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"><br>
</span></span></p>
<p><span class="pl-en"><span style="font-family: Calibri,Arial,Helvetica,sans-serif;">If I understand the process correctly, I only have to submit an SVN patch to the cfe-commits mailing list. Are there any other legalese I should be aware of besides knowing
 the project's license?</span></span></p>
<p><span class="pl-en"><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"><br>
</span></span></p>
<p><span class="pl-en"><span style="font-family: Calibri,Arial,Helvetica,sans-serif;">Thanks in advance.</span></span></p>
<p><span class="pl-en"><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"><br>
</span></span></p>
<p><span class="pl-en"><span style="font-family: Calibri,Arial,Helvetica,sans-serif;">Morwenn</span><em><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"></span></em></span><br>
</p>
<br>
<br>
<div style="color: rgb(0, 0, 0);">
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" color="#000000" face="Calibri, sans-serif"><b>De :</b> Eric Fiselier <eric@efcs.ca><br>
<b>Envoyé :</b> dimanche 30 octobre 2016 18:27<br>
<b>À :</b> Morwenn Ed<br>
<b>Cc :</b> cfe-dev@lists.llvm.org<br>
<b>Objet :</b> Re: [cfe-dev] Potential optimization for std::inplace_merge</font>
<div> </div>
</div>
<div>
<div dir="ltr">Thanks for all of this hard work. It looks like a promising optimization.
<div><br>
</div>
<div>Would you be interested in submitting a patch?</div>
<div><br>
</div>
<div>/Eric<br>
<div><br>
</div>
<div><br>
<div class="gmail_extra">
<div class="gmail_quote">On Thu, Oct 27, 2016 at 1:12 PM, Morwenn Ed via cfe-dev <span dir="ltr">
<<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex; border-left:1px #ccc solid; padding-left:1ex">
<div dir="ltr">
<div id="m_2470965386196942470divtagdefaultwrapper" style="font-size:12pt; color:#000000; font-family:Calibri,Arial,Helvetica,sans-serif">
<p>I have been working on a sorting library for a while, and it happens to contain reimplementations of many standard algorithms borrowed from libc++ (mostly because I needed to add a few more features to standard algorithms, akin to those in range-v3). I was
 working on a top-down in-place mergesort implementation, and I believe that I have found a potential albeit small
<span style="font-family:Calibri,Arial,Helvetica,sans-serif">optimization for</span><span style="font-family:"Courier New",monospace"><span style="font-family:Calibri,Arial,Helvetica,sans-serif">
</span>std::inplace_merge</span>, specifically in the helper function <span style="font-family:"Courier New",monospace">
half_inplace_merge</span>. Here is the original function borrowed from libc++ (coding style has been changed, don't pay attention to that):</p>
<p><br>
</p>
<p></p>
<div><span style="font-family:"Courier New",monospace">    template<typename Compare, typename InputIterator1,</span><br>
<span style="font-family:"Courier New",monospace">             typename InputIterator2,
</span><span style="font-family:"Courier New",monospace">typename OutputIterator></span><br>
<span style="font-family:"Courier New",monospace">    auto half_inplace_merge(<wbr>InputIterator1 first1, InputIterator1 last1,</span><br>
<span style="font-family:"Courier New",monospace">                            InputIterator2 first2, InputIterator2 last2,</span><br>
<span style="font-family:"Courier New",monospace">                            OutputIterator result, Compare comp)</span><br>
<span style="font-family:"Courier New",monospace">        -> void</span><br>
<span style="font-family:"Courier New",monospace">    {</span><br>
<span style="font-family:"Courier New",monospace">        for (; first1 != last1 ; ++result) {</span><br>
<span style="font-family:"Courier New",monospace">            if (first2 == last2) {</span><br>
<span style="font-family:"Courier New",monospace">                std::move(first1, last1, result);</span><br>
<span style="font-family:"Courier New",monospace">                return;</span><br>
<span style="font-family:"Courier New",monospace">            }</span><br>
<br>
<span style="font-family:"Courier New",monospace">            if (comp(*first2, *first1)) {</span><br>
<span style="font-family:"Courier New",monospace">                *result = std::move(*first2);</span><br>
<span style="font-family:"Courier New",monospace">                ++first2;</span><br>
<span style="font-family:"Courier New",monospace">            } else {</span><br>
<span style="font-family:"Courier New",monospace">                *result = std::move(*first1);</span><br>
<span style="font-family:"Courier New",monospace">                ++first1;</span><br>
<span style="font-family:"Courier New",monospace">            }</span><br>
<span style="font-family:"Courier New",monospace">        }</span><br>
<span style="font-family:"Courier New",monospace">        // first2 through last2 are already in the right spot.</span><br>
<span style="font-family:"Courier New",monospace">    }</span></div>
<br>
<p><span style="font-family:Calibri,Arial,Helvetica,sans-serif">I made a simple observation: at every loop iteration, we check whether
<span style="font-family:"Courier New",monospace">first1</span> is different from
<span style="font-family:"Courier New",monospace">last1</span> and whether <span style="font-family:"Courier New",monospace">
first2</span> is equal to <span style="font-family:"Courier New",monospace">last2</span>. However, we know that we will loop at least
<span style="font-family:"Courier New",monospace">min(last1 - first1, last2 - first2)</span> times before actually needing to check whether
<span style="font-family:"Courier New",monospace">first1 == last1</span> or <span style="font-family:"Courier New",monospace">
first2 == last2</span>, so why not take advantage of that by comparing against an integer counter and skipping a few iterator comparisons? Moreover, the value
<span style="font-family:"Courier New",monospace">min(last1 - first1, last2 - first2)</span> is already known when
<span style="font-family:"Courier New",monospace">half_inplace_merge</span> is called, so we can directly pass them to the function? The modified function looked like this:</span></p>
<p><span style="font-family:Calibri,Arial,Helvetica,sans-serif"><br>
</span></p>
<p><span style="font-family:Calibri,Arial,Helvetica,sans-serif"></span></p>
<div><span style="font-family:"Courier New",monospace">    template<typename Compare, typename InputIterator1,</span><br>
<span style="font-family:"Courier New",monospace">             typename InputIterator2,
</span><span style="font-family:"Courier New",monospace">typename OutputIterator,<br>
             typename Size></span><br>
<span style="font-family:"Courier New",monospace">    auto half_inplace_merge(<wbr>InputIterator1 first1, InputIterator1 last1,</span><br>
<span style="font-family:"Courier New",monospace">                            InputIterator2 first2, InputIterator2 last2,</span><br>
<span style="font-family:"Courier New",monospace">                            OutputIterator result, Size min_len,<br>
                            Compare comp)</span><span style="font-family:"Courier New",monospace"></span><br>
<span style="font-family:"Courier New",monospace">        -> void</span><br>
<span style="font-family:"Courier New",monospace">    {</span><br>
<span style="font-family:"Courier New",monospace">        for (; min_len != 0 ; --min_len) {</span><span style="font-family:"Courier New",monospace"></span><br>
<span style="font-family:"Courier New",monospace">            if (comp(*first2, *first1)) {</span><br>
<span style="font-family:"Courier New",monospace">                *result = std::move(*first2);</span><br>
<span style="font-family:"Courier New",monospace">                ++first2;</span><br>
<span style="font-family:"Courier New",monospace">            } else {</span><br>
<span style="font-family:"Courier New",monospace">                *result = std::move(*first1);</span><br>
<span style="font-family:"Courier New",monospace">                ++first1;</span><br>
<span style="font-family:"Courier New",monospace">            }</span><br>
<span style="font-family:"Courier New",monospace">            ++result;</span><br>
<span style="font-family:"Courier New",monospace">        }</span><br>
<br>
<span style="font-family:"Courier New",monospace">        for (; first1 != last1 ; ++result) {</span><br>
<span style="font-family:"Courier New",monospace">            if (first2 == last2) {</span><br>
<span style="font-family:"Courier New",monospace">                std::move(first1, last1, result);</span><br>
<span style="font-family:"Courier New",monospace">                return;</span><br>
<span style="font-family:"Courier New",monospace">            }</span><br>
<br>
<span style="font-family:"Courier New",monospace">            if (comp(*first2, *first1)) {</span><br>
<span style="font-family:"Courier New",monospace">                *result = std::move(*first2);</span><br>
<span style="font-family:"Courier New",monospace">                ++first2;</span><br>
<span style="font-family:"Courier New",monospace">            } else {</span><br>
<span style="font-family:"Courier New",monospace">                *result = std::move(*first1);</span><br>
<span style="font-family:"Courier New",monospace">                ++first1;</span><br>
<span style="font-family:"Courier New",monospace">            }</span><br>
<span style="font-family:"Courier New",monospace">        }</span><br>
<span style="font-family:"Courier New",monospace">        // first2 through last2 are already in the right spot.</span><br>
<span style="font-family:"Courier New",monospace">    }</span></div>
<p></p>
<p><br>
</p>
<p>However, at this point, the new "optimization" made the code dramatically slower than it was before when sorting a vector of one million integers with a mergesort built on the top of
<span style="font-family:"Courier New",monospace">std::inplace_merge</span>. It kind of baffled me until I finally noticed that the second loop had some information that the first didn't have: the second loop knows that
<span style="font-family:"Courier New",monospace">first1 != last1</span> and <span style="font-family:"Courier New",monospace">
first2 != last2</span> while it runs. However, it is apparently possible to give that information to the first loop too. I defined the following macro:</p>
<p><br>
</p>
<p><span style="font-family:"Courier New",monospace">    #define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while(0)</span></p>
<p><span><br>
</span></p>
<p><span>Then I changed the first loop (the "optimization") as follows:</span></p>
<p><span><br>
</span></p>
<p><span></span><span style="font-family:Calibri,Arial,Helvetica,sans-serif"><span style="font-family:"Courier New",monospace">    fo<span style="font-family:"Courier New",monospace">r (; min_len != 0 ; --min_len) {</span></span></span></p>
<p></p>
<div><span style="font-family:"Courier New",monospace">        ASSUME(first1 != last1);</span><br>
<span style="font-family:"Courier New",monospace">        ASSUME(first2 != last2);</span></div>
<span style="font-family:Calibri,Arial,Helvetica,sans-serif"><span style="font-family:"Courier New",monospace"></span><span style="font-family:"Courier New",monospace"></span></span>
<p></p>
<p><span style="font-family:Calibri,Arial,Helvetica,sans-serif"><span style="font-family:"Courier New",monospace">        if (comp(*first2, *first1)) {</span><br>
<span style="font-family:"Courier New",monospace">            *result = std::move(*first2);</span><br>
<span style="font-family:"Courier New",monospace">            ++first2;</span><br>
<span style="font-family:"Courier New",monospace">        } else {</span><br>
<span style="font-family:"Courier New",monospace"><span style="font-family:"Courier New",monospace">            *result
</span>= std::move(*first1);</span><br>
<span style="font-family:"Courier New",monospace">            ++first1;</span><br>
<span style="font-family:"Courier New",monospace">        }</span><br>
<span style="font-family:"Courier New",monospace">        ++result;</span><br>
<span style="font-family:"Courier New",monospace">    }</span></span></p>
<p><span style="font-family:Calibri,Arial,Helvetica,sans-serif"><span style="font-family:"Courier New",monospace"><br>
</span></span></p>
<p><span style="font-family:Calibri,Arial,Helvetica,sans-serif"><span style="font-family:Calibri,Arial,Helvetica,sans-serif">This additional information solved the "dramatically slower" problem, and then running the benchmarks again showed that the mergesort
</span></span>based on the modified <span style="font-family:"Courier New",monospace">
half_inplace_merge</span> was generally slightly faster than the old one (sometimes the difference is lost in the noise, but there are also cases where the difference is neat albeit small). I guess that the
<span style="font-family:"Courier New",monospace">ASSUME</span> macro actually solved an aliasing problem which wasn't present in the second loop due to its loop conditions.</p>
<p><br>
</p>
<p>Anyway, I benchmarked that code using MinGW GCC 6.2, so the results might be different with Clang. I wanted to tell that small optimization story, and felt that I needed to share it here since I borrowed the original code from libc++. Now, you can use it
 as you want if you feel that the small performance gain proves to be consistent and worth it (IIRC you have benchmarks for some standard algorithms).<br>
</p>
<p><br>
</p>
<p>The code used for the benchmark is a modified version of the following: <a href="https://github.com/Morwenn/cpp-sort/blob/master/benchmarks/bench.cpp" class="m_2470965386196942470OWAAutoLink" id="m_2470965386196942470LPlnk994007" target="_blank">
https://github.com/Morwenn/<wbr>cpp-sort/blob/master/<wbr>benchmarks/bench.cpp</a></p>
<p>Here is a graph with the results I got; merge_sort-new uses the modified <span style="font-family:"Courier New",monospace">
half_inplace_merge</span> (the names on the left correspond to different data distributions used to test the sorting algorithms):
<a href="https://i.imgur.com/hwzJ84U.png" class="m_2470965386196942470OWAAutoLink" id="m_2470965386196942470LPlnk68622" target="_blank">
https://i.imgur.com/hwzJ84U.</a></p>
<p><br>
</p>
<p>With plenty of love <3</p>
<p><br>
</p>
<p>Morwenn<br>
<br>
</p>
<p><img src="https://i.imgur.com/hwzJ84U.png"></p>
<p></p>
<p></p>
</div>
</div>
<br>
______________________________<wbr>_________________<br>
cfe-dev mailing list<br>
<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/cfe-dev</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>