<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: arial,helvetica,sans-serif; font-size: 10pt; color: #000000'><br><hr id="zwchr"><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Morwenn Ed via cfe-dev" <cfe-dev@lists.llvm.org><br><b>To: </b>cfe-dev@lists.llvm.org<br><b>Sent: </b>Thursday, October 27, 2016 2:12:39 PM<br><b>Subject: </b>[cfe-dev] Potential optimization for std::inplace_merge<br><br>
<style style="display: none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
<div id="divtagdefaultwrapper" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri,Arial,Helvetica,sans-serif;">
<p>I have been working on a sorting library for a while, and it happens to contain reimplementations of many standard algorithms borrowed from libc++ (mostly because I needed to add a few more features to standard algorithms, akin to those in range-v3). I was
working on a top-down in-place mergesort implementation, and I believe that I have found a potential albeit small
<span style="font-family: Calibri,Arial,Helvetica,sans-serif;">optimization for</span><span style="font-family: "Courier New",monospace;"><span style="font-family: Calibri,Arial,Helvetica,sans-serif;">
</span>std::inplace_merge</span>, specifically in the helper function <span style="font-family: "Courier New",monospace;">
half_inplace_merge</span>. Here is the original function borrowed from libc++ (coding style has been changed, don't pay attention to that):</p>
<p><br>
</p>
<p></p>
<div><span style="font-family: "Courier New",monospace;"> template<typename Compare, typename InputIterator1,</span><br>
<span style="font-family: "Courier New",monospace;"> typename InputIterator2,
</span><span style="font-family: "Courier New",monospace;">typename OutputIterator></span><br>
<span style="font-family: "Courier New",monospace;"> auto half_inplace_merge(InputIterator1 first1, InputIterator1 last1,</span><br>
<span style="font-family: "Courier New",monospace;"> InputIterator2 first2, InputIterator2 last2,</span><br>
<span style="font-family: "Courier New",monospace;"> OutputIterator result, Compare comp)</span><br>
<span style="font-family: "Courier New",monospace;"> -> void</span><br>
<span style="font-family: "Courier New",monospace;"> {</span><br>
<span style="font-family: "Courier New",monospace;"> for (; first1 != last1 ; ++result) {</span><br>
<span style="font-family: "Courier New",monospace;"> if (first2 == last2) {</span><br>
<span style="font-family: "Courier New",monospace;"> std::move(first1, last1, result);</span><br>
<span style="font-family: "Courier New",monospace;"> return;</span><br>
<span style="font-family: "Courier New",monospace;"> }</span><br>
<br>
<span style="font-family: "Courier New",monospace;"> if (comp(*first2, *first1)) {</span><br>
<span style="font-family: "Courier New",monospace;"> *result = std::move(*first2);</span><br>
<span style="font-family: "Courier New",monospace;"> ++first2;</span><br>
<span style="font-family: "Courier New",monospace;"> } else {</span><br>
<span style="font-family: "Courier New",monospace;"> *result = std::move(*first1);</span><br>
<span style="font-family: "Courier New",monospace;"> ++first1;</span><br>
<span style="font-family: "Courier New",monospace;"> }</span><br>
<span style="font-family: "Courier New",monospace;"> }</span><br>
<span style="font-family: "Courier New",monospace;"> // first2 through last2 are already in the right spot.</span><br>
<span style="font-family: "Courier New",monospace;"> }</span></div>
<br>
<p><span style="font-family: Calibri,Arial,Helvetica,sans-serif;">I made a simple observation: at every loop iteration, we check whether
<span style="font-family: "Courier New",monospace;">first1</span> is different from
<span style="font-family: "Courier New",monospace;">last1</span> and whether <span style="font-family: "Courier New",monospace;">
first2</span> is equal to <span style="font-family: "Courier New",monospace;">last2</span>. However, we know that we will loop at least
<span style="font-family: "Courier New",monospace;">min(last1 - first1, last2 - first2)</span> times before actually needing to check whether
<span style="font-family: "Courier New",monospace;">first1 == last1</span> or <span style="font-family: "Courier New",monospace;">
first2 == last2</span>, so why not take advantage of that by comparing against an integer counter and skipping a few iterator comparisons? Moreover, the value
<span style="font-family: "Courier New",monospace;">min(last1 - first1, last2 - first2)</span> is already known when
<span style="font-family: "Courier New",monospace;">half_inplace_merge</span> is called, so we can directly pass them to the function? The modified function looked like this:</span></p>
<p><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"><br>
</span></p>
<p><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"></span></p>
<div><span style="font-family: "Courier New",monospace;"> template<typename Compare, typename InputIterator1,</span><br>
<span style="font-family: "Courier New",monospace;"> typename InputIterator2,
</span><span style="font-family: "Courier New",monospace;">typename OutputIterator,<br>
typename Size></span><br>
<span style="font-family: "Courier New",monospace;"> auto half_inplace_merge(InputIterator1 first1, InputIterator1 last1,</span><br>
<span style="font-family: "Courier New",monospace;"> InputIterator2 first2, InputIterator2 last2,</span><br>
<span style="font-family: "Courier New",monospace;"> OutputIterator result, Size min_len,<br>
Compare comp)</span><span style="font-family: "Courier New",monospace;"></span><br>
<span style="font-family: "Courier New",monospace;"> -> void</span><br>
<span style="font-family: "Courier New",monospace;"> {</span><br>
<span style="font-family: "Courier New",monospace;"> for (; min_len != 0 ; --min_len) {</span><span style="font-family: "Courier New",monospace;"></span><br>
<span style="font-family: "Courier New",monospace;"> if (comp(*first2, *first1)) {</span><br>
<span style="font-family: "Courier New",monospace;"> *result = std::move(*first2);</span><br>
<span style="font-family: "Courier New",monospace;"> ++first2;</span><br>
<span style="font-family: "Courier New",monospace;"> } else {</span><br>
<span style="font-family: "Courier New",monospace;"> *result = std::move(*first1);</span><br>
<span style="font-family: "Courier New",monospace;"> ++first1;</span><br>
<span style="font-family: "Courier New",monospace;"> }</span><br>
<span style="font-family: "Courier New",monospace;"> ++result;</span><br>
<span style="font-family: "Courier New",monospace;"> }</span><br>
<br>
<span style="font-family: "Courier New",monospace;"> for (; first1 != last1 ; ++result) {</span><br>
<span style="font-family: "Courier New",monospace;"> if (first2 == last2) {</span><br>
<span style="font-family: "Courier New",monospace;"> std::move(first1, last1, result);</span><br>
<span style="font-family: "Courier New",monospace;"> return;</span><br>
<span style="font-family: "Courier New",monospace;"> }</span><br>
<br>
<span style="font-family: "Courier New",monospace;"> if (comp(*first2, *first1)) {</span><br>
<span style="font-family: "Courier New",monospace;"> *result = std::move(*first2);</span><br>
<span style="font-family: "Courier New",monospace;"> ++first2;</span><br>
<span style="font-family: "Courier New",monospace;"> } else {</span><br>
<span style="font-family: "Courier New",monospace;"> *result = std::move(*first1);</span><br>
<span style="font-family: "Courier New",monospace;"> ++first1;</span><br>
<span style="font-family: "Courier New",monospace;"> }</span><br>
<span style="font-family: "Courier New",monospace;"> }</span><br>
<span style="font-family: "Courier New",monospace;"> // first2 through last2 are already in the right spot.</span><br>
<span style="font-family: "Courier New",monospace;"> }</span></div>
<p></p>
<p><br>
</p>
<p>However, at this point, the new "optimization" made the code dramatically slower than it was before when sorting a vector of one million integers with a mergesort built on the top of
<span style="font-family: "Courier New",monospace;">std::inplace_merge</span>. It kind of baffled me until I finally noticed that the second loop had some information that the first didn't have: the second loop knows that
<span style="font-family: "Courier New",monospace;">first1 != last1</span> and <span style="font-family: "Courier New",monospace;">
first2 != last2</span> while it runs. However, it is apparently possible to give that information to the first loop too. I defined the following macro:</p>
<p><br>
</p>
<p><span id="DWT3344" style="font-family: "Courier New",monospace;"> #define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while(0)</span></p></div></blockquote>FYI, for Clang, you'll need to use __builtin_assume(cond) (http://clang.llvm.org/docs/LanguageExtensions.html#builtin-functions).<br><br> -Hal<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div id="divtagdefaultwrapper" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri,Arial,Helvetica,sans-serif;"><p><span style="font-family: "Courier New",monospace;"></span></p>
<p><span><br>
</span></p>
<p><span>Then I changed the first loop (the "optimization") as follows:</span></p>
<p><span><br>
</span></p>
<p><span></span><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",monospace;"> fo<span style="font-family: "Courier New",monospace;">r (; min_len != 0 ; --min_len) {</span></span></span></p>
<p></p>
<div><span style="font-family: "Courier New",monospace;"> ASSUME(first1 != last1);</span><br>
<span style="font-family: "Courier New",monospace;"> ASSUME(first2 != last2);</span></div>
<span style="font-family: Calibri,Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",monospace;"></span><span style="font-family: "Courier New",monospace;"></span></span>
<p></p>
<p><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",monospace;"> if (comp(*first2, *first1)) {</span><br>
<span style="font-family: "Courier New",monospace;"> *result = std::move(*first2);</span><br>
<span style="font-family: "Courier New",monospace;"> ++first2;</span><br>
<span style="font-family: "Courier New",monospace;"> } else {</span><br>
<span style="font-family: "Courier New",monospace;"><span style="font-family: "Courier New",monospace;"> *result
</span>= std::move(*first1);</span><br>
<span style="font-family: "Courier New",monospace;"> ++first1;</span><br>
<span style="font-family: "Courier New",monospace;"> }</span><br>
<span style="font-family: "Courier New",monospace;"> ++result;</span><br>
<span style="font-family: "Courier New",monospace;"> }</span></span></p>
<p><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"><span style="font-family: "Courier New",monospace;"><br>
</span></span></p>
<p><span style="font-family: Calibri,Arial,Helvetica,sans-serif;"><span style="font-family: Calibri,Arial,Helvetica,sans-serif;">This additional information solved the "dramatically slower" problem, and then running the benchmarks again showed that the mergesort
</span></span>based on the modified <span style="font-family: "Courier New",monospace;">
half_inplace_merge</span> was generally slightly faster than the old one (sometimes the difference is lost in the noise, but there are also cases where the difference is neat albeit small). I guess that the
<span style="font-family: "Courier New",monospace;">ASSUME</span> macro actually solved an aliasing problem which wasn't present in the second loop due to its loop conditions.</p>
<p><br>
</p>
<p>Anyway, I benchmarked that code using MinGW GCC 6.2, so the results might be different with Clang. I wanted to tell that small optimization story, and felt that I needed to share it here since I borrowed the original code from libc++. Now, you can use it
as you want if you feel that the small performance gain proves to be consistent and worth it (IIRC you have benchmarks for some standard algorithms).<br>
</p>
<p><br>
</p>
<p>The code used for the benchmark is a modified version of the following: <a href="https://github.com/Morwenn/cpp-sort/blob/master/benchmarks/bench.cpp" class="OWAAutoLink" id="LPlnk994007" target="_blank">
https://github.com/Morwenn/cpp-sort/blob/master/benchmarks/bench.cpp</a></p>
<p>Here is a graph with the results I got; merge_sort-new uses the modified <span style="font-family: "Courier New",monospace;">
half_inplace_merge</span> (the names on the left correspond to different data distributions used to test the sorting algorithms):
<a href="https://i.imgur.com/hwzJ84U.png" class="OWAAutoLink" id="LPlnk68622" target="_blank">
https://i.imgur.com/hwzJ84U.</a></p>
<p><br>
</p>
<p>With plenty of love <3</p>
<p><br>
</p>
<p>Morwenn<br>
<br>
</p>
<p><img src="https://i.imgur.com/hwzJ84U.png"></p>
<p></p>
<p></p>
</div>
<br>_______________________________________________<br>cfe-dev mailing list<br>cfe-dev@lists.llvm.org<br>http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev<br></blockquote><br><br><br>-- <br><div><span name="x"></span>Hal Finkel<br>Lead, Compiler Technology and Programming Languages<br>Leadership Computing Facility<br>Argonne National Laboratory<span name="x"></span><br></div></div></body></html>