<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Optimiser misses good optimisations for std::merge"
   href="https://bugs.llvm.org/show_bug.cgi?id=42334">42334</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Optimiser misses good optimisations for std::merge
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>clang
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>LLVM Codegen
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedclangbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>denis.yaroshevskij@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org, neeilans@live.com, richard-llvm@metafoo.co.uk
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Hi.

Current implementation of std::merge can be significantly optimised and Roman
Lebedev (@lebedev.ri) suggested that these changes should be done by the
optimiser.

1) Just check for one boundary on every loop iteration instead of two.

If you look at the code:
<a href="https://github.com/llvm-mirror/libcxx/blob/b82bfabdfd93f8616df8e6510021642e1fd764ed/include/algorithm#L4353">https://github.com/llvm-mirror/libcxx/blob/b82bfabdfd93f8616df8e6510021642e1fd764ed/include/algorithm#L4353</a>
You can see that there is only one iterator getting advanced on every
iteration.
However - the code checks for both boundaries.

If we stop doing that and only check for one boundary - there is up to 30%
speed up on some benchmarks.
Here is a quick-bench with a 1.2 win:
<a href="http://quick-bench.com/UlL4LXNkrDL_rp6QNAFpIqRtsG0">http://quick-bench.com/UlL4LXNkrDL_rp6QNAFpIqRtsG0</a>

Here is the same diff on godbolt: <a href="https://gcc.godbolt.org/z/KpIsXK">https://gcc.godbolt.org/z/KpIsXK</a>

2) Unrolling one iteration for the first range after the check for second is
very beneficial to performance.
Here is the comparison with version 2 on quick-bench:
<a href="http://quick-bench.com/JsNpf5VWYuBftR2vI4SxDqYJt8M">http://quick-bench.com/JsNpf5VWYuBftR2vI4SxDqYJt8M</a> - extra 1.2 times (1.5
compared to std::merge)

Here is a comparison on godbolt: <a href="https://gcc.godbolt.org/z/SP43AH">https://gcc.godbolt.org/z/SP43AH</a>

The reason for this speed up (as I understand it) is that we don't do a jump
when second than first case. + I suspect that the loops tap into a more natural
pattern, like do while for the second range.
There is practically no difference between 1 and 2 optimization in size -
because in second case two memmoves are collapsed into one, which is also nice.

I have very limited understanding of what optimiser is capable and not capable
of doing for me - please take a look if this is possible.

FYI: std::stable_sort should also benefit from this kind of optimisation.
std::stable_sort is a very important algorithm, for example, in Chromium it
ends up used in some expensive places.

Related links:
Pr to change std::merge (unlikely to be accepted since goto is not constexpr
and I can't rewrite 2 without goto): <a href="https://reviews.llvm.org/D63063">https://reviews.llvm.org/D63063</a>
Switch based state machines do not get optimised:
<a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Optimizing switch to jumps"
   href="show_bug.cgi?id=42313">https://bugs.llvm.org/show_bug.cgi?id=42313</a></pre>
        </div>
      </p>


      <hr>
      <span>You are receiving this mail because:</span>

      <ul>
          <li>You are on the CC list for the bug.</li>
      </ul>
    </body>
</html>