<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Trick for inlining more small constant sized memcmp() and memcpy()"
   href="https://bugs.llvm.org/show_bug.cgi?id=36426">36426</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Trick for inlining more small constant sized memcmp() and memcpy()
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>dave@znu.io
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
<pre>One can use a pair of overlapping loads to avoid calling the C runtime in more
scenarios when the size is constant, small, and not a power of two. For example,
"memcmp(a, b, 15) == 0" could be lowered to the equivalent of:

size_t offset = 15 - sizeof(int64_t);
bool same = 0 == ((*(int64_t*)a ^ *(int64_t*)b) |
                  (*(int64_t*)(a + offset) ^ *(int64_t*)(b + offset)));

The same approach should scale up to a pair of overlapping vector loads too,
e.g. two 16-byte loads for sizes 17 through 31.
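
For illustration only, here is a minimal sketch of the same comparison in
portable C. The helper names load_u64() and equal15() are hypothetical, and the
fixed-size memcpy() calls are just a way to express unaligned loads without
pointer casts; this is not what the backend would literally emit. It only
answers equal / not equal, which is all that "== 0" needs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 8 bytes from a possibly unaligned address.
   A fixed-size memcpy is the portable spelling of an unaligned load. */
static inline uint64_t load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: same result as memcmp(a, b, 15) == 0.
   The first load covers bytes 0..7, the second covers bytes 7..14,
   so byte 7 is compared twice and nothing past byte 14 is read. */
static inline bool equal15(const void *a, const void *b)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    const size_t offset = 15 - sizeof(uint64_t); /* 7 */

    uint64_t head = load_u64(pa) ^ load_u64(pb);
    uint64_t tail = load_u64(pa + offset) ^ load_u64(pb + offset);

    /* Any differing bit in either window makes the OR non-zero. */
    return (head | tail) == 0;
}
```

An ordered result (less than / greater than) would need more work than this,
since the XOR only says whether any byte differs.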

I haven't benchmarked this with memcpy() yet, but I'd expect avoiding a call to
the runtime to be worth it there too, and for the same reasons: fewer register
spills, no function call overhead, and no dynamic algorithm selection inside
the library routine, where the size is no longer known to be constant.
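
A sketch of what the memcpy() case might look like, with the same caveats: it
is unbenchmarked, copy15() is a hypothetical name, and the fixed-size memcpy()
calls stand in for unaligned 8-byte loads and stores.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: same effect as memcpy(dst, src, 15) for buffers
   that do not overlap each other. */
static inline void copy15(void *dst, const void *src)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t offset = 15 - sizeof(uint64_t); /* 7 */
    uint64_t head;
    uint64_t tail;

    /* Two overlapping 8-byte loads: bytes 0..7 and bytes 7..14. */
    memcpy(&head, s, sizeof head);
    memcpy(&tail, s + offset, sizeof tail);

    /* Two overlapping 8-byte stores; byte 7 is written twice with the
       same value, and nothing past byte 14 is touched. */
    memcpy(d, &head, sizeof head);
    memcpy(d + offset, &tail, sizeof tail);
}
```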

NOTE: memcmp() and memcpy() don't guarantee that the pointers are aligned, so
don't assume that the first load is "aligned" and the second load is "not
aligned". The opposite could be true at run time.</pre>
        </div>
      </p>


    </body>
</html>