<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Inline memcpy/memmove/memset with unknown size but bounded by dereferenceable info"
   href="https://bugs.llvm.org/show_bug.cgi?id=43888">43888</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Inline memcpy/memmove/memset with unknown size but bounded by dereferenceable info
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Scalar Optimizations
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>david.bolvansky@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define C 8
int d[C];
int s[C];

__attribute__((noinline))
void old(int *d, int *s, int n) {
    memcpy(d, s, n*sizeof(int));
}

__attribute__((noinline))
void new(int *d, int *s, int n) {

    switch(n) {
        case 8: memcpy(d, s, n*sizeof(int)); break;
        case 7: memcpy(d, s, n*sizeof(int)); break;
        case 6: memcpy(d, s, n*sizeof(int)); break;
        case 5: memcpy(d, s, n*sizeof(int)); break;
        case 4: memcpy(d, s, n*sizeof(int)); break;
        case 3: memcpy(d, s, n*sizeof(int)); break;
        case 2: memcpy(d, s, n*sizeof(int)); break;
        case 1: memcpy(d, s, n*sizeof(int)); break;
        case 0: break;
    }
}


int main(void) {
    for (int i = 0; i < 1000000000; ++i) {
        new(d, s, C);
    }
}



Idea:

Analyze memcpy's dst and src buffers. For example, infer that dst is 8 ints
and src is 8 ints. When the size is unknown but bounded by that, replace the
small memcpy with a jump table of inlined constant-size memcpys (see the "new"
function).
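
A minimal sketch of how the bound could be made explicit in the source and how
the rewritten form could be checked against the plain memcpy. The names
copy_generic, copy_dispatch, count_mismatches and BOUND are hypothetical and
not part of the report. The "static BOUND" array parameters are one way to
state that at least 8 ints are accessible; whether a given Clang version turns
that into a dereferenceable attribute is an assumption worth checking against
the emitted IR.

```c
#include <assert.h>
#include <string.h>

#define BOUND 8  /* assumed upper bound, in ints, known for both buffers */

/* Baseline: the length is only known at run time. */
void copy_generic(int *dst, const int *src, int n) {
    memcpy(dst, src, (size_t)n * sizeof(int));
}

/* Dispatch on n so that every memcpy has a compile-time constant length,
 * which lets the compiler expand each one inline. */
void copy_dispatch(int dst[static BOUND], const int src[static BOUND], int n) {
    assert(n >= 0 && n <= BOUND);  /* precondition implied by the bound */
    switch (n) {
    case 8: memcpy(dst, src, 8 * sizeof(int)); break;
    case 7: memcpy(dst, src, 7 * sizeof(int)); break;
    case 6: memcpy(dst, src, 6 * sizeof(int)); break;
    case 5: memcpy(dst, src, 5 * sizeof(int)); break;
    case 4: memcpy(dst, src, 4 * sizeof(int)); break;
    case 3: memcpy(dst, src, 3 * sizeof(int)); break;
    case 2: memcpy(dst, src, 2 * sizeof(int)); break;
    case 1: memcpy(dst, src, 1 * sizeof(int)); break;
    case 0: break;
    }
}

/* Counts the lengths in [0, BOUND] for which the two versions leave
 * different contents in the destination buffer. */
int count_mismatches(void) {
    int src[BOUND], a[BOUND], b[BOUND];
    int bad = 0;
    for (int i = 0; i < BOUND; ++i)
        src[i] = i + 1;
    for (int n = 0; n <= BOUND; ++n) {
        memset(a, 0xff, sizeof a);
        memset(b, 0xff, sizeof b);
        copy_generic(a, src, n);
        copy_dispatch(b, src, n);
        if (memcmp(a, b, sizeof a) != 0)
            ++bad;
    }
    return bad;
}
```

Calling count_mismatches() from a test should return 0 if the dispatch form
preserves the behaviour of the plain call for every length within the bound.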

Performance improvements:

For the code above (C = 8 ints, all 8 ints copied):

time ./old.out 

real    0m2,535s
user    0m2,534s
sys     0m0,000s

time ./new.out 

real    0m1,981s
user    0m1,981s
sys     0m0,000s


With C = 8 ints but only 6 ints copied:

time ./old.out 

real    0m3,123s
user    0m3,123s
sys     0m0,000s

time ./new.out 

real    0m1,979s
user    0m1,973s
sys     0m0,004s</pre>
        </div>
      </p>


    </body>
</html>