<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - are overlapping memory accesses optimal?"

   href="https://llvm.org/bugs/show_bug.cgi?id=24678">24678</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>are overlapping memory accesses optimal?

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Common Code Generator Code

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>spatel+llvm@rotateright.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>I'm not sure if this is a performance bug, but I'm filing it for further review

based on the discussion in D12543:

<a href="http://reviews.llvm.org/D12543">http://reviews.llvm.org/D12543</a>

The code in SelectionDAG's FindOptimalMemOpLowering() can generate overlapping

accesses when the target reports unaligned memops as fast, but it's not clear

whether overlapping is good for performance on all targets.

Example:

$ cat copy13bytes.c 

#include <string.h>

void foo(char *a, char *b) {

    memcpy(a, b, 13);

}

$ clang copy13bytes.c -S -o - -O2

...

    movq    (%rsi), %rax

    movq    5(%rsi), %rcx

    movq    %rcx, 5(%rdi)

    movq    %rax, (%rdi)

$ gcc copy13bytes.c -S -o - -O2

...

    movq    (%rsi), %rax

    movq    %rax, (%rdi)

    movl    8(%rsi), %eax

    movl    %eax, 8(%rdi)

    movzbl    12(%rsi), %eax

    movb    %al, 12(%rdi)

Note that any load/store in either case may be misaligned (and in the clang

case, at least one load and at least one store are guaranteed to be misaligned,

because addresses p and p+5 cannot both be 8-byte aligned), but LLVM chooses

overlapping ops to reduce the instruction count (4 instructions instead of 6).</pre>
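        <pre>For anyone who wants to compare the two strategies in isolation, here is a
minimal C sketch of both, assuming x86-64 and fixed-size memcpy calls as a
portable way to express unaligned loads and stores. The function names
copy13_overlapping and copy13_split are hypothetical, and a compiler is free to
lower these calls differently from the listings above.

```c
#include <stdint.h>
#include <string.h>

/* Overlapping strategy (mirrors the clang output): two 8-byte moves at
   offsets 0 and 5, so bytes 5..7 are copied twice. */
static void copy13_overlapping(char *dst, const char *src) {
    uint64_t lo, hi;
    memcpy(&lo, src, 8);       /* load bytes 0..7   */
    memcpy(&hi, src + 5, 8);   /* load bytes 5..12  */
    memcpy(dst + 5, &hi, 8);   /* store bytes 5..12 */
    memcpy(dst, &lo, 8);       /* store bytes 0..7  */
}

/* Non-overlapping strategy (mirrors the gcc output): 8 + 4 + 1 bytes. */
static void copy13_split(char *dst, const char *src) {
    uint64_t q;
    uint32_t d;
    memcpy(&q, src, 8);        /* bytes 0..7  */
    memcpy(dst, &q, 8);
    memcpy(&d, src + 8, 4);    /* bytes 8..11 */
    memcpy(dst + 8, &d, 4);
    dst[12] = src[12];         /* byte 12     */
}
```

Both functions copy exactly 13 bytes and leave the rest of the destination
untouched, so they could be timed against each other on a given target.</pre>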

        </div>

      </p>


    </body>

</html>