<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - Code pessimization when using typedef-aligned pointers"
   href="https://llvm.org/bugs/show_bug.cgi?id=28343">28343</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Code pessimization when using typedef-aligned pointers
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>clang
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>3.8
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>LLVM Codegen
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedclangbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>matt@godbolt.org
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>In clang 3.8, compiling the following two equivalent code snippets reveals an
apparent pessimization in the first case:

--

typedef double __attribute__((aligned(64))) aligned_double;
void maxArray1(aligned_double* __restrict x, aligned_double* __restrict y) {
    for (int i = 0; i < 65536; i++) x[i] = ((y[i] > x[i]) ? y[i] : x[i]);
}

void maxArray2(double* __restrict x, double* __restrict y) {
    x = static_cast<double*>(__builtin_assume_aligned(x, 64));
    y = static_cast<double*>(__builtin_assume_aligned(y, 64));
    for (int i = 0; i < 65536; i++) x[i] = ((y[i] > x[i]) ? y[i] : x[i]);
}

--

(see also <a href="https://godbolt.org/g/JmBXhP">https://godbolt.org/g/JmBXhP</a>)

It's my understanding that the two functions should compile to identical code: indeed, clang
3.7.1 and every version of GCC I checked generate the same code for both
cases.
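
To check the equivalence claim at the level of results, a small harness can run both functions on the same 64-byte-aligned input and compare the outputs. This is a sketch that reuses the report's two functions unchanged and assumes GCC or clang (for __attribute__ and __builtin_assume_aligned). The buffer names and the helper variantsAgree are hypothetical, not part of the report. It checks computed values only, not the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

typedef double __attribute__((aligned(64))) aligned_double;

// The two variants from the report, unchanged.
void maxArray1(aligned_double* __restrict x, aligned_double* __restrict y) {
    for (int i = 0; i < 65536; i++) x[i] = ((y[i] > x[i]) ? y[i] : x[i]);
}

void maxArray2(double* __restrict x, double* __restrict y) {
    x = static_cast<double*>(__builtin_assume_aligned(x, 64));
    y = static_cast<double*>(__builtin_assume_aligned(y, 64));
    for (int i = 0; i < 65536; i++) x[i] = ((y[i] > x[i]) ? y[i] : x[i]);
}

// Hypothetical 64-byte-aligned buffers, so both alignment promises hold.
constexpr std::size_t kN = 65536;
alignas(64) double bufX1[kN];
alignas(64) double bufX2[kN];
alignas(64) double bufY[kN];

// Hypothetical helper: runs both variants on identical inputs and
// reports whether the outputs match bit for bit.
bool variantsAgree() {
    // Precondition for both variants: every buffer starts on a 64-byte boundary.
    assert(reinterpret_cast<std::uintptr_t>(bufX1) % 64 == 0);
    assert(reinterpret_cast<std::uintptr_t>(bufX2) % 64 == 0);
    assert(reinterpret_cast<std::uintptr_t>(bufY) % 64 == 0);

    for (std::size_t i = 0; i < kN; i++) {
        bufX1[i] = bufX2[i] = static_cast<double>(i % 7);
        bufY[i] = static_cast<double>(i % 5);
    }

    maxArray1(reinterpret_cast<aligned_double*>(bufX1),
              reinterpret_cast<aligned_double*>(bufY));
    maxArray2(bufX2, bufY);

    return std::memcmp(bufX1, bufX2, sizeof bufX1) == 0;
}
```

Calling variantsAgree() should return true on any conforming build. The interesting difference between the two functions is in the emitted loop, not in the results.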

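In clang 3.8, the maxArray1 loop uses unaligned loads (movupd) and reads x into
the intermediate registers xmm2 and xmm3, whereas maxArray2 uses aligned loads
(movapd) and folds the x loads straight into the maxpd memory operands. Since a
legacy-SSE maxpd can only take a 16-byte-aligned memory operand, this suggests
the alignment carried by the typedef is not being used: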

        movupd  xmm0, xmmword ptr [rsi + 8*rax]
        movupd  xmm1, xmmword ptr [rsi + 8*rax + 16]
        movupd  xmm2, xmmword ptr [rdi + 8*rax]
        movupd  xmm3, xmmword ptr [rdi + 8*rax + 16]
        maxpd   xmm0, xmm2
        maxpd   xmm1, xmm3

vs

        movapd  xmm0, xmmword ptr [rsi + 8*rax]
        movapd  xmm1, xmmword ptr [rsi + 8*rax + 16]
        maxpd   xmm0, xmmword ptr [rdi + 8*rax]
        maxpd   xmm1, xmmword ptr [rdi + 8*rax + 16]</pre>
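<pre>For reference, the loop shape expected from maxArray2 can be written by hand with SSE2 intrinsics: aligned loads (movapd) of y, and a maxpd whose second operand is x. This is a sketch only, assuming an x86-64 target with SSE2, 16-byte-aligned arrays (the report promises 64) and a length that is a multiple of 4. The function name maxArraySse2 is hypothetical. Per the MAXPD definition, _mm_max_pd(a, b) returns a when a > b and b otherwise (including when either input is NaN or both are zero), so _mm_max_pd(y, x) matches the scalar expression (y[i] > x[i]) ? y[i] : x[i] lane by lane. That equivalence is why the compiler may lower the ternary to maxpd at all.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical hand-written equivalent of maxArray2, unrolled by two
// vectors (four doubles) to mirror the listings above.
// Assumed preconditions: x and y are 16-byte aligned, n % 4 == 0.
void maxArraySse2(double* __restrict x, const double* __restrict y, std::size_t n) {
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        // Aligned loads of y, like the two movapd instructions.
        __m128d y0 = _mm_load_pd(y + i);
        __m128d y1 = _mm_load_pd(y + i + 2);
        // Aligned loads of x; with known alignment a compiler can fold
        // these into maxpd's memory operand instead of using extra registers.
        __m128d r0 = _mm_max_pd(y0, _mm_load_pd(x + i));
        __m128d r1 = _mm_max_pd(y1, _mm_load_pd(x + i + 2));
        // Each lane now holds (y > x) ? y : x.
        _mm_store_pd(x + i, r0);
        _mm_store_pd(x + i + 2, r1);
    }
}
```

A call such as maxArraySse2(x, y, 65536) on 64-byte-aligned buffers would compute the same result as maxArray2.</pre>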
        </div>
      </p>
    </body>
</html>