<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - Redundant movs and an extra spill while using all xmm registers"

   href="https://llvm.org/bugs/show_bug.cgi?id=26810">26810</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Redundant movs and an extra spill while using all xmm registers

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Register Allocator

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>egor.kochetov@intel.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Created <span class=""><a href="attachment.cgi?id=15974" name="attach_15974" title="The reproducer (1 source file, assemblies for clang and GCC, auxiliary Makefile)">attachment 15974</a></span>

The reproducer (1 source file, assemblies for clang and GCC, auxiliary

Makefile)

When compiling the loop below, clang produces a binary that runs ~25% slower

than the one produced by GCC.

    constexpr unsigned nIterations = 76800000; //< an arbitrary large number

    __m128d v[8];

    __m128d m1[nIterations];

    __m128d m2[8];

    for (unsigned i = 0; i < nIterations; ++i) {

        const __m128d M = m1[i];

        __m128d a;

        a = _mm_mul_pd(M, m2[0]);

        v[0] = _mm_add_pd(v[0], a);

        a = _mm_mul_pd(M, m2[1]);

        v[1] = _mm_add_pd(v[1], a);

        a = _mm_mul_pd(M, m2[2]);

        v[2] = _mm_add_pd(v[2], a);

        a = _mm_mul_pd(M, m2[3]);

        v[3] = _mm_add_pd(v[3], a);

        a = _mm_mul_pd(M, m2[4]);

        v[4] = _mm_add_pd(v[4], a);

        a = _mm_mul_pd(M, m2[5]);

        v[5] = _mm_add_pd(v[5], a);

        a = _mm_mul_pd(M, m2[6]);

        v[6] = _mm_add_pd(v[6], a);

        a = _mm_mul_pd(M, m2[7]);

        v[7] = _mm_add_pd(v[7], a);

    }

In particular, there are redundant movs around the computation of v[6]:

   0x08048c4e <+334>:    movapd %xmm5,%xmm6

   0x08048c52 <+338>:    movapd %xmm4,%xmm5

   0x08048c56 <+342>:    movapd %xmm3,%xmm4

   0x08048c5a <+346>:    movapd %xmm2,%xmm3

Also, throughout the loop clang spills three values (v[1], v[6] and v[7]),

while GCC spills only two (v[0] and v[1]).

Attached are the source file, the annotated clang assembly and the GCC assembly

for the function 'loop', and the (optional) Makefile.
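
The attachment itself is not reproduced in this message. As a rough,
self-contained sketch of the kind of function being discussed, the code below
wraps the loop in a function with a hypothetical signature (the real 'loop' in
the attachment may look different). Two things are assumptions made for
brevity: the eight mul/add pairs are written as an inner loop over k instead of
being unrolled by hand, and the accumulators are copied into a local array so
that they can stay in registers. The generated code may therefore differ from
the attached assemblies. 'check_small_case' is an illustrative helper, not part
of the reproducer.

```cpp
#include <emmintrin.h>  // SSE2: __m128d, _mm_mul_pd, _mm_add_pd
#include <cassert>
#include <cstddef>

// Hypothetical signature; the attached 'loop' function may differ.
// For each m1[i], accumulates m1[i] * m2[k] into v[k] for k = 0..7.
void loop(const __m128d* m1, std::size_t n,
          const __m128d (&m2)[8], __m128d (&v)[8]) {
    assert(m1 != nullptr || n == 0);

    // Local copies of the accumulators, so they cannot alias m1 or m2.
    __m128d acc[8];
    for (int k = 0; k < 8; ++k) acc[k] = v[k];

    for (std::size_t i = 0; i < n; ++i) {
        const __m128d M = m1[i];
        // Compact form of the eight hand-written mul/add pairs in the report.
        for (int k = 0; k < 8; ++k) {
            acc[k] = _mm_add_pd(acc[k], _mm_mul_pd(M, m2[k]));
        }
    }

    for (int k = 0; k < 8; ++k) v[k] = acc[k];
}

// Self-check on a tiny input whose results are exactly representable.
bool check_small_case() {
    __m128d m1[4], m2[8], v[8];
    for (int i = 0; i < 4; ++i) m1[i] = _mm_set_pd(2.0, 1.0);  // low lane 1.0, high lane 2.0
    for (int k = 0; k < 8; ++k) {
        m2[k] = _mm_set1_pd(k + 1.0);
        v[k] = _mm_setzero_pd();
    }
    loop(m1, 4, m2, v);
    for (int k = 0; k < 8; ++k) {
        const double lo = _mm_cvtsd_f64(v[k]);                         // 4 * 1.0 * (k + 1)
        const double hi = _mm_cvtsd_f64(_mm_unpackhi_pd(v[k], v[k]));  // 4 * 2.0 * (k + 1)
        if (lo != 4.0 * (k + 1) || hi != 8.0 * (k + 1)) return false;
    }
    return true;
}
```

To inspect the generated code, this could be built with '-m32 -O2' as described
in this report; depending on the compiler's default target, '-msse2' may also
be needed for the intrinsics to be available.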

The source needs to be compiled with '-m32' to limit the number of available xmm

registers to eight. -O2 and -Ofast produce the same results (though the

register names differ).</pre>

        </div>

      </p>


    </body>

</html>