<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - _mm_undefined_si128 compiles to incorrect SSE code with -O1 or higher"
   href="https://bugs.llvm.org/show_bug.cgi?id=32176">32176</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>_mm_undefined_si128 compiles to incorrect SSE code with -O1 or higher
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>clang
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>4.0
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>LLVM Codegen
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedclangbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>myriachan@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>_mm_undefined_si128 (internally, __builtin_ia32_undef128) is designed to allow
writing x86 SSE code that uses the existing values of SSE registers without
regard to their current contents.  An example is the following code to generate
an SSE register with all "1" bits:

__m128i ReturnOneBits()
{
  __m128i dummy = _mm_undefined_si128();
  return _mm_cmpeq_epi32(dummy, dummy);
}

It should compile to something like this:

pcmpeqd %xmm0, %xmm0
retq

But instead, with -O1, -O2 or -O3, it compiles to this:

xorps %xmm0, %xmm0
retq

In other words, it returns all "0" bits instead of all "1" bits.  (With
optimizations disabled, the generated code reads uninitialized memory and then
does pcmpeqd on the two values.)

Either variant of the following function *does* compile correctly, and clang in
fact sees that zeroing a register beforehand is unnecessary:

__m128i ReturnOneBits()
{
  __m128i dummy = _mm_setzero_si128();
  return _mm_cmpeq_epi32(dummy, dummy);
  // -or-
  return _mm_set_epi32(-1, -1, -1, -1);
}

Both variants compile to:

pcmpeqd %xmm0, %xmm0
retq
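
A run-time check can complement reading the disassembly.  The sketch below is
not part of the original report: all_bits_set is a hypothetical helper name,
and the code assumes an x86 target with SSE2 available.

```c
#include "emmintrin.h"  /* SSE2 intrinsics; quoted form still finds the system header */

/* Function from the report. */
__m128i ReturnOneBits(void)
{
  __m128i dummy = _mm_undefined_si128();
  return _mm_cmpeq_epi32(dummy, dummy);
}

/* Hypothetical helper: returns 1 if all 128 bits of v are set, else 0. */
int all_bits_set(__m128i v)
{
  /* Compare every byte against 0xFF, then gather the 16 per-byte results
     into the low 16 bits of an int. */
  __m128i ones = _mm_set1_epi32(-1);
  return _mm_movemask_epi8(_mm_cmpeq_epi8(v, ones)) == 0xFFFF;
}
```

all_bits_set(ReturnOneBits()) should give 1 when the compiler emits the
expected pcmpeqd, and 0 for the xorps output described in this report.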

Because clang's optimizer already recognizes that the previous value of xmm0
does not matter here, an acceptable fix would be to remove
__builtin_ia32_undef128 from the compiler and have _mm_undefined_si128 simply
call _mm_setzero_si128.  (This is, in fact, what Microsoft Visual C++ does.)
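
Until something like that lands, the same idea can be applied in user code.
This is only a sketch: undefined_si128_workaround and ReturnOneBitsWorkaround
are hypothetical names, not part of any header, and an x86 target with SSE2 is
assumed.

```c
#include "emmintrin.h"  /* SSE2 intrinsics; quoted form still finds the system header */

/* Hypothetical stand-in for _mm_undefined_si128: returns a defined (zero)
   value, so later uses never depend on an undefined input. */
static inline __m128i undefined_si128_workaround(void)
{
  return _mm_setzero_si128();
}

__m128i ReturnOneBitsWorkaround(void)
{
  __m128i dummy = undefined_si128_workaround();
  /* Every lane equals itself, so each 32-bit lane compares to all ones. */
  return _mm_cmpeq_epi32(dummy, dummy);
}
```

Per the output shown above for the _mm_setzero_si128 variant, the extra zeroing
is expected to be optimized away, so this should cost nothing at -O1 or higher.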

I have not tried the other _mm*_undefined* functions.</pre>
        </div>
      </p>


    </body>
</html>