<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - [inline asm, AVX] Issue with the SSE register allocation in inline assembly"

   href="https://bugs.llvm.org/show_bug.cgi?id=37402">37402</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>[inline asm, AVX] Issue with the SSE register allocation in inline assembly

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>clang

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>6.0

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>-New Bugs

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedclangbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>raphael_bost@alumni.brown.edu

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Created <span class=""><a href="attachment.cgi?id=20280" name="attach_20280" title="Test code">attachment 20280</a> <a href="attachment.cgi?id=20280&action=edit" title="Test code">[details]</a></span>

Test code

There seems to be a bug in the register allocation for x86 inline assembly.

Say we want to implement the following pseudo-code, where all

variables (except cond) are of type __m128:

// va = *a;

// vb = *b;

// mask = (cond)?0x00:0xFF;

//

// vc = (va ^ vb) & mask;

// va = va ^ vc;

// vb = vb ^ vc;

//

// *a = va;

// *b = vb;
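
For reference, the arithmetic of the pseudo-code above can be sketched in
portable C on 16-byte arrays instead of __m128 values. This is only an
illustration of the intended semantics, not the code from the attachment:
the function name cond_swap16 is hypothetical, and nothing here guarantees
that a compiler keeps the result branch-free.

```c
/* Hypothetical helper, not taken from the attachment: swaps two 16-byte
   blocks when cond is zero and leaves them untouched otherwise, following
   the mask convention of the pseudo-code above. */
void cond_swap16(unsigned char a[16], unsigned char b[16], int cond)
{
    /* mask = (cond) ? 0x00 : 0xFF, written without an explicit branch:
       !cond is 1 or 0, so 0 - !cond is all ones or all zeros. */
    unsigned char mask = (unsigned char)(0u - (unsigned)!cond);

    for (int i = 0; i < 16; i++) {
        unsigned char c = (unsigned char)((a[i] ^ b[i]) & mask); /* vc */
        a[i] ^= c; /* va = va ^ vc */
        b[i] ^= c; /* vb = vb ^ vc */
    }
}
```

With cond != 0 the mask is 0x00 and both blocks are left unchanged; with
cond == 0 the mask is 0xFF and the two blocks are swapped.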

The goal is branch-free, memory-access-oblivious code; inline assembly

seems to be the recommended approach for such cases.

This bug report focuses on the three middle lines of the pseudo-code

above. Consider the following two implementations and the assembly

generated for each with -mavx -O3.

// Implementation 1: 

    asm volatile("vpxor %0, %1, %2\n\t"

                 "vpand %2, %3, %2\n\t"

                 "vpxor %0, %2, %0\n\t"

                 "vpxor %1, %2, %1\n\t"

                 : "+x"(va), "+x"(vb), "+x"(vc) /* output */

                 : "x"(mask) /* input */

                 : /* clobbered register */

    );

// Generated ASM 1

    vpxor %xmm0, %xmm1, %xmm2

    vpand %xmm2, %xmm2, %xmm2 <- no-op: mask (%3) and vc (%2) were both assigned %xmm2

    vpxor %xmm0, %xmm2, %xmm0

    vpxor %xmm1, %xmm2, %xmm1

// Implementation 2: 

    asm volatile("vpxor %0, %1, %2\n\t"

                 "vpand %2, %3, %2\n\t"

                 "vpxor %0, %2, %0\n\t"

                 "vpxor %1, %2, %1\n\t"

                 : "+x"(va), "+x"(vb), "=x"(vc), "+x"(mask) /* output */

                 : /* input */

                 : /* clobbered register */

    );

// Generated ASM 2

    vpxor %xmm0, %xmm1, %xmm3

    vpand %xmm3, %xmm2, %xmm3

    vpxor %xmm0, %xmm3, %xmm0

    vpxor %xmm1, %xmm3, %xmm1

Clearly, Implementation 1 does not work: in the generated assembly, mask and vc share %xmm2, so the mask is never applied.

I get the same generated assembly with clang 3.7, 5.0, 6.0 and trunk.

gcc and icc generate identical asm (cf. <a href="https://godbolt.org/g/gvkRph">https://godbolt.org/g/gvkRph</a> and the

attachment), equivalent to ASM 2.</pre>

        </div>

      </p>


    </body>

</html>