<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - [inline asm, AVX] Issue with the SSE register allocation in inline assembly"
href="https://bugs.llvm.org/show_bug.cgi?id=37402">37402</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[inline asm, AVX] Issue with the SSE register allocation in inline assembly
</td>
</tr>
<tr>
<th>Product</th>
<td>clang
</td>
</tr>
<tr>
<th>Version</th>
<td>6.0
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>-New Bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedclangbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>raphael_bost@alumni.brown.edu
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=20280" name="attach_20280" title="Test code">attachment 20280</a> <a href="attachment.cgi?id=20280&action=edit" title="Test code">[details]</a></span>
Test code

It seems there is a bug in clang's register allocation for x86 inline assembly.
Say we want to implement the following pseudo-code, where all
variables (except cond) are __m128:
// va = *a;
// vb = *b;
// mask = (cond)?0x00:0xFF;
//
// vc = (va ^ vb) & mask;
// va = va ^ vc;
// vb = vb ^ vc;
//
// *a = va;
// *b = vb;
Branch-free, memory-access-oblivious code is the target; inline
assembly seems to be the recommended tool for such cases.
This bug report focuses on the three middle lines above. Consider
two implementations, and the assembly generated for each with
-mavx -O3.
// Implementation 1:
asm volatile("vpxor %0, %1, %2\n\t"
"vpand %2, %3, %2\n\t"
"vpxor %0, %2, %0\n\t"
"vpxor %1, %2, %1\n\t"
: "+x"(va), "+x"(vb), "+x"(vc) /* output */
: "x"(mask) /* input */
: /* clobbered register */
);
// Generated ASM 1
vpxor %xmm0, %xmm1, %xmm2
vpand %xmm2, %xmm2, %xmm2 <- mask (%3) was assigned %xmm2, the same register as vc (%2), so this AND is an idempotent, useless instruction
vpxor %xmm0, %xmm2, %xmm0
vpxor %xmm1, %xmm2, %xmm1
// Implementation 2:
asm volatile("vpxor %0, %1, %2\n\t"
"vpand %2, %3, %2\n\t"
"vpxor %0, %2, %0\n\t"
"vpxor %1, %2, %1\n\t"
: "+x"(va), "+x"(vb), "=x"(vc), "+x"(mask) /* output */
: /* input */
: /* clobbered register */
);
// Generated ASM 2
vpxor %xmm0, %xmm1, %xmm3
vpand %xmm3, %xmm2, %xmm3
vpxor %xmm0, %xmm3, %xmm0
vpxor %xmm1, %xmm3, %xmm1
Clearly, Implementation 1 does not work and produces invalid assembly:
the input operand %3 (mask) is assigned the same register as the
output operand %2 (vc). The same assembly is generated by clang 3.7,
5.0, 6.0 and trunk. gcc and icc both generate the same asm,
equivalent to ASM 2 (cf. <a href="https://godbolt.org/g/gvkRph">https://godbolt.org/g/gvkRph</a> and the
attachments).</pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>