<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - Extra vector move after a poor choice of destination register for a commutative operation"
href="https://llvm.org/bugs/show_bug.cgi?id=26515">26515</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Extra vector move after a poor choice of destination register for a commutative operation
</td>
</tr>
<tr>
<th>Product</th>
<td>new-bugs
</td>
</tr>
<tr>
<th>Version</th>
<td>3.7
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Keywords</th>
<td>performance
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>new bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>peter@cordes.ca
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>Possible duplicate of <a class="bz_bug_link
bz_status_NEW "
title="NEW --- - Unneeded register-register move emitted for x86_64"
href="show_bug.cgi?id=15705">https://llvm.org/bugs/show_bug.cgi?id=15705</a>. It shows
the same thing in integer code with cmov.
#include &lt;immintrin.h&gt;

float add_in_wrong_order(__m128 v) {
    __m128 shuf = _mm_movehl_ps(v, v);
    __m128 sums = _mm_add_ps(shuf, v);  // Using _mm_add_ss instead doesn't help
    return _mm_cvtss_f32(sums);
}
With clang 3.7.1 -xc -O3 -Wall -fverbose-asm -march=haswell -ffast-math -mno-avx,
this compiles to (godbolt: <a href="http://goo.gl/sV9tMR">http://goo.gl/sV9tMR</a>):
# (see the end of this post for a suggested optimal sequence)
movaps xmm1, xmm0
movhlps xmm1, xmm1 # with _mm_add_ss, this is shufpd xmm1, xmm1, 1
addps xmm1, xmm0
movaps xmm0, xmm1 # should have just used addps xmm0, xmm1
I've seen clang fail to produce the result in the desired register before, but
never got around to reporting it. I don't remember whether that was also with
integer registers, or only with non-AVX vectors.
---
Would it be possible for LLVM to see that _mm_cvtss_f32 only uses the low
element, and propagate that back to the inputs of _mm_add_ps? Probably not,
since the code may still need to fault on a signalling NaN in elements 1..3 if
that exception is unmasked in MXCSR. Still, doesn't -ffast-math mean you
*don't* have to care what happens to NaNs?
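A minimal sketch of the lane arithmetic behind that question, assuming x86-64
with the SSE intrinsics from immintrin.h: lane 0 of the _mm_add_ps result is
v[2] + v[0], so a variant that adds only lane 0 with _mm_add_ss returns the
same scalar for ordinary inputs. add_low_lane_only and check_lane0 are
hypothetical names for illustration, not from the report. This only checks
values; it says nothing about what clang emits for either version.

```c
#include <assert.h>
#include <immintrin.h>

/* The reduction from the report: all four lanes are added. */
float add_in_wrong_order(__m128 v) {
    __m128 shuf = _mm_movehl_ps(v, v); /* {v2, v3, v2, v3} */
    __m128 sums = _mm_add_ps(shuf, v); /* {v2+v0, v3+v1, v2+v2, v3+v3} */
    return _mm_cvtss_f32(sums);        /* only lane 0 is returned */
}

/* Hypothetical variant: only lane 0 is added; lanes 1..3 are copied
   from the first operand (shuf) and then discarded by _mm_cvtss_f32. */
float add_low_lane_only(__m128 v) {
    __m128 shuf = _mm_movehl_ps(v, v);
    __m128 sums = _mm_add_ss(shuf, v);
    return _mm_cvtss_f32(sums);
}

/* Usage check: lane 0 is v[0] + v[2] = 1.5 + 2.25 = 3.75 (exact in binary FP).
   _mm_setr_ps takes lanes in order, lane 0 first. */
static void check_lane0(void) {
    __m128 v = _mm_setr_ps(1.5f, 100.0f, 2.25f, -7.0f);
    assert(add_in_wrong_order(v) == 3.75f);
    assert(add_low_lane_only(v) == 3.75f);
}
```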
--------
As long as you don't cause slowdowns from doing FP calculations on garbage data,
-ffast-math lets you do a lot. I think this sequence would be valid for that
source with -ffast-math:
movhlps xmm1, xmm0 # upper half is garbage
addss xmm0, xmm1 # so don't use it
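To illustrate why the garbage upper half of xmm1 is harmless, here is a scalar
value model of the two instructions, assuming their documented lane semantics
(movhlps writes only the low two lanes of the destination, addss writes only
lane 0). The xmm_model type and the movhlps, addss, proposed_sequence and
check_garbage_ignored functions are hypothetical helpers for illustration, not
real intrinsics; the model says nothing about timing.

```c
#include <assert.h>

/* Hypothetical 4-lane model of an XMM register (lane 0 = low element). */
typedef struct { float lane[4]; } xmm_model;

/* movhlps dst, src: dst lanes 0..1 = src lanes 2..3. Lanes 2..3 of dst keep
   their old values; in hardware that merge is why the instruction depends on
   the old destination contents. */
static void movhlps(xmm_model *dst, const xmm_model *src) {
    dst->lane[0] = src->lane[2];
    dst->lane[1] = src->lane[3];
}

/* addss dst, src: only lane 0 is added; dst lanes 1..3 are unchanged. */
static void addss(xmm_model *dst, const xmm_model *src) {
    dst->lane[0] += src->lane[0];
}

/* The suggested two-instruction sequence; xmm1 starts with arbitrary contents. */
static float proposed_sequence(xmm_model xmm0, xmm_model xmm1) {
    movhlps(&xmm1, &xmm0); /* xmm1 = {v2, v3, old, old} */
    addss(&xmm0, &xmm1);   /* xmm0 lane 0 = v0 + v2 */
    return xmm0.lane[0];   /* what _mm_cvtss_f32 would return */
}

/* Usage check: the result does not depend on the old contents of xmm1. */
static void check_garbage_ignored(void) {
    xmm_model v = {{1.5f, 100.0f, 2.25f, -7.0f}};
    xmm_model junk_a = {{0.0f, 0.0f, 0.0f, 0.0f}};
    xmm_model junk_b = {{9e30f, -1e-30f, 42.0f, 3.0f}};
    assert(proposed_sequence(v, junk_a) == 3.75f);
    assert(proposed_sequence(v, junk_b) == 3.75f);
}
```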
There's a false dependency on the old contents of xmm1. When inlined, the
compiler could use any long-dead register or, better, one that had to be ready
at some point before v was.
gcc likes to xor-zero registers to avoid false dependencies (e.g. for popcnt when
it doesn't want to overwrite the source).</pre>
</div>
</p>
</body>
</html>