<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - Incorrect generated code with AVX"
href="https://llvm.org/bugs/show_bug.cgi?id=27908">27908</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Incorrect generated code with AVX
</td>
</tr>
<tr>
<th>Product</th>
<td>new-bugs
</td>
</tr>
<tr>
<th>Version</th>
<td>3.8
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>new bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>gael.guennebaud@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>The following piece of code:
#include <iostream>
// <a href="http://bitbucket.org/eigen/eigen/get/default.tar.gz">http://bitbucket.org/eigen/eigen/get/default.tar.gz</a>
#include <Eigen/Dense>
using namespace Eigen;
int main()
{
Projective3d t4;
Vector3d v3;
do {
v3 = Vector3d::Ones();
} while (v3.cwiseAbs().minCoeff()<1e-16);
t4.matrix().setIdentity();
t4.matrix().col(3).head<3>() = v3;
std::cout << t4.matrix() << "\n\n";
t4.translate(v3);
// t4.translationExt() += t4.linearExt() * v3;
}
compiled with clang 3.7, 3.8, or 3.9 using "-mavx -O2", produces the following
output:
1 0 0 1
0 1 0 5.29981e-315
0 0 1 1
0 0 0 1
where the "5.29981e-315" entry should be "1". t4.matrix() is essentially a
4x4 column-major matrix stored as a static array of 16 doubles. The incorrect
asm responsible for filling the last 4 entries is as follows:
movq %rax, 120(%rsp)
movl $1072693248, %ecx ## imm = 0x3FF00000
vmovq %rcx, %xmm0
vpslldq $8, %xmm0, %xmm0 ## xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]
vpshufd $226, %xmm0, %xmm0 ## xmm0 = xmm0[2,0,2,3]
vmovdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [0,1072693248,0,1072693248]
vpunpcklqdq %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0],xmm0[0]
vmovdqa %xmm0, 96(%rsp)
movq %rax, 112(%rsp)
where %rax contains a representation of 1.0, and 96(%rsp) references the first
element of the last column.
Replacing the last line, "t4.translate(v3);", with the body of the translate
method (the commented-out line after it) hides the issue, and in this case we
get much cleaner asm:
movq %rax, 120(%rsp)
movq %rax, 96(%rsp)
vmovaps LCPI0_2(%rip), %xmm0 ## xmm0 = [4607182418800017408,4607182418800017408]
vmovups %xmm0, 104(%rsp)
My guess is that the odd shifting and shuffling in the broken code comes from
the do-while condition, which is only partly optimized away.</pre>
</div>
</p>
</body>
</html>