<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - spec2000/188.ammp, spec2006/433.milc, 444.namd, 447.dealII, 453.povray compilation fails on LTO stage after commit r256394"
href="https://llvm.org/bugs/show_bug.cgi?id=25999">25999</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>spec2000/188.ammp, spec2006/433.milc, 444.namd, 447.dealII, 453.povray compilation fails on LTO stage after commit r256394
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Keywords</th>
<td>miscompilation
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: X86
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>sergey.k.okunev@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>david.l.kreitzer@intel.com, denis.briltz@intel.com, elena.demikhovsky@intel.com, llvm-bugs@lists.llvm.org, sergos.gnu@gmail.com, spatel+llvm@rotateright.com, zia.ansari@intel.com
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>Bisection showed that LLVM revision r256394 is responsible for the failures.
The commit message follows.
commit 75759ab3e9255fe5f716e4a71ca1ee56901dedf8
Author: Sanjay Patel &lt;<a href="mailto:spatel@rotateright.com">spatel@rotateright.com</a>&gt;
Date: Thu Dec 24 21:17:56 2015 +0000
[InstCombine] transform more extract/insert pairs into shuffles (PR2109)
This is an extension of the shuffle combining from r203229:
<a href="http://reviews.llvm.org/rL203229">http://reviews.llvm.org/rL203229</a>
The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.
The motivation is to finally solve PR2109:
<a class="bz_bug_link
bz_status_RESOLVED bz_closed"
title="RESOLVED FIXED - Missed optimization for extract/insertelements equivalent to movhps"
href="show_bug.cgi?id=2109">https://llvm.org/bugs/show_bug.cgi?id=2109</a>
For that example, the IR becomes:
%1 = bitcast &lt;2 x i32&gt;* %P to &lt;2 x float&gt;*
%ld1 = load &lt;2 x float&gt;, &lt;2 x float&gt;* %1, align 8
%2 = shufflevector &lt;2 x float&gt; %ld1, &lt;2 x float&gt; undef, &lt;4 x i32&gt; &lt;i32 0, i32 1, i32 undef, i32 undef&gt;
%i2 = shufflevector &lt;4 x float&gt; %A, &lt;4 x float&gt; %2, &lt;4 x i32&gt; &lt;i32 0, i32 1, i32 4, i32 5&gt;
ret &lt;4 x float&gt; %i2
And x86 SSE output improves from:
movq (%rdi), %xmm1 ## xmm1 = mem[0],zero
movdqa %xmm1, %xmm2
shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3]
shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0]
retq
To the almost optimal:
movhpd (%rdi), %xmm0
Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases.
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.
Differential Revision: <a href="http://reviews.llvm.org/D15096">http://reviews.llvm.org/D15096</a>
git-svn-id: <a href="https://llvm.org/svn/llvm-project/llvm/trunk@256394">https://llvm.org/svn/llvm-project/llvm/trunk@256394</a>
91177308-0d34-0410-b5e6-96231b3b80d8
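
The equivalence the commit message relies on can be sketched in a few lines of
Python. This is only an illustrative model, not LLVM code: the helper names
(shufflevector, insert_extract, widened_shuffle) and the sample values are
hypothetical, None stands in for an undef lane, and the scalar
insert/extract form is an assumption inferred from the final shuffle mask
(lanes 0 and 1 of the loaded pair land in lanes 2 and 3 of %A).

```python
# None stands in for an undef lane.
UNDEF = None

def shufflevector(a, b, mask):
    """Model of shufflevector: index i selects a[i], index len(a)+j selects b[j]."""
    both = list(a) + list(b)
    return [UNDEF if m is UNDEF else both[m] for m in mask]

def insert_extract(a, ld):
    """Assumed scalar form: extract ld[0] and ld[1], insert them into lanes 2 and 3."""
    out = list(a)
    out[2] = ld[0]
    out[3] = ld[1]
    return out

def widened_shuffle(a, ld):
    """Form after the transform, following the IR quoted in the commit message."""
    undef2 = [UNDEF, UNDEF]
    # Widen the 2-lane input to 4 lanes by padding with undef.
    wide = shufflevector(ld, undef2, [0, 1, UNDEF, UNDEF])
    # Select lanes 0,1 from a and lanes 0,1 of the widened vector (indices 4,5).
    return shufflevector(a, wide, [0, 1, 4, 5])

A = [1.0, 2.0, 3.0, 4.0]
LD = [10.0, 20.0]
assert insert_extract(A, LD) == widened_shuffle(A, LD)
```

Both forms produce the same four lanes for these inputs; the widening step only
exists so that the two operands of the final shuffle have the same length.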
LLVM-clang options: -m64 -fuse-ld=gold -Ofast -funroll-loops -flto -static
-mfpmath=sse -march=core-avx2
During the LTO phase, the SPEC benchmarks fail with the following error message
(shown here for spec2006/444.namd).
runspec --config=lnx-x86_64-clang-default.cfg --rebuild -a build -e ref64 -T base 444
…………………………………………
clang++ -m64 -m64 -fuse-ld=gold -Ofast -funroll-loops -flto -static
-mfpmath=sse -march=core-avx2 -DSPEC_CPU_LP64 Compute.o ComputeList.o
ComputeNonbondedUtil.o LJTable.o Molecule.o Patch.o PatchList.o ResultSet.o
SimParameters.o erf.o spec_namd.o -o namd
Instruction does not dominate all uses!
%782 = extractelement &lt;2 x double&gt; %721, i32 1
%779 = insertelement &lt;4 x double&gt; undef, double %782, i32 0
Instruction does not dominate all uses!
%1053 = extractelement &lt;2 x double&gt; %974, i32 1
%1050 = insertelement &lt;4 x double&gt; undef, double %1053, i32 0
Instruction does not dominate all uses!
%1332 = shufflevector &lt;2 x double&gt; %1263, &lt;2 x double&gt; undef, &lt;4 x i32&gt; &lt;i32 0, i32 1, i32 undef, i32 undef&gt;
%1330 = shufflevector &lt;4 x double&gt; %1329, &lt;4 x double&gt; %1332, &lt;4 x i32&gt; &lt;i32 0, i32 5, i32 undef, i32 undef&gt;
LLVM ERROR: Broken function found, compilation aborted!
clang-3.8: error: linker command failed with exit code 1 (use -v to see invocation)
specmake: *** [namd] Error 1
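
For readers unfamiliar with the verifier message: each report prints a
definition followed by a user that the definition does not dominate. In the
first report the user (%779) has a lower value number than the definition
(%782), which suggests the use was placed before the definition. The Python
sketch below is a rough illustration of that check, not the LLVM verifier: the
function name undominated_uses is hypothetical, and it assumes both
instructions sit in one straight-line basic block, where "dominates" reduces
to "appears earlier". The log does not confirm that they share a block.

```python
import re

def undominated_uses(lines):
    """Return (user, operand) pairs where the operand is defined on a later line.

    Simplification: the input is treated as a single straight-line basic block,
    so a definition dominates a use only if it appears on an earlier line.
    """
    defined_at = {}
    parsed = []
    for idx, line in enumerate(lines):
        lhs, rhs = line.split("=", 1)
        name = lhs.strip()
        defined_at[name] = idx
        parsed.append((name, re.findall(r"%\w+", rhs)))
    bad = []
    for idx, (name, operands) in enumerate(parsed):
        for op in operands:
            # Values defined outside the block (e.g. %721) are ignored.
            if op in defined_at and defined_at[op] > idx:
                bad.append((name, op))
    return bad

# Assumed ordering: the insertelement is placed before the extractelement it uses.
block = [
    "%779 = insertelement <4 x double> undef, double %782, i32 0",
    "%782 = extractelement <2 x double> %721, i32 1",
]
print(undominated_uses(block))  # [('%779', '%782')]
```

Swapping the two lines of the sample block makes the list come back empty,
which is the ordering the verifier expects.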
Okunev Sergey,
Software Engineer
Intel Compiler Team</pre>
</div>
</p>
</body>
</html>