<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Poor vectorization with -march=skylake compared to -march=haswell"
href="https://bugs.llvm.org/show_bug.cgi?id=37819">37819</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Poor vectorization with -march=skylake compared to -march=haswell
</td>
</tr>
<tr>
<th>Product</th>
<td>clang
</td>
</tr>
<tr>
<th>Version</th>
<td>6.0
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>-New Bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedclangbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>jed@59a2.org
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=20432" name="attach_20432" title="Source exhibiting optimizer oddity.">attachment 20432</a></span>
Source exhibiting optimizer oddity.
The attached code optimizes well with -march=haswell, and the resulting binary
runs nearly optimally on both Haswell and Skylake hardware.
$ clang -Wall -O3 -march=haswell -ffast-math -c mm-clang.c
00000000000000e0 &lt;mult+0xe0&gt; vmovapd ymm9,ymm6
00000000000000e4 &lt;mult+0xe4&gt; vbroadcastsd ymm10,QWORD PTR [rdi+rbx*8-0x800]
00000000000000ee &lt;mult+0xee&gt; vmovupd ymm6,YMMWORD PTR [rax-0x20]
00000000000000f3 &lt;mult+0xf3&gt; vmovupd ymm11,YMMWORD PTR [rax]
00000000000000f7 &lt;mult+0xf7&gt; vfmadd231pd ymm1,ymm6,ymm10
00000000000000fc &lt;mult+0xfc&gt; vfmadd231pd ymm7,ymm11,ymm10
0000000000000101 &lt;mult+0x101&gt; vbroadcastsd ymm10,QWORD PTR [rdi+rbx*8-0x400]
000000000000010b &lt;mult+0x10b&gt; vfmadd231pd ymm8,ymm6,ymm10
0000000000000110 &lt;mult+0x110&gt; vfmadd231pd ymm5,ymm11,ymm10
0000000000000115 &lt;mult+0x115&gt; vbroadcastsd ymm10,QWORD PTR [rdi+rbx*8]
000000000000011b &lt;mult+0x11b&gt; vfmadd231pd ymm2,ymm6,ymm10
0000000000000120 &lt;mult+0x120&gt; vfmadd231pd ymm3,ymm11,ymm10
0000000000000125 &lt;mult+0x125&gt; vbroadcastsd ymm10,QWORD PTR [rdi+rbx*8+0x400]
000000000000012f &lt;mult+0x12f&gt; vfmadd213pd ymm6,ymm10,ymm9
0000000000000134 &lt;mult+0x134&gt; vfmadd231pd ymm4,ymm11,ymm10
0000000000000139 &lt;mult+0x139&gt; add rax,0x400
000000000000013f &lt;mult+0x13f&gt; add rbx,0x1
0000000000000143 &lt;mult+0x143&gt; jne 00000000000000e0 &lt;mult+0xe0&gt;
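The generated code is much worse with -march=skylake: in the excerpt below,
accumulators are reloaded from and stored back to the stack around each FMA.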
$ clang -Wall -O3 -march=skylake -ffast-math -c mm-clang.c
0000000000000caf &lt;mult+0xcaf&gt; vmovapd YMMWORD PTR [rsp],ymm2
0000000000000cb4 &lt;mult+0xcb4&gt; vmovapd ymm2,YMMWORD PTR [rsp+0x20]
0000000000000cba &lt;mult+0xcba&gt; vmovapd ymm3,YMMWORD PTR [rsp+0x400]
0000000000000cc3 &lt;mult+0xcc3&gt; vfmadd231pd ymm2,ymm3,ymm0
0000000000000cc8 &lt;mult+0xcc8&gt; vmovapd YMMWORD PTR [rsp+0x20],ymm2
0000000000000cce &lt;mult+0xcce&gt; vmovapd ymm2,YMMWORD PTR [rsp+0x40]
0000000000000cd4 &lt;mult+0xcd4&gt; vfmadd231pd ymm2,ymm7,ymm0
0000000000000cd9 &lt;mult+0xcd9&gt; vmovapd YMMWORD PTR [rsp+0x40],ymm2
0000000000000cdf &lt;mult+0xcdf&gt; vmovapd ymm2,YMMWORD PTR [rsp+0x60]
0000000000000ce5 &lt;mult+0xce5&gt; vfmadd231pd ymm2,ymm5,ymm0
0000000000000cea &lt;mult+0xcea&gt; vmovapd YMMWORD PTR [rsp+0x60],ymm2
0000000000000cf0 &lt;mult+0xcf0&gt; vmovapd ymm2,YMMWORD PTR [rsp+0x80]
0000000000000cf9 &lt;mult+0xcf9&gt; vfmadd231pd ymm2,ymm4,ymm0
0000000000000cfe &lt;mult+0xcfe&gt; vmovapd YMMWORD PTR [rsp+0x80],ymm2
0000000000000d07 &lt;mult+0xd07&gt; vmovapd ymm2,YMMWORD PTR [rsp+0xa0]
0000000000000d10 &lt;mult+0xd10&gt; vfmadd231pd ymm2,ymm15,ymm0
If we drop -ffast-math, FMA instructions are no longer used (for either
-march=haswell or -march=skylake); separate vmulpd/vaddpd pairs are emitted
instead.
0000000000000107 &lt;mult+0x107&gt; vbroadcastsd ymm9,QWORD PTR [rdi+rbx*8-0x400]
0000000000000111 &lt;mult+0x111&gt; vmulpd ymm12,ymm9,ymm10
0000000000000116 &lt;mult+0x116&gt; vaddpd ymm8,ymm8,ymm12
000000000000011b &lt;mult+0x11b&gt; vmulpd ymm9,ymm9,ymm11
0000000000000120 &lt;mult+0x120&gt; vaddpd ymm6,ymm6,ymm9
I don't think -ffast-math should be needed to use FMA instructions here. It
certainly isn't needed for this code with GCC or Intel compilers.</pre>
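<p>For readers without the attachment: the Haswell inner loop shown in this report is consistent with a register-blocked multiply-accumulate kernel along the following lines. This is a hypothetical reconstruction from the disassembly only (four broadcasts, two 4-wide loads and eight accumulators per iteration, with 0x400-byte strides suggesting rows of 128 doubles). The names <code>mult_block</code>, <code>N</code>, <code>ROWS</code> and <code>COLS</code> are assumptions and this is not the attached mm-clang.c.</p>
<pre>
```c
/* Hypothetical reconstruction, NOT the attached mm-clang.c.
   N = 128 is inferred from the 0x400-byte (128-double) strides in the listing. */
enum { N = 128, ROWS = 4, COLS = 8 };

/* Accumulate one ROWS x COLS block of c += a * b.
   a: ROWS rows of N doubles.
   b: N rows of N doubles (only the first COLS columns are read).
   c: ROWS rows of N doubles (only the first COLS columns are updated). */
void mult_block(const double *restrict a, const double *restrict b,
                double *restrict c)
{
    double acc[ROWS][COLS] = {{0.0}};

    for (int k = 0; k != N; k++)                /* one k step per loop trip */
        for (int i = 0; i != ROWS; i++)         /* one a[i][k] value per row */
            for (int j = 0; j != COLS; j++)     /* COLS consecutive doubles of b */
                acc[i][j] += a[i * N + k] * b[k * N + j];

    for (int i = 0; i != ROWS; i++)
        for (int j = 0; j != COLS; j++)
            c[i * N + j] += acc[i][j];
}
```
</pre>
<p>Each update is a single <code>acc += a * b</code> expression, so fusing it into an FMA does not require reordering or reassociating anything. Whether this sketch reproduces the Haswell/Skylake difference has not been checked; compiling it with the two command lines from the report and comparing the disassembly would be one way to find out. If floating-point contraction is what gates FMA selection here (an assumption), <code>-ffp-contract=fast</code> in place of <code>-ffast-math</code> would be the narrower switch to try.</p>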
</div>
</p>
</body>
</html>