<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - x86-domain-reassignment pass causes dreadful compile time slowdown when generating code for skylake-avx512 architecture"
href="https://bugs.llvm.org/show_bug.cgi?id=42172">42172</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>x86-domain-reassignment pass causes dreadful compile time slowdown when generating code for skylake-avx512 architecture
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: X86
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>Kevin.Harris@unisys.com
</td>
</tr>
<tr>
<th>CC</th>
<td>craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=22078" name="attach_22078" title="the small case described above">attachment 22078</a> <a href="attachment.cgi?id=22078&action=edit" title="the small case described above">[details]</a></span>
the small case described above
At Unisys, we have an LLVM-based JIT that generates x86-64 code from instruction
sequences of one of our historical architectures. We recently started
testing on servers with the skylake-avx512 architecture and encountered a
shocking compile-time slowdown when compiling for this target architecture.
Using pass timings, we isolated the problem to the x86-domain-reassignment
pass, and confirmed it by disabling this pass and seeing a large compile-time
speedup. A scatter plot of 31K+ examples clearly points to an n-squared
algorithm as the culprit, since the slowdown is strongly related to the size of
the IR. I provide three examples, a small one, a medium-sized one, and a large
one:
i131323820_f000203011542_1.bc - 712 object code bytes
i141043225_f400004013147_1.bc - 40376 object code bytes
i140409520_f400004002276_1.bc - 129424 object code bytes
The opt+llc pipeline that I ran for these cases showed the following slowdowns
when the x86-domain-reassignment pass is used (first the time with the pass
disabled, then the time with it enabled):
small case: 0.013 secs to 0.017 secs
medium case: 1.043 secs to 7.981 secs
large case: 4.140 secs to 69.689 secs
The opt+llc pipeline that I used for these comparisons looks like this:
$LLVMPATH/opt -O3 -enable-tbaa -mcpu=skylake-avx512 $INFILE | $LLVMPATH/llc -O3
-enable-tbaa -filetype=obj -o=out.o -mcpu=skylake-avx512 -
where $INFILE is one of the 3 files listed above. These commands generated the
slow times noted above. The fast times were obtained by adding the
-disable-x86-domain-reassignment option to the llc command above.
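
For anyone who wants to repeat the comparison, the following is a minimal
Python sketch of the two pipeline variants described above. The flags are the
ones quoted in this report; the helper names build_pipeline and time_pipeline
are hypothetical, and placing the disable option just before the trailing "-"
is an assumption, since the report only says it was added to the llc command.

```python
import subprocess
import time


def build_pipeline(llvm_path, infile, disable_pass=False):
    """Return the (opt, llc) argument lists for the pipeline in this report."""
    opt_cmd = [f"{llvm_path}/opt", "-O3", "-enable-tbaa",
               "-mcpu=skylake-avx512", infile]
    llc_cmd = [f"{llvm_path}/llc", "-O3", "-enable-tbaa", "-filetype=obj",
               "-o=out.o", "-mcpu=skylake-avx512"]
    if disable_pass:
        # Option named in the report; its position here is an assumption.
        llc_cmd.append("-disable-x86-domain-reassignment")
    llc_cmd.append("-")  # llc reads the optimized bitcode from stdin
    return opt_cmd, llc_cmd


def time_pipeline(llvm_path, infile, disable_pass=False):
    """Run opt piped into llc and return the wall-clock time in seconds."""
    opt_cmd, llc_cmd = build_pipeline(llvm_path, infile, disable_pass)
    start = time.perf_counter()
    opt = subprocess.Popen(opt_cmd, stdout=subprocess.PIPE)
    llc = subprocess.Popen(llc_cmd, stdin=opt.stdout)
    opt.stdout.close()  # let opt receive SIGPIPE if llc exits early
    llc.wait()
    opt.wait()
    return time.perf_counter() - start
```

Calling time_pipeline once with disable_pass=False and once with
disable_pass=True on the same bitcode file gives a slow/fast pair of the kind
listed above. Wall-clock time is only a rough measure, so repeated runs are
advisable for the small case.
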
I measured the object size for each of the 31K+ bitcode files that we used for
this experiment, and saw no changes to the object code size with/without the
-disable-x86-domain-reassignment option, so I'm presuming that this extra
compile time gives us no benefit.
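
As a rough sanity check of the n-squared suspicion, the three timing pairs
above can be turned into an estimated scaling exponent. This is only a sketch:
it uses object code bytes as a stand-in for IR size (an assumption, since IR
sizes are not listed here), it attributes the entire time difference to the
pass, and the function names are hypothetical.

```python
import math

# (object code bytes, secs with pass disabled, secs with pass enabled),
# copied from the figures in this report.
CASES = {
    "small": (712, 0.013, 0.017),
    "medium": (40376, 1.043, 7.981),
    "large": (129424, 4.140, 69.689),
}


def added_seconds(case):
    """Extra compile time when the pass is enabled."""
    _, disabled, enabled = CASES[case]
    return enabled - disabled


def slowdown_factor(case):
    """Ratio of compile time with the pass enabled to time with it disabled."""
    _, disabled, enabled = CASES[case]
    return enabled / disabled


def scaling_exponent(case_a, case_b):
    """Exponent k in added_time ~ size**k, estimated from two cases."""
    size_ratio = CASES[case_b][0] / CASES[case_a][0]
    time_ratio = added_seconds(case_b) / added_seconds(case_a)
    return math.log(time_ratio) / math.log(size_ratio)


if __name__ == "__main__":
    for name in CASES:
        print(name, round(slowdown_factor(name), 2))
    print("exponent:", round(scaling_exponent("medium", "large"), 2))
```

Between the medium and large cases the estimated exponent comes out close to
2, which is consistent with quadratic behavior, though two data points cannot
establish it.
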
Please let me know if I can provide any additional assistance in resolving this
problem.</pre>
</div>
</p>
</body>
</html>