<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - bubble sort test performance is 2 times worse with -unroll-runtime-epilog"
href="https://llvm.org/bugs/show_bug.cgi?id=30939">30939</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>bubble sort test performance is 2 times worse with -unroll-runtime-epilog
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Loop Optimizer
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>evstupac@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=17562" name="attach_17562" title="test compiled with epilog">attachment 17562</a> <a href="attachment.cgi?id=17562&action=edit" title="test compiled with epilog">[details]</a></span>
test compiled with epilog
The reason for the performance difference between prologue and epilogue runtime
unrolling is unclear in this case.
Performance differs by 2x on X86 when
SingleSource/Benchmarks/Stanford/Bubblesort.c is compiled with
-O2 -march=core-avx2 -mllvm -unroll-runtime-epilog=true (bad case)
and
-O2 -march=core-avx2 -mllvm -unroll-runtime-epilog=false (good case)
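
For readers unfamiliar with the two modes, here is a minimal C sketch of what
runtime unrolling does to the inner loop at the source level. It assumes an
unroll factor of 4 and uses hypothetical helper names (step, pass_plain,
pass_epilog, pass_prolog). It is not the code LLVM actually emits. With an
epilog, the unrolled body runs first and the leftover (trip count mod 4)
iterations run afterwards. With a prolog, the leftover iterations run first.
Both forms execute the same compare-and-swap steps in the same order, so they
compute the same result and differ only in where the leftover iterations are
placed.

```c
/* One compare-and-swap step of the Bubblesort inner loop (hypothetical helper). */
static void step(int *a, int i)
{
    if (a[i] > a[i + 1]) {
        int j = a[i];
        a[i] = a[i + 1];
        a[i + 1] = j;
    }
}

/* Original loop: one step per iteration, i = 0 .. top-1. */
void pass_plain(int *a, int top)
{
    for (int i = 0; i < top; i++)
        step(a, i);
}

/* Epilog form: unrolled-by-4 main loop first, leftover iterations last. */
void pass_epilog(int *a, int top)
{
    int rem = top % 4;              /* leftover iterations, 0..3 */
    int i = 0;
    for (; i < top - rem; i += 4) { /* runs (top - rem) / 4 times */
        step(a, i);
        step(a, i + 1);
        step(a, i + 2);
        step(a, i + 3);
    }
    for (; i < top; i++)            /* epilog: the rem leftover steps */
        step(a, i);
}

/* Prolog form: leftover iterations first, unrolled-by-4 main loop last. */
void pass_prolog(int *a, int top)
{
    int rem = top % 4;              /* leftover iterations, 0..3 */
    int i = 0;
    for (; i < rem; i++)            /* prolog: the rem leftover steps */
        step(a, i);
    for (; i < top; i += 4) {       /* remaining count is a multiple of 4 */
        step(a, i);
        step(a, i + 1);
        step(a, i + 2);
        step(a, i + 3);
    }
}
```

Each pass_* function reads and writes a[0] .. a[top], so the array must hold
at least top + 1 elements.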
Attached are the assembly files produced by the current compiler:
bs_epil.s
bs_prol.s
and the assembly of the hottest loop:
bs_epil_loop.s
bs_prol_loop.s
The code looks very similar, and with some assembly modifications I was able to
make the hottest loops identical while keeping the same 2x performance gap.
Deeper analysis showed that the hottest loop (99% of execution time) is
dominated by stalls from unpredictable branches:
while (i < top) {
    if (sortlist[i] > sortlist[i+1]) {
        j = sortlist[i];
        sortlist[i] = sortlist[i+1];
        sortlist[i+1] = j;
    }
    i = i + 1;
}
sortlist is a randomly filled array, so the comparison in the loop is
completely unpredictable, and the distance between branches is very short.
This makes the test very sensitive to code shifts and to the order of memory
accesses, since both influence branch prediction in the loop.
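
A standalone approximation of the hot loop, useful as a smaller test case, can
be sketched as follows. It is not Bubblesort.c itself. The 1-based array, the
helper names (init_rand, next_rand, bubble) and the 16-bit linear congruential
generator are assumptions; check the benchmark source for the generator and
sizes it actually uses. The function also counts how often the swap branch is
taken. For a uniformly random permutation, the expected count is about half of
all comparisons, so the branch has no bias for the predictor to learn.

```c
/* Hypothetical 16-bit LCG; the benchmark's own generator may differ. */
static long seed;

static void init_rand(void) { seed = 74755L; }

static int next_rand(void)
{
    seed = (seed * 1309L + 13849L) & 65535L;
    return (int)seed;
}

/* Fill a[1..n] with pseudo-random values, then bubble sort a[1..n] in place.
   a must have room for n + 1 ints (a[0] is unused).
   Returns how many times the swap branch was taken. */
long bubble(int *a, int n)
{
    long swaps = 0;
    init_rand();
    for (int k = 1; k <= n; k++)
        a[k] = next_rand();

    int top = n;
    while (top > 1) {
        int i = 1;
        while (i < top) {                 /* the hot loop from the report */
            if (a[i] > a[i + 1]) {        /* data-dependent, unpredictable */
                int j = a[i];
                a[i] = a[i + 1];
                a[i + 1] = j;
                swaps++;
            }
            i = i + 1;
        }
        top = top - 1;
    }
    return swaps;
}
```

To compare the two settings, you could compile this file twice with the
-unroll-runtime-epilog flags quoted above and time bubble() from a small
driver. Whether this reduced case reproduces the 2x gap has not been verified.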
See related discussions:
<a href="https://reviews.llvm.org/D18158">https://reviews.llvm.org/D18158</a>
<a href="https://reviews.llvm.org/D24593">https://reviews.llvm.org/D24593</a></pre>
</div>
</p>
</body>
</html>