<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Case where loop idiom recognition causes 2x slowdown"
href="https://bugs.llvm.org/show_bug.cgi?id=45980">45980</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Case where loop idiom recognition causes 2x slowdown
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Loop Optimizer
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>andrew.b.adams@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=23504" name="attach_23504" title="repro">attachment 23504</a></span>
repro
At -O3, loop idiom recognition converts loops with a very short but dynamic
trip count (either 1 or 2 in the example below) into memcpy calls, which are
about 2x slower than just running the loop.

void bad(__m256 *__restrict a, __m256 *__restrict b, bool condition) {
    for (int j = 0; j < 1000; j++) {
        int s = condition ? 1 : 2;
        for (int i = 0; i < s; i++) {
            a[j * 2 + i] = b[j * 2 + i];
        }
    }
}

void good(__m256 *__restrict a, __m256 *__restrict b, bool condition) {
    for (int j = 0; j < 1000; j++) {
        int s = condition ? 1 : 2;
        for (int i = 0; i < s; i++) {
            asm volatile(""); // To prevent loop idiom recognition
            a[j * 2 + i] = b[j * 2 + i];
        }
    }
}

This is particularly bad with AVX-512 when there's math going on in the outer
loop, because the call to memcpy also requires a vzeroupper, which forces all
live vector registers to be spilled to the stack.
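
For anyone who wants to check the two variants side by side without AVX flags,
here is a minimal sketch of a correctness harness. It is not taken from the
attached repro: Vec8, kOuter, kElems, copy_plain, copy_barrier and
variants_agree are hypothetical names, and Vec8 is only a plain 32-byte
stand-in for __m256. It checks that both loops copy the same elements and says
nothing about timing.

```cpp
// Hypothetical stand-in for __m256: eight floats, 32-byte aligned.
struct alignas(32) Vec8 {
    float v[8];
};

constexpr int kOuter = 1000;        // outer trip count from the report
constexpr int kElems = 2 * kOuter;  // elements touched when s == 2

// Inner loop is a candidate for loop idiom recognition (may become memcpy).
void copy_plain(Vec8 *__restrict a, Vec8 *__restrict b, bool condition) {
    for (int j = 0; j < kOuter; j++) {
        int s = condition ? 1 : 2;
        for (int i = 0; i < s; i++) {
            a[j * 2 + i] = b[j * 2 + i];
        }
    }
}

// Same loop with an empty asm statement (GCC/Clang extension) in the body.
void copy_barrier(Vec8 *__restrict a, Vec8 *__restrict b, bool condition) {
    for (int j = 0; j < kOuter; j++) {
        int s = condition ? 1 : 2;
        for (int i = 0; i < s; i++) {
            asm volatile("");
            a[j * 2 + i] = b[j * 2 + i];
        }
    }
}

// Returns true when both variants leave identical values in the destination.
bool variants_agree(bool condition) {
    // Static storage so the alignas(32) requirement is honored.
    static Vec8 src[kElems], dst_plain[kElems], dst_barrier[kElems];
    for (int k = 0; k < kElems; k++) {
        for (int l = 0; l < 8; l++) {
            src[k].v[l] = float(k * 8 + l);
            dst_plain[k].v[l] = -1.0f;    // sentinel for "not copied"
            dst_barrier[k].v[l] = -1.0f;
        }
    }
    copy_plain(dst_plain, src, condition);
    copy_barrier(dst_barrier, src, condition);
    for (int k = 0; k < kElems; k++) {
        for (int l = 0; l < 8; l++) {
            if (dst_plain[k].v[l] != dst_barrier[k].v[l]) return false;
        }
    }
    return true;
}
```

Calling variants_agree(true) and variants_agree(false) covers both trip counts
(1 and 2). Timing the two copy functions in a loop would be the next step, but
the numbers depend on the target and flags, so none are given here.
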
As an aside, it would be nice if PipelineTuningOptions, which currently allows
turning off vectorization, unrolling, and a variety of other loop
optimizations, also allowed turning off loop idiom recognition. The pass is not
appropriate for every front-end language, and there doesn't seem to be a clean
way to turn it off from the API.</pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>